AI spam bot says it's not spam, but the Scan log indicates it is spam

I’ve enabled the Discourse AI spam handling on our forum. I’ve set up Claude Sonnet 4 with an API key and selected the Spam detector persona.

I did a test post that is clearly spam. Nothing subtle about it.

It was not blocked and was posted immediately.

When I gave the post URL to the spam bot using the test feature, the result says Not spam, but in the Scan log it says: SPAM - This is a clear promotional advertisement…

My expectation would be that the result would be SPAM, matching the Scan log declaration of SPAM. And that this would then queue up the post for review by admins and moderators, for example.

Might anyone be able to share what I’m missing? I’m no expert – so am open to any guidance!

Thank you!

What is the trust level of the user who posted? AI Spam will skip posts from TL2+ users.

Regarding the test, there’s a bug in the test code and it should say “spam.” I’ll work on a fix.

4 Likes

Thank you for your reply!

The user I posted with is at trust level 0 (new user).

Any thoughts on why the post made it through?

I appreciate your help!

This will fix both the test and the post not getting flagged:

The Spam detector Persona system prompt was confusing Claude models. The change makes the expected response format instructions more explicit.

3 Likes

Ah, fantastic! The test feature is working as expected.

I am wondering if you might be able to help with why the AI Spam feature is still not blocking a spammy post from being immediately posted? I sent the post to the AI Spam test and it is flagging it as spam - but it was posted.

Am I missing a connecting piece perhaps? Thank you so much for your help with this!

1 Like

Are you an admin, or at a higher trust level? If so, try posting with a low-trust-level test user instead.

1 Like

We skip a post when:

  • The author’s trust level is greater than TL1.
  • The post belongs to a private message topic.
  • The author is a bot.
  • The author is staff (moderator/admin).
  • The author has already made more than 3 posts in regular (non-PM) topics.
  • The post has already been scanned 3 or more times.

If the test is working, I’m confident it has to be because of one of the above.
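To make it easier to check a test account against that list, here is a rough Python sketch of the conditions. It is only an illustration: Discourse AI is a Ruby plugin, and the `Author` and `Post` classes, their field names, and the `skip_reason` helper are all hypothetical, not the plugin's actual code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Author:
    # Hypothetical fields, named for illustration only.
    trust_level: int = 0
    is_bot: bool = False
    is_staff: bool = False          # moderator or admin
    public_post_count: int = 0      # posts in regular (non-PM) topics


@dataclass
class Post:
    author: Author
    in_private_message: bool = False
    scan_count: int = 0             # times this post was already scanned


def skip_reason(post: Post) -> Optional[str]:
    """Return why a post would be skipped, or None if it would be scanned."""
    author = post.author
    if author.trust_level > 1:
        return "trust level above TL1"
    if post.in_private_message:
        return "private message topic"
    if author.is_bot:
        return "author is a bot"
    if author.is_staff:
        return "author is staff"
    if author.public_post_count > 3:
        return "author has more than 3 public posts"
    if post.scan_count >= 3:
        return "post already scanned 3 or more times"
    return None


if __name__ == "__main__":
    new_user_post = Post(author=Author(trust_level=0))
    admin_post = Post(author=Author(trust_level=0, is_staff=True))
    print(skip_reason(new_user_post))
    print(skip_reason(admin_post))
```

In this sketch a fresh TL0 account with no prior posts is the only kind of author whose post gets scanned, so that is the kind of account to test with.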

1 Like

Ahhh yes! Thank you for your patient and helpful replies!

I posted with my admin user instead of my trust level 0 user. :woman_facepalming:

It’s working! I love the way the discourse_ai_spam user shows up as the user who flagged and unlisted the post.

Thank you again for your quick and generous help with this!

1 Like