Discourse AI - Spam detection

sam · December 20, 2024, 4:46am

This guide explains how to configure and use Discourse AI’s spam detection feature, including the setup process, scanning criteria, classification logic, customizations, and contrasts with AI triage.

Required user level: Administrator

This feature is now turned on by default for Starter and Pro customers, as well as for our legacy Basic, Open Source, Creator, and Business customers.

Discourse AI provides an efficient spam detection feature that identifies and flags spam posts with minimal configuration. While designed for simplicity, it complements the more versatile AI triage system, which supports broader workflows and larger use cases.

Summary

In this guide, you will learn:

How AI spam detection works and what content is scanned
The classification logic and context used by the AI
Steps to configure spam detection through /admin/plugins/discourse-ai/ai-spam
Guidelines for Large Language Model (LLM) selection
Key differences between spam detection and AI triage
How to manage flagged and missed posts

How AI spam detection works

What content gets scanned?

AI spam detection evaluates posts based on these criteria:

User trust level:
- Scans posts from users with trust level 1 or lower.
- Excludes posts from higher trust levels.
Post type:
- Public posts (excluding private messages).
- Both reply posts and first topic posts are included, based on additional thresholds.
Post edits:
- Scans posts with significant edits (e.g., changes exceeding 10 characters).
- Enforces a 10-minute delay between scans of the same post.
Post frequency:
- Prioritizes posts from new users with fewer than 4 total posts in public topics.
- Excludes posts from users exceeding this threshold.
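The criteria above can be sketched as a single eligibility check. This is a minimal Python illustration, not the plugin's actual code: the `Post` record and its field names are hypothetical, and treating "fewer than 4 total posts" as a strict cutoff on the author's existing public posts is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds from the scanning criteria described above.
MAX_TRUST_LEVEL = 1                   # scan trust level 0 and 1 only
MAX_PUBLIC_POSTS = 4                  # scan authors with fewer than 4 public posts
MIN_EDIT_CHARS = 10                   # an edit must change more than 10 characters
RESCAN_DELAY = timedelta(minutes=10)  # minimum gap between scans of the same post


@dataclass
class Post:
    # Hypothetical record: field names are illustrative, not Discourse's schema.
    author_trust_level: int
    author_public_post_count: int     # assumption: the author's existing public posts
    is_private_message: bool
    edit_chars_changed: int = 0       # 0 for a post that was never edited
    last_scanned_at: Optional[datetime] = None


def should_scan(post: Post, now: datetime) -> bool:
    """Return True if the post meets the scanning criteria sketched above."""
    if post.is_private_message:
        return False
    if post.author_trust_level > MAX_TRUST_LEVEL:
        return False
    if post.author_public_post_count >= MAX_PUBLIC_POSTS:
        return False
    if post.last_scanned_at is not None:
        # Already scanned once: only re-scan after a significant edit,
        # and never within the re-scan delay.
        if post.edit_chars_changed <= MIN_EDIT_CHARS:
            return False
        if now - post.last_scanned_at < RESCAN_DELAY:
            return False
    return True
```

For example, a TL1 author's post that was scanned 5 minutes ago and then edited heavily is skipped until the 10-minute delay has passed.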

The classification process

Posts that meet the criteria are sent to an AI model (LLM) for analysis. The model evaluates whether the post is “SPAM” or “NOT SPAM” based on:

Context: Includes post content, topic title, user account data (e.g., account age and trust level), and site guidelines.
Custom instructions: Admin-defined rules for reinforced or adapted scanning criteria.
Automated detection:
- Flags irrelevant or promotional content (e.g., ads or commercial materials).
- Identifies automated or bot-like behaviors.
- Assesses content relevance to the discussion.

Default prompt and context

The AI uses a default system prompt to guide spam detection. This prompt outlines spam classification rules. For example:

You are a spam detection system. Analyze the following content and context.
Notes:
- Replies must remain relevant to the discussion thread.
- Mark as SPAM if the content is irrelevant, promotional, or automated.
- Consider new user posts with links as potential SPAM unless explicitly relevant to the topic.
Respond only with "SPAM" or "NOT SPAM".
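Because the prompt asks for exactly "SPAM" or "NOT SPAM", whatever reads the reply has to be careful: a naive substring test such as `"SPAM" in reply` would also match "NOT SPAM". A minimal sketch, using a hypothetical `parse_verdict` helper (not part of the plugin's API) that matches on the start of the reply instead. Raising an error on any other reply, rather than guessing, is a design choice of this sketch.

```python
def parse_verdict(response: str) -> bool:
    """Return True if the model's reply marks the post as spam (hypothetical helper)."""
    text = response.strip().upper()
    # Match on the prefix: a substring test like '"SPAM" in text'
    # would wrongly treat "NOT SPAM" as spam.
    if text.startswith("NOT SPAM"):
        return False
    if text.startswith("SPAM"):
        return True
    raise ValueError(f"unexpected classifier response: {response!r}")
```

Prefix matching also tolerates customized replies such as "SPAM - It's Jon!" while still rejecting free-form answers.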

The scanner also compiles a context package, including:

Metadata from topics and categories.
Relevance of replies to the thread.
Author data (e.g., account creation date, total posts, trust level).
Post text truncated to 5000 characters for processing.
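The context package could be assembled along these lines. This is a minimal sketch: the field labels, their order, and the `Author` record are hypothetical; only the 5000-character truncation comes from the description above.

```python
from dataclasses import dataclass
from datetime import date

MAX_POST_CHARS = 5000  # post text is truncated to 5000 characters


@dataclass
class Author:
    # Hypothetical author record; mirrors the author data listed above.
    created_on: date
    total_posts: int
    trust_level: int


def build_context(topic_title: str, category: str, post_text: str,
                  author: Author, site_guidelines: str) -> str:
    """Assemble a plain-text context block for the classifier (illustrative layout)."""
    body = post_text[:MAX_POST_CHARS]  # drop everything past the limit
    lines = [
        f"Site guidelines: {site_guidelines}",
        f"Category: {category}",
        f"Topic title: {topic_title}",
        f"Author account created: {author.created_on.isoformat()}",
        f"Author total posts: {author.total_posts}",
        f"Author trust level: {author.trust_level}",
        "Post content:",
        body,
    ]
    return "\n".join(lines)
```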

Configure AI spam detection

If your site is hosted by Discourse

You can enable the experimental version of this feature with Discourse-hosted LLMs by visiting forum.example.com/admin/whats-new. Then, find the “Setup and detect spam in one click!” section and toggle the setting on.

[Screenshot: the “Setup and detect spam in one click!” section, with a toggle for the experimental AI spam detector, which can be enabled for free.]

Configuration guide

Access settings:
Navigate to /admin/plugins/discourse-ai/ai-spam.
Select an LLM:
- Choose a language model suited to your forum’s needs. See the Large Language Model (LLM) settings page for configuring LLMs.
- Access /admin/plugins/discourse-ai/ai-llms for LLM configurations.
Activate spam detection:
Enable spam detection by toggling the feature on.

Note: A connected LLM is mandatory.

Add customized instructions:
- Define rules specific to your forum (e.g., stricter monitoring of external links).
- Save any changes to apply them.

Differences from AI triage

While spam detection is designed specifically for identifying spam, AI triage supports broader post management tasks.

| Feature | AI Spam Detection | AI Triage |
| --- | --- | --- |
| Complexity | Streamlined, opinionated setup | Highly customizable and flexible |
| Primary use case | Detecting spam with minimal overhead | Advanced workflows for categorization, tagging, replies, spam detection, and NSFW detection |
| Actions | Flags spam, silences users | Tags, categorizes, hides posts, adds replies, flags posts, silences users |
| Recommendation | Easy to set up and effective for most situations | Use for rich, highly customizable workflows |

For more details, see Discourse AI - AI triage.

LLM selection recommendations

The performance of spam detection depends on the chosen LLM.

Most low-cost LLMs work effectively, such as:

GPT-4o-mini
Claude 3.5 Haiku
Gemini 2.0 Flash

Experiment with different models to find the best fit. Configure your models via /admin/plugins/discourse-ai/ai-llms.

Testing spam scanner behavior

You can test spam detection rules directly from the configuration page.

Paste a post URL or ID into the test field.
Review the classification result (e.g., “SPAM” or “NOT SPAM”) and check the logs to understand the model’s reasoning.
Unsaved changes are applied during testing, enabling experimentation without risk.
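Beyond the built-in test field, it can help to keep a small set of known edge cases and re-check them whenever you change custom instructions. A minimal offline sketch, assuming a `classify` function that takes post text and returns the model's reply; `keyword_stub` is a hypothetical stand-in for a real LLM call, not how Discourse classifies posts.

```python
from typing import Callable, List, Tuple


def run_test_cases(
    classify: Callable[[str], str],
    cases: List[Tuple[str, str]],
) -> List[Tuple[str, str, str]]:
    """Return (post_text, expected, actual) for each case the classifier got wrong."""
    failures = []
    for post_text, expected in cases:
        actual = classify(post_text).strip().upper()
        if actual != expected:
            failures.append((post_text, expected, actual))
    return failures


def keyword_stub(post_text: str) -> str:
    # Hypothetical stand-in for a real LLM call: flags any post containing a link.
    return "SPAM" if "http" in post_text.lower() else "NOT SPAM"


if __name__ == "__main__":
    cases = [
        ("Great answer, thanks!", "NOT SPAM"),
        ("Cheap watches at http://example.com", "SPAM"),
        ("The install guide is at https://meta.discourse.org", "NOT SPAM"),
    ]
    for text, expected, actual in run_test_cases(keyword_stub, cases):
        print(f"expected {expected}, got {actual}: {text}")
```

With these cases, the stub's only failure is the third one: a false positive on a legitimate, on-topic link, which is exactly the kind of edge case worth encoding in custom instructions.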

Managing flagged and missed posts

Handling flagged posts

Flagged posts appear in the moderation queue. Admins can:

Approve legitimate posts wrongly classified as spam.
Reject spam topics to keep the system accurate.

Important: Reject spam flags for incorrectly classified posts. Users remain silenced until the flag is resolved.

Handling missed spam

Missed spam refers to posts that bypass AI detection but are later flagged by the community. Moderators can manage these as necessary.

Best practices

Monitor flagged and missed spam regularly to refine system accuracy. Clickable metrics simplify this process.
Use test cases to evaluate custom instructions against edge cases.
Review and adjust LLM settings when needed.
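When monitoring flagged and missed spam, two ratios summarize accuracy: precision (what share of AI flags were real spam) and recall (what share of spam the AI caught). A minimal sketch, assuming you read the three counts from the moderation queue yourself; the function name is hypothetical. Recall here only counts missed spam that the community later flagged, so it likely overstates true recall.

```python
def detection_metrics(flagged: int, false_positives: int, missed: int) -> dict:
    """Compute precision and recall from moderation outcomes (hypothetical helper).

    flagged: posts the AI flagged as spam
    false_positives: flagged posts that moderators approved as legitimate
    missed: spam the AI did not flag but the community did
    """
    true_positives = flagged - false_positives
    precision = true_positives / flagged if flagged else 0.0
    all_known_spam = true_positives + missed
    recall = true_positives / all_known_spam if all_known_spam else 0.0
    return {"precision": precision, "recall": recall}
```

For example, 20 flags with 2 approved as legitimate and 6 missed posts give a precision of 0.9 and a recall of 0.75.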

Additional resources

Configuring AI spam detection well can substantially reduce manual moderation effort and help keep your community clean and spam-free.

Last edited by @Southpaw 2025-06-19T01:50:24Z


jordan-violet · February 14, 2025, 6:59pm

We’ve done quite a bit of testing with this, and we don’t seem to get reliable results at all. For context, we’re using the gpt-4o model.

To test its accuracy, I gave the following simple instructions:

You are a spam detection system. Analyze the following content and context.
Notes below. If *ANY* of the items are true below then mark it as spam:
- The username is very specifically "testjon", then it is *ALWAYS* spam.
- Respond only with "SPAM - It's Jon!" or "NOT SPAM".

Testing on a post by the user testjon returns NOT SPAM. It seems like it’s not following the instructions well at all. Any suggestions?

Have any others had any good or bad experiences with the AI spam detection?

Jagster · February 14, 2025, 7:53pm

I don’t know how things work in this particular case, but in general a statement like the one quoted is very prone to breaking down. The model doesn’t understand what ANY means and happily keeps going as far as it can, and by the end it settled on NOT SPAM.

jordan-violet · February 14, 2025, 8:46pm

So you’re saying to remove the bolding on ANY? Or are you referring to the overall statement, “if any items below”?

Jagster · February 14, 2025, 8:54pm

I’m saying you have to write it more logically and precisely. You can’t let an AI choose in any way. Remember, it can’t count, and it definitely doesn’t read everything first and then come back and work through it logically. Explain it as simply as you would to a lazy three-year-old with ADHD. Examples aren’t a bad idea, but they increase token usage.

jordan-violet · February 14, 2025, 9:05pm

This is awesome info. For example, how might you write this exact scenario differently?

Jagster · February 15, 2025, 8:05am

Something like…

You are a spam detection system. Your job is to silently analyze content to maintain high quality in this forum. You must follow the rules to decide when a post is spam. When you find spam, respond as specified in the rules. Use only the specified responses.

## Rules for spam

I won’t do this for you 😏 But you need some explanations and examples. A fast and crude example:
* if a post has external links related to gambling, sex, crypto, or similar (“similar” is risky wording in this context, BTW), then the post is classified as spam. Example: www.buy-crypto.deal

You must tune this case by case, because you will get false positives and false negatives.

Then you must give some guidelines for content too. But when testing:

* if the username is “testjon”, skip analyzing the content and classify it directly as spam. Your response is “SPAM - it’s Jon”

BTW, can it even see the username?

## Rules for other content

When a post has passed spam analysis and you are sure it is legitimate content, your only response is “NOT SPAM”.

Something like that. You have to test it, of course. Every time you get a wrong response, try to find the point that confused it. But don’t give the AI an opportunity to choose what it does, because it will then take the last, easiest, or nicest path. It has a built-in need to answer and please.

cultiv · March 5, 2025, 10:28am

I have just enabled this and am excited to see how it goes!

Is there a setting or a consideration for the trust level users have?

For example: I don’t need the AI to kick in for TL2 and above; they have earned their place and should not be considered for scanning. If they do go rogue, we’ll have to have some words with them.

Falco · March 17, 2025, 4:10pm

2 posts were split to a new topic: Discourse AI plugin missing

Olle11 · April 18, 2025, 7:01am

Since this is replacing Akismet, I wonder what the best alternative is for spam detection/prevention if you don’t want the LLM costs that come with AI?

KhoiUSA · April 18, 2025, 12:37pm

Actually, Gemini 2.0 Flash is available for free, as long as you aren’t sending a million requests to it each day, of course. It’s working fine for my forum right now at zero cost, and it is definitely more precise and “smarter” than Akismet.

However, if the AI Spam detection plan fails, I still have the Akismet plugin installed on my site and ready to go if I ever need it again, and I think you can still install it. (Since it is being deprecated though, I don’t expect it to stick around forever). Also remember that trust levels are a fundamental core of Discourse that help you manage spam on your site.

Olle11 · April 20, 2025, 5:07am

Oh, that is cool. Is it possible to set a limit on tokens to make sure usage stays within the LLM’s free (zero-cost) limit?

KhoiUSA · April 20, 2025, 1:51pm

As far as I know, I think if you exceed the limit the API for the LLM will just stop responding. My Google Cloud Console account does not have a billing account attached, and I can still use the API free of charge on the free tier, so you should be good.

Falco · August 1, 2025, 4:47pm

A post was split to a new topic: Improving AI spam detection for edits and merges
