Discourse AI - Spam detection

:bookmark: This guide explains how to configure and use Discourse AI’s spam detection feature, including the setup process, scanning criteria, classification logic, customizations, and contrasts with AI triage.

:person_raising_hand: Required user level: Administrator

Discourse AI provides an efficient spam detection feature that identifies and flags spam posts with minimal configuration. While designed for simplicity, it complements the more versatile AI triage system, which supports broader workflows and larger use cases.

Summary

In this guide, you will learn:

  • How AI spam detection works and what content is scanned
  • The classification logic and context used by the AI
  • Steps to configure spam detection through /admin/plugins/discourse-ai/ai-spam
  • Guidelines for Large Language Model (LLM) selection
  • Key differences between spam detection and AI triage
  • How to manage flagged and missed posts

How AI spam detection works

What content gets scanned?

AI spam detection evaluates posts based on these criteria:

  1. User trust level:

    • Scans posts from users with trust level 1 or lower.
    • Excludes posts from higher trust levels.
  2. Post type:

    • Public posts (excluding private messages).
    • Both reply posts and first topic posts are included, based on additional thresholds.
  3. Post edits:

    • Scans posts with significant edits (e.g., changes exceeding 10 characters).
    • Enforces a 10-minute delay between scans of the same post.
  4. Post frequency:

    • Prioritizes posts from new users with fewer than 4 total posts in public topics.
    • Excludes posts from users who have reached this threshold.
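
The criteria above can be read as a single eligibility check. The following is a minimal sketch of that reading, not the plugin's actual code: `PostInfo`, `should_scan`, and the constant names are hypothetical, the post-count rule is treated as a hard cutoff, and the edit size is assumed to be measured in changed characters.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Illustrative names and thresholds that mirror the criteria in this guide.
# They are assumptions, not values read from the plugin's source.
MAX_TRUST_LEVEL = 1                    # trust level 1 or lower is scanned
MAX_PUBLIC_POSTS = 4                   # fewer than 4 public posts is scanned
MIN_EDIT_CHARS = 10                    # an edit must change more than 10 characters
RESCAN_DELAY = timedelta(minutes=10)   # minimum gap between scans of one post


@dataclass
class PostInfo:
    trust_level: int
    is_private_message: bool
    author_public_post_count: int
    chars_changed: Optional[int] = None        # None for a brand-new post
    last_scanned_at: Optional[datetime] = None  # None if never scanned


def should_scan(post: PostInfo, now: datetime) -> bool:
    """Return True only if the post meets every scanning criterion."""
    if post.trust_level > MAX_TRUST_LEVEL:
        return False
    if post.is_private_message:
        return False
    if post.author_public_post_count >= MAX_PUBLIC_POSTS:
        return False
    # Edited posts are rescanned only when the edit is significant.
    if post.chars_changed is not None and post.chars_changed <= MIN_EDIT_CHARS:
        return False
    # Respect the delay between scans of the same post.
    if post.last_scanned_at is not None and now - post.last_scanned_at < RESCAN_DELAY:
        return False
    return True
```

Ordering the checks from cheapest to most specific keeps the function easy to read; whether the real scanner evaluates them in this order is not stated in this guide.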

The classification process

Posts that meet the criteria are sent to an AI model (LLM) for analysis. The model evaluates whether the post is “SPAM” or “NOT SPAM” based on:

  • Context: Includes post content, topic title, user account data (e.g., account age and trust level), and site guidelines.
  • Custom instructions: Admin-defined rules for reinforced or adapted scanning criteria.
  • Automated detection:
    • Flags irrelevant or promotional content (e.g., ads or commercial materials).
    • Identifies automated or bot-like behaviors.
    • Assesses content relevance to the discussion.

Default prompt and context

The AI uses a default system prompt to guide spam detection. This prompt outlines spam classification rules. For example:

You are a spam detection system. Analyze the following content and context.
Notes:
- Replies must remain relevant to the discussion thread.
- Mark as SPAM if the content is irrelevant, promotional, or automated.
- Consider new user posts with links as potential SPAM unless explicitly relevant to the topic.
Respond only with "SPAM" or "NOT SPAM".

The scanner also compiles a context package, including:

  • Metadata from topics and categories.
  • Relevance of replies to the thread.
  • Author data (e.g., account creation date, total posts, trust level).
  • Post text truncated to 5000 characters for processing.
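
To make the context package and the two-word verdict concrete, here is a rough sketch in Python. The layout of the context text, the `author` dictionary keys, and both function names are assumptions made for illustration; only the 5000-character limit and the “SPAM” / “NOT SPAM” responses come from this guide.

```python
MAX_POST_CHARS = 5000  # truncation limit stated in this guide


def build_context(topic_title: str, category: str, author: dict, post_text: str) -> str:
    """Assemble a plain-text context package (hypothetical layout)."""
    lines = [
        f"Topic title: {topic_title}",
        f"Category: {category}",
        f"Author created at: {author['created_at']}",
        f"Author total posts: {author['post_count']}",
        f"Author trust level: {author['trust_level']}",
        "Post content:",
        post_text[:MAX_POST_CHARS],  # keep only the first 5000 characters
    ]
    return "\n".join(lines)


def parse_verdict(response: str) -> bool:
    """Map the model's reply to True (spam) or False (not spam).

    This sketch is deliberately strict: anything other than the two
    expected answers raises an error instead of being guessed at.
    """
    verdict = response.strip().upper()
    if verdict == "SPAM":
        return True
    if verdict == "NOT SPAM":
        return False
    raise ValueError(f"Unexpected model response: {response!r}")
```

A strict parser like this shows why custom instructions should not ask the model for a different response format: any other wording would fail to map to a verdict. How the real scanner treats unexpected replies is not covered here.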

Configure AI spam detection

:warning: If your site is hosted by Discourse: You can enable this feature’s experimental version with Discourse-hosted LLMs by visiting forum.example.com/admin/whats-new. Then, search for the “Setup and detect spam in one click!” section and toggle the setting on.

Configuration guide

  1. Access settings:
    Navigate to /admin/plugins/discourse-ai/ai-spam.

  2. Select an LLM:
    Choose the model the scanner should use from the LLMs configured for your site.

  3. Activate spam detection:
    Enable spam detection by toggling the feature on.

:information_source: Note: A connected LLM is mandatory.

  4. Add customized instructions:
    • Define rules specific to your forum (e.g., stricter monitoring of external links).
    • Save any changes to apply them.

:information_source: Tip: Disable Akismet when using Discourse AI spam detection to avoid redundancy.


Differences from AI triage

While spam detection is designed specifically for identifying spam, AI triage supports broader post management tasks.

| Feature | AI Spam Detection | AI Triage |
|---|---|---|
| Complexity | Streamlined, opinionated setup | Highly customizable and flexible |
| Primary use case | Detecting spam with minimal overhead | Advanced workflows for categorization, tagging, replies, spam detection, NSFW detection |
| Actions | Flags spam, silences users | Tags, categorizes, hides posts, adds replies, flags posts, silences users |
| Recommendation | Use instead of Akismet | Use for rich, highly customizable workflows |

For more details, see Discourse AI - AI triage.


LLM selection recommendations

The performance of spam detection depends on the chosen LLM.

Most low-cost LLMs work effectively, such as:

  • GPT-4o-mini
  • Claude 3.5 Haiku
  • Gemini 2.0 Flash

Experiment with different models to find the best fit. Configure your models via /admin/plugins/discourse-ai/ai-llms.


Testing spam scanner behavior

You can test spam detection rules directly from the configuration page.

  • Paste a post URL or ID into the test field.
  • Review the classification result (e.g., “SPAM” or “NOT SPAM”) and analyze logs to understand reasoning.
  • Unsaved changes are applied during testing, enabling experimentation without risk.

Managing flagged and missed posts

Handling flagged posts

Flagged posts appear in the moderation queue. Admins can:

  • Approve legitimate posts wrongly classified as spam.
  • Reject spam topics to keep the system accurate.

:warning: Important: Reject spam flags for incorrectly classified posts. Users remain silenced until the flag is resolved.

Handling missed spam

Missed spam refers to posts that bypass detection but are later flagged by the community. Moderators can manage these as necessary.


Best practices

  • Monitor flagged and missed spam regularly to refine system accuracy. Clickable metrics simplify this process.
  • Use test cases to evaluate custom instructions against edge cases.
  • Review and adjust LLM settings when needed.
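
One way to keep track of edge cases is to hold a small set of labeled examples and count where the classifier disagrees with them. The sketch below assumes a hypothetical `classify` callable; `keyword_stub` is only a stand-in so the example runs, and in practice the verdicts would come from the test field on the configuration page or from your LLM.

```python
from typing import Callable, Iterable, Tuple


def evaluate(classify: Callable[[str], bool],
             cases: Iterable[Tuple[str, bool]]) -> dict:
    """Count correct results, false positives, and false negatives."""
    counts = {"correct": 0, "false_positives": 0, "false_negatives": 0}
    for text, is_spam in cases:
        predicted = classify(text)
        if predicted == is_spam:
            counts["correct"] += 1
        elif predicted:
            counts["false_positives"] += 1   # flagged, but legitimate
        else:
            counts["false_negatives"] += 1   # spam that slipped through
    return counts


# Stand-in classifier for demonstration only (not how the plugin works).
def keyword_stub(text: str) -> bool:
    return "buy now" in text.lower()


cases = [
    ("Buy now at our crypto store!", True),
    ("How do I reset my password?", False),
    ("Great deals on watches, visit our site", True),  # the stub misses this one
]
print(evaluate(keyword_stub, cases))
# → {'correct': 2, 'false_positives': 0, 'false_negatives': 1}
```

Re-running the same cases after each change to the custom instructions makes it easier to see whether an adjustment fixed one edge case at the expense of another.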

Additional resources


:mega: Configuring AI spam detection well reduces manual moderation effort and helps keep your community free of spam.

Last edited by @yigit 2025-01-13T14:06:01Z

We’ve done quite a bit of testing with this, and we don’t seem to get reliable results at all. For context, we’re using the gpt-4o model.

To test its accuracy, I gave the following simple instructions:

You are a spam detection system. Analyze the following content and context.
Notes below. If *ANY* of the items are true below then mark it as spam:
- The username is very specifically "testjon", then it is *ALWAYS* spam.
- Respond only with "SPAM - It's Jon!" or "NOT SPAM".

Testing on a post, by the username testjon, results in NOT SPAM. It seems like it’s not heeding instructions well at all. Any suggestions?

Have any others had any good or bad experiences with the AI spam detection?

I don’t know how things work in this particular situation, but in general an instruction like the one quoted is very prone to break down. The model doesn’t understand what ANY means and happily carries on as far as it gets. And from there it finally ended up at NOT SPAM.

So you’re saying to remove the bolding for ANY? Or do you mean the overall statement of “if any items below”?

I’m saying you have to write it more logically and exactly. You can’t let an AI choose in any way. Remember, it can’t count, and it definitely doesn’t read everything first and then come back and try to work logically. Try to explain as simply as you would when giving instructions to a lazy 3-year-old with ADHD. Examples aren’t wrong, but they will increase token use.

This is awesome info. For example, how might you write this exact scenario differently?

Something like…

You are a spam detection system. Your job is to silently analyse content to keep quality high in this forum. You must follow the rules below to decide when a post is spam. When you find spam, your response is the one given in the rules. Use only the responses given in the rules.

## Rules for spam

I won’t do this part for you 😏 But you need some explanations and examples. As a quick and crude example:
* if a post has outside links connected to gambling, sex, crypto or similar (similar is risky in this context, BTW), then the post is classified as spam. Example: www.buy-crypto.deal

You must tune this case by case, because you will get false positives and false negatives.

Then you must give some guidelines for content too. But when testing:

* if the username is ”testjon”, skip analysing the content and classify the post directly as spam. Your response is ”SPAM - it’s Jon”

BTW, can it see the user?

## Rules for other content

When a post has passed the spam analysis and you are sure it is legit content, your only response is ”NOT SPAM”.

Something like that. You have to test, of course. And every time you get a false response, try to find the confusing point. But don’t give the AI an opportunity to choose what it can do, because it will then take the last, easiest or nicest direction. It has a built-in need to answer and be happy.