Configuring and managing AI-powered spam detection in Discourse

:bookmark: This guide explains how to configure and use Discourse AI’s spam detection feature, including the setup process, scanning criteria, classification logic, customizations, and how it differs from AI triage.

:person_raising_hand: Required user level: Administrator

Discourse AI includes a spam detection feature that identifies and flags spam posts with minimal configuration. It is deliberately simple and complements the more versatile AI triage system, which supports broader moderation workflows.

Summary

In this guide, you will learn:

  • How AI spam detection works and what content is scanned
  • The classification logic and context used by the AI
  • Steps to configure spam detection through /admin/plugins/discourse-ai/ai-spam
  • Guidelines for Large Language Model (LLM) selection
  • Key differences between spam detection and AI triage
  • How to manage flagged and missed posts

How AI spam detection works

What content gets scanned?

AI spam detection evaluates posts based on these criteria:

  1. User trust level:

    • Scans posts from users with trust level 1 or lower.
    • Excludes posts from higher trust levels.
  2. Post type:

    • Public posts (excluding private messages).
    • Both reply posts and first topic posts are included, based on additional thresholds.
  3. Post edits:

    • Scans posts with significant edits (e.g., changes exceeding 10 characters).
    • Enforces a 10-minute delay between scans of the same post.
  4. Post frequency:

    • Prioritizes posts from new users with fewer than 4 total posts in public topics.
    • Excludes posts from users exceeding this threshold.
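
The criteria above can be read as a single eligibility check. The following is a minimal sketch, not the plugin's implementation: the `PostEvent` type, the `should_scan` function, and the constant names are hypothetical, and it assumes that "exceeding this threshold" means 4 or more public posts and that the edit and delay rules apply only to posts that have been edited or scanned before.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Thresholds taken from the criteria listed above; the names are hypothetical.
MAX_TRUST_LEVEL = 1                    # trust level 1 or lower is scanned
MAX_PUBLIC_POSTS = 4                   # assumed: 4 or more public posts is excluded
MIN_EDIT_CHARS = 10                    # an edit must change more than 10 characters
RESCAN_DELAY = timedelta(minutes=10)   # minimum gap between scans of one post


@dataclass
class PostEvent:
    trust_level: int
    public_post_count: int
    is_private_message: bool
    chars_changed: Optional[int] = None        # None for a brand-new post
    last_scanned_at: Optional[datetime] = None  # None if never scanned


def should_scan(event: PostEvent, now: datetime) -> bool:
    """Return True if the post meets every scanning criterion."""
    if event.is_private_message:
        return False
    if event.trust_level > MAX_TRUST_LEVEL:
        return False
    if event.public_post_count >= MAX_PUBLIC_POSTS:
        return False
    # Edited posts are rescanned only when the edit is significant.
    if event.chars_changed is not None and event.chars_changed <= MIN_EDIT_CHARS:
        return False
    # Enforce the delay between scans of the same post.
    if event.last_scanned_at is not None and now - event.last_scanned_at < RESCAN_DELAY:
        return False
    return True
```

Ordering the checks from cheapest to most specific keeps the function easy to extend if a forum wants to reason about additional thresholds.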

The classification process

Posts that meet the criteria are sent to an AI model (LLM) for analysis. The model evaluates whether the post is “SPAM” or “NOT SPAM” based on:

  • Context: Includes post content, topic title, user account data (e.g., account age and trust level), and site guidelines.
  • Custom instructions: Admin-defined rules for reinforced or adapted scanning criteria.
  • Automated detection:
    • Flags irrelevant or promotional content (e.g., ads or commercial materials).
    • Identifies automated or bot-like behaviors.
    • Assesses content relevance to the discussion.
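
Because the model answers with either “SPAM” or “NOT SPAM”, any code that consumes the reply has to avoid treating “NOT SPAM” as a match for “SPAM”. The sketch below illustrates one way to do this; `parse_verdict` is a hypothetical helper, not part of Discourse AI, and raising on unexpected replies is an assumption about how one might handle them.

```python
def parse_verdict(reply: str) -> bool:
    """Return True for SPAM and False for NOT SPAM; raise on anything else."""
    # Trim whitespace, surrounding quotes and a trailing period,
    # then collapse internal whitespace and ignore case.
    normalized = " ".join(reply.strip().strip("\"'.").upper().split())
    if normalized == "NOT SPAM":
        return False
    if normalized == "SPAM":
        return True
    # A substring check would misread "NOT SPAM", so only exact matches count.
    raise ValueError(f"unexpected verdict: {reply!r}")
```

Rejecting unexpected replies, rather than guessing, makes it visible when a model ignores the response format.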

Default prompt and context

The AI uses a default system prompt to guide spam detection. This prompt outlines spam classification rules. For example:

You are a spam detection system. Analyze the following content and context.
Notes:
- Replies must remain relevant to the discussion thread.
- Mark as SPAM if the content is irrelevant, promotional, or automated.
- Consider new user posts with links as potential SPAM unless explicitly relevant to the topic.
Respond only with "SPAM" or "NOT SPAM".

The scanner also compiles a context package, including:

  • Metadata from topics and categories.
  • Relevance of replies to the thread.
  • Author data (e.g., account creation date, total posts, trust level).
  • Post text truncated to 5000 characters for processing.
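
To make the context package concrete, here is a minimal sketch of how such a payload could be assembled. The field labels, the `build_context` function, and the output layout are assumptions for illustration; only the 5000-character truncation and the kinds of data included come from the description above.

```python
from datetime import date

MAX_POST_CHARS = 5000  # truncation limit stated above


def build_context(*, topic_title: str, category: str, account_created: date,
                  total_posts: int, trust_level: int, post_text: str) -> str:
    """Assemble a plain-text context block to send alongside the system prompt."""
    lines = [
        f"Topic title: {topic_title}",
        f"Category: {category}",
        f"Account created: {account_created.isoformat()}",
        f"Total posts: {total_posts}",
        f"Trust level: {trust_level}",
        "Post content:",
        post_text[:MAX_POST_CHARS],  # keep only the first 5000 characters
    ]
    return "\n".join(lines)
```

For example, `build_context(topic_title="Hello", category="General", account_created=date(2024, 12, 1), total_posts=1, trust_level=0, post_text="...")` returns one string with the author data first and the truncated post text last.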

How to configure AI spam detection

Configuration guide

  1. Access settings:
    Navigate to /admin/plugins/discourse-ai/ai-spam.

  2. Select an LLM:
    Choose one of the models configured at /admin/plugins/discourse-ai/ai-llms.

  3. Activate spam detection:
    Enable spam detection by toggling the feature on.

:information_source: Note: A connected LLM is mandatory.

  4. Add customized instructions:
    • Define rules specific to your forum (e.g., stricter monitoring of external links).
    • Save any changes to apply them.

:information_source: Tip: Disable Akismet when using Discourse AI spam detection to avoid redundancy.


Differences from AI triage

While spam detection is designed specifically for identifying spam, AI triage supports broader post management tasks.

| Feature | AI Spam Detection | AI Triage |
| --- | --- | --- |
| Complexity | Streamlined, opinionated setup | Highly customizable and flexible |
| Primary use case | Detecting spam with minimal overhead | Advanced workflows for categorization, tagging, replies, spam detection, NSFW detection |
| Actions | Flags spam, silences users | Tags, categorizes, hides posts, adds replies, flags posts, silences users |
| Recommendation | Use instead of Akismet | Use for rich, highly customizable workflows |

For more details, see Discourse AI - AI triage.


LLM selection recommendations

The performance of spam detection depends on the chosen LLM.

Most low-cost LLMs work effectively, such as:

  • GPT-4o-mini
  • Claude 3.5 Haiku
  • Gemini 2.0 Flash

Experiment with different models to find the best fit. Configure your models via /admin/plugins/discourse-ai/ai-llms.


Testing spam scanner behavior

You can test spam detection rules directly from the configuration page.

  • Paste a post URL or ID into the test field.
  • Review the classification result (e.g., “SPAM” or “NOT SPAM”) and read the logs to understand the model’s reasoning.
  • Unsaved changes are applied during testing, enabling experimentation without risk.

Managing flagged and missed posts

Handling flagged posts

Flagged posts appear in the moderation queue. Admins can:

  • Approve legitimate posts wrongly classified as spam.
  • Reject spam topics to keep the system accurate.

:warning: Important: Reject spam flags for incorrectly classified posts. Users remain silenced until the flag is resolved.

Handling missed spam

Missed spam refers to posts that bypassed detection but were later flagged by the community. Moderators can handle these as necessary.


Best practices

  • Monitor flagged and missed spam regularly to refine system accuracy. Clickable metrics simplify this process.
  • Use test cases to evaluate custom instructions against edge cases.
  • Review and adjust LLM settings when needed.
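
One way to approach the test-case practice above is to keep a small set of labeled example posts and count how a classifier handles them. The sketch below is a hypothetical offline harness, not a Discourse AI feature: `evaluate` accepts any function that maps post text to a spam verdict, and `link_heuristic` is a deliberately naive stand-in so the example runs without an LLM.

```python
from typing import Callable, Dict, Iterable, Tuple


def evaluate(classify: Callable[[str], bool],
             cases: Iterable[Tuple[str, bool]]) -> Dict[str, int]:
    """Count outcomes for (post text, is_spam) pairs under a given classifier."""
    counts = {"true_positive": 0, "false_positive": 0,
              "true_negative": 0, "false_negative": 0}
    for text, is_spam in cases:
        predicted = classify(text)
        if predicted and is_spam:
            counts["true_positive"] += 1
        elif predicted and not is_spam:
            counts["false_positive"] += 1   # legitimate post wrongly flagged
        elif not predicted and is_spam:
            counts["false_negative"] += 1   # missed spam
        else:
            counts["true_negative"] += 1
    return counts


def link_heuristic(text: str) -> bool:
    """Stand-in classifier for demonstration only: flags any post with a link."""
    return "http://" in text or "https://" in text
```

False positives correspond to legitimate posts that were flagged, and false negatives correspond to missed spam, so the two counts map onto the two queues moderators already monitor.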

:mega: Configured well, AI spam detection reduces manual moderation effort and helps keep your community free of spam.

Last edited by @MarkDoerr 2024-12-21T02:05:11Z
