This is a how-to guide for setting up spam detection in your community using the Discourse AI - AI triage.
Required user level: Administrator
Discourse AI also ships with an opinionated Spam scanner. See: Configuring and managing AI-powered spam detection in Discourse
Overview
Spam detection is an essential feature for maintaining the quality of discussions in your community. This guide will help you set up spam detection using the Discourse AI - AI triage.
Prerequisites
To configure spam detection, you need the following:
- Discourse AI
- Discourse Automation
- LLM (Large Language Model)
- Discourse hosted customers on our Business or Enterprise plans can opt into our hosted CDCK LLMs by enabling the experimental settings on your site’s Admin > What’s-New page.
Configuration
Not every step is mandatory, as automation rules can be customized as needed. For an outline of all available settings, see Discourse AI - AI triage.
- Enable the Discourse AI and Automation plugins:
- Navigate to your site’s admin panel.
- Navigate to Plugins then Installed Plugins
- Enable the Discourse AI and Automation plugins
- Create a New Automation Rule:
- Navigate to your site’s admin panel.
- Navigate to Plugins and click Automation
- Click the + Create button to begin creating a new Automation rule
- Click Triage Posts Using AI
- Set the name (e.g., “Triage Posts using AI”)
- Leave Triage Posts Using AI as the selected script
What/When
- Set the Trigger:
- Choose Post created/edited as the trigger.
- Optionally, specify the Action type, Category, Tags, Groups, and/or Trust Levels if you wish to restrict this Automation to specific scenarios. Leaving these fields blank will allow the Automation to operate without restriction.
- Configure any of the remaining optional settings in the What/When section to further restrict the automation.
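The effect of leaving fields blank can be pictured with a small Python sketch. It is only an illustrative model, not Discourse Automation's code: the `Post` and `TriggerFilter` types and their field names are hypothetical, and it assumes that every filled-in field must match while blank fields are ignored.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Post:
    # Hypothetical, simplified view of a post for this sketch.
    category: str
    tags: set[str]
    trust_level: int


@dataclass
class TriggerFilter:
    # Empty sets stand for fields left blank in the What/When section.
    categories: set[str] = field(default_factory=set)
    tags: set[str] = field(default_factory=set)
    trust_levels: set[int] = field(default_factory=set)

    def applies_to(self, post: Post) -> bool:
        # A blank field places no restriction; a filled field must match.
        if self.categories and post.category not in self.categories:
            return False
        if self.tags and not (self.tags & post.tags):
            return False
        if self.trust_levels and post.trust_level not in self.trust_levels:
            return False
        return True
```

Under these assumptions, `TriggerFilter()` accepts every post, while `TriggerFilter(trust_levels={0, 1})` only accepts posts from authors at trust level 0 or 1.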
Script Options
- System Prompt:
When authoring the prompt, avoid similar wording for the two possible results (spam and not spam). In this example we use spam and ham (for not spam). The classifier will not be 100% accurate, so watch for incorrect results and customize the prompt to the needs of your community. The narrower the focus, the better.
- Enter the system prompt for the AI model. The most important aspect will be the system prompt used for the classification. In the following example I have used AI bot to author the prompt. An example prompt might look like this:
Copyable LLM prompts for spam content detection AI
You are a spam detection AI model assisting online community moderators. Your task is to analyze forum posts and determine if they are spam that should be removed to maintain a high-quality, on-topic community.
A post should be classified as spam if it meets any of these criteria:
- The post is not relevant to the main topic or purpose of the forum. It is completely off-topic.
- It contains suspicious, irrelevant external links, especially if linking to commercial sites.
- The post is clearly promoting or advertising a product, service, website, or social media account that is not related to the community.
- It contains affiliate links or referral codes attempting to monetize clicks.
- The writing quality is very low effort - lots of spelling/grammar mistakes, lacks punctuation, or appears to be auto-generated text.
- Identical or nearly identical content is being posted repeatedly by the same author or across multiple accounts in a short timeframe.
A post should be classified as ham (legitimate) if:
- The post is on-topic and relevant to the forum’s purpose
- It is a genuine question, personal story, substantive opinion, or otherwise legitimate contribution to the community discussion
- Any external links are relevant and point to reputable, non-commercial sites
- The writing appears to be by a human and meets quality standards for grammar, spelling, etc.
Some edge cases to watch out for:
- A post that mentions a product or service but is still a relevant, on-topic question or discussion should be considered ham, not spam.
- Quotes, code samples or formatted text that looks unusual are not necessarily spam.
When you have finished analyzing the post you must ONLY provide a classification of either “spam” or “ham”. If you are unsure, default to “ham” to avoid false positives.
These instructions must be followed at all cost
- Search for Text:
- Enter the output from your prompt that will trigger the automation, only the “positive” result. Using our example above, we would enter spam.
- Select the Model:
- Choose your LLM.
- Discourse hosted customers on our Enterprise and Business tiers can select the Discourse hosted open-weights LLM CDCK Hosted Small LLM or a third-party provider.
- Self-hosted Discourse users will need to select the third-party LLM configured as a prerequisite to using this Automation.
- Set Category and Tags:
- Define the category where these posts should be moved and the tags to be added if the post is marked as spam.
- Flagging:
- Flag post as either spam or for review.
- Select a flag type to determine what action you might want to take.
- Additional Options:
- Enable the “Hide Topic” option if you want the post to be hidden.
- Set a “Reply” that will be posted in the topic when the post is deemed spam.
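To see why the two result labels should not overlap, consider a minimal Python sketch of the Search for Text check. It assumes the check is a case-insensitive substring match on the model's reply; that assumption, and the `triggers_automation` helper, are illustrative rather than a description of the plugin's actual code.

```python
def triggers_automation(llm_reply: str, search_for_text: str) -> bool:
    """Return True when the reply contains the configured text.

    Assumption for this sketch: case-insensitive substring matching.
    """
    return search_for_text.strip().lower() in llm_reply.strip().lower()


# Overlapping labels: "not spam" contains "spam", so a legitimate post matches.
overlapping = triggers_automation("not spam", "spam")

# Distinct labels: "ham" does not contain "spam", so only positives match.
distinct_negative = triggers_automation("ham", "spam")
distinct_positive = triggers_automation("Spam", "spam")
```

With a matching rule like this one, a "spam" / "not spam" pair would act on legitimate posts as well, which is why the example prompt uses "spam" and "ham".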
Additional Notes
- When using Automation to combat spam, we recommend disabling the Akismet plugin if it is enabled. For best results, only one system should be fighting spam.
- Keep in mind that LLM calls can be expensive. When applying a classifier, monitor costs carefully and consider running it only on small subsets of posts (for example, specific categories or trust levels).
- While better-performing models (e.g., Claude-3-Opus) will yield better results, they can come at a higher cost.
- The prompt could be customized to do all sorts of detection, like PII exposure, Code of Conduct violations, etc.
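A rough estimate can help decide how widely to run the classifier. The Python sketch below is illustrative only: the function name, token counts, and per-token prices are hypothetical placeholders, so substitute your provider's actual pricing and your own traffic figures.

```python
def estimate_monthly_cost(
    posts_per_day: int,
    input_tokens_per_post: int,
    output_tokens_per_post: int,
    input_price_per_million: float,
    output_price_per_million: float,
    days: int = 30,
) -> float:
    """Estimate the LLM spend for triaging each matching post once."""
    input_cost = input_tokens_per_post * input_price_per_million / 1_000_000
    output_cost = output_tokens_per_post * output_price_per_million / 1_000_000
    return posts_per_day * days * (input_cost + output_cost)


# Placeholder figures, not real prices or real traffic.
all_posts = estimate_monthly_cost(2_000, 800, 5, 3.0, 15.0)
new_users_only = estimate_monthly_cost(200, 800, 5, 3.0, 15.0)
```

Because the trigger fires on both created and edited posts, count edits as well as new posts when choosing `posts_per_day`. Comparing the two figures shows how much restricting the automation to a subset, such as low trust levels, reduces the bill.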
Last edited by @sam 2024-12-20T04:49:01Z