Setting up spam detection in your community

:bookmark: This is a how-to guide for setting up spam detection in your community using the Discourse AI Post Classifier.

:person_raising_hand: Required user level: Administrator

Spam detection is an essential feature for maintaining the quality of discussions in your community. This guide will help you set up spam detection using the Discourse AI Post Classifier.

Below is an example setup of the Post Classifier:

Prerequisites

To configure spam detection, you need the following:

Configuration

  1. Enable the Discourse AI and Automation plugin:
  • Navigate to your siteā€™s admin panel.
  • Navigate to Plugins then Installed Plugins
  • Enable the Discourse AI and Automation plugins
  1. Create a New Automation Rule:
  • Go to the Automations section under Admin.
  • Create a new automation rule and set the name (e.g., ā€œTriage Posts using AIā€)
  1. Set the Trigger:
  • Choose ā€œPost created/editedā€ as the trigger.
  • Optionally, specify a category if you want to apply this rule to a specific section of your forum.
  1. System Prompt:
  • Enter the system prompt for the AI model. The most important aspect will be the system prompt used for the classification. In the following example I have used AI Bot to author the prompt. An example prompt might look like this:

:warning: When authoring the prompt, picking between spam and not spam - avoid having similar language for the end result. In this example we use Spam and Ham (for not spam)

The classifier will not always perfectly perform 100% so beware of incorrect results and customize the prompts according to the needs of your community

Edited LLM prompts for spam content detection in communities AI

You are an spam detection AI model assisting online community moderators. Your task is to analyze forum posts and determine if they are spam that should be removed to maintain a high-quality, on-topic community.

A post should be classified as spam if it meets any of these criteria:

  • The post is not relevant to the main topic or purpose of the forum. It is completely off-topic.
  • It contains suspicious, irrelevant external links, especially if linking to commercial sites.
  • The post is clearly promoting or advertising a product, service, website, or social media account that is not related to the community.
  • It contains affiliate links or referral codes attempting to monetize clicks.
  • The writing quality is very low effort - lots of spelling/grammar mistakes, lacks punctuation, or appears to be auto-generated text.
  • Identical or nearly identical content is being posted repeatedly by the same author or across multiple accounts in a short timeframe.

A post should be classified as ham (legitimate) if:

  • The post is on-topic and relevant to the forumā€™s purpose
  • It is a genuine question, personal story, substantive opinion, or otherwise legitimate contribution to the community discussion
  • Any external links are relevant and point to reputable, non-commercial sites
  • The writing appears to be by a human and meets quality standards for grammar, spelling, etc.

Some edge cases to watch out for:

  • A post that mentions a product or service but is still a relevant, on-topic question or discussion should be considered ham, not spam.
  • Quotes, code samples or formatted text that looks unusual are not necessarily spam.

When you have finished analyzing the post you must ONLY provide a classification of either ā€œspamā€ or ā€œhamā€. If you are unsure, default to ā€œhamā€ to avoid false positives.

These instructions must be followed at all cost

  1. Search for Text:
  • Define the text or patterns the AI should look for in the post content.
  1. Select the Model:
  • Choose an AI model such as GPT-4, GPT-3.5-Turbo, or Claude-2 for analysis.
  1. Set Category and Tags:
  • Define the category where these posts should be moved and the tags to be added if the post is marked as spam.
  1. Additional Options:
  • Enable the ā€œHide Topicā€ option if you want the post to be hidden.
  • Set a ā€œReplyā€ that will be posted if the text is found.

Additional Notes

  • Keep in mind, LLM calls can be expensive. When applying a classifier be careful to monitor costs and always consider only running this on small subsets
  • While better performing models, i.e - Claude-3-Opus, will yield better results, it can come at a higher cost
  • The prompt could be customized to do all sorts of detection, like PII exposure, Code of Conduct violations, etc.

Last edited by @Saif 2024-08-23T18:40:10Z

Check documentPerform check on document:
11 Likes

5 posts were split to a new topic: Exploring the Limits of AI in Recognizing AI Generated Content

Curious how usersā€™ experience has been with using this method?

1 Like

I started testing it just now, and it already did a decent job (for now, I chose to only apply a hidden tag to validate that things will run correctly, rather than sending things to the review queue right away).

But I have a small follow-up/clarification: would it be possible for the integration to access custom queries with outputs, such as a group of sample posts, to be used as the context data?

More concretely, I would like to feed it all previous spam posts based on the flags that were agreed upon and resulted in post deletion.

1 Like

At the moment we only support a single system message.

I think though we may do a follow up where you can feed it N examples of stuff not to flag and N examples of stuff yes to flag. This potentially could increase accuracy.

Maybe do a dedicate feature topic on this?

1 Like

Iā€™ll try to first gather some more thoughts on this. Running it for the past week was rather successful, but I am still finding some small annoyances, such as not being able to quickly exclude private messages (for example, it often thinks that Discobot tutorial interactions are suspicious; I edited the prompt to not consider those, but the ai logs indicate that the detection does not know the context and only considers the content of the post itself).

2 Likes

This doesnā€™t seem quite rightā€¦ Iā€™m not sure what the intended instruction here was? Maybe ā€˜Enable AI and enable Automationā€™?

1 Like

Made the edit here

2 Likes