Setting up spam detection in your community

Overview

In this topic we will use the Discourse AI Post Classifier to detect spam. The instructions here can be customized to suit your preferences.

See an example setup here…


Why should I use this?

  • You’ve tried Akismet and other anti-spam tools and were not happy with their results for detecting spam in your community

Pre-requisites

In order for this to work you will need the following enabled

Configuration

The following would still apply while creating this automation…

Prompt

The most important aspect will be the system prompt used for the classification. In the following example I have used AI Bot to author the prompt.

:warning: When authoring the prompt, avoid using similar wording for the two possible results (spam and not spam). In this example we use Spam and Ham (for not spam).

The classifier will not be 100% accurate, so watch for incorrect results and customize the prompt according to the needs of your community.

Edited LLM prompts for spam content detection in communities AI

You are a spam detection AI model assisting online community moderators. Your task is to analyze forum posts and determine if they are spam that should be removed to maintain a high-quality, on-topic community.

A post should be classified as spam if it meets any of these criteria:

  • The post is not relevant to the main topic or purpose of the forum. It is completely off-topic.
  • It contains suspicious, irrelevant external links, especially if linking to commercial sites.
  • The post is clearly promoting or advertising a product, service, website, or social media account that is not related to the community.
  • It contains affiliate links or referral codes attempting to monetize clicks.
  • The writing quality is very low effort - lots of spelling/grammar mistakes, lacks punctuation, or appears to be auto-generated text.
  • Identical or nearly identical content is being posted repeatedly by the same author or across multiple accounts in a short timeframe.

A post should be classified as ham (legitimate) if:

  • The post is on-topic and relevant to the forum’s purpose
  • It is a genuine question, personal story, substantive opinion, or otherwise legitimate contribution to the community discussion
  • Any external links are relevant and point to reputable, non-commercial sites
  • The writing appears to be by a human and meets quality standards for grammar, spelling, etc.

Some edge cases to watch out for:

  • A post that mentions a product or service but is still a relevant, on-topic question or discussion should be considered ham, not spam.
  • Quotes, code samples or formatted text that looks unusual are not necessarily spam.

When you have finished analyzing the post you must ONLY provide a classification of either “spam” or “ham”. If you are unsure, default to “ham” to avoid false positives.

These instructions must be followed at all costs.

Please classify the following content surrounded by [[[ ]]]:

[[[
%%POST%%
]]]
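If you want to try the prompt against a few sample posts outside Discourse before enabling the automation, the substitution and result handling can be sketched roughly as below. This is only an illustration: `build_prompt` and `parse_label` are hypothetical helper names, not part of Discourse AI, and the assumption is that `%%POST%%` is replaced with the raw post text and that the model replies with a single word.

```python
# Hypothetical helpers for testing the prompt by hand; Discourse AI does its
# own substitution internally and this is not its implementation.
PROMPT_TAIL = """Please classify the following content surrounded by [[[ ]]]:

[[[
%%POST%%
]]]"""


def build_prompt(post_text: str, template: str = PROMPT_TAIL) -> str:
    """Insert the post text where the %%POST%% placeholder appears."""
    return template.replace("%%POST%%", post_text)


def parse_label(model_reply: str) -> str:
    """Normalize the model reply to 'spam' or 'ham'.

    Anything that is not exactly 'spam' (ignoring case, whitespace and
    surrounding punctuation) falls back to 'ham', mirroring the prompt's
    instruction to default to ham when unsure.
    """
    cleaned = model_reply.strip().strip("\"'“”.!").lower()
    return "spam" if cleaned == "spam" else "ham"
```

Falling back to "ham" for any unexpected reply keeps false positives down, at the price of letting some spam through when the model does not follow the output format.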

Caveats

  • Keep in mind that LLM calls can be expensive. When applying a classifier, monitor costs carefully and consider running it only on small subsets of content
  • Better-performing models, e.g. Claude-3-Opus, will yield better results but can come at a higher cost
  • The prompt could be customized to do all sorts of detection, like PII exposure, Code of Conduct violations, etc.
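To get a feel for the cost caveat, a back-of-the-envelope estimate like the one below can help before turning the classifier on. It is a rough sketch: the function name is made up for illustration, every number in the usage example is a placeholder rather than real pricing, and it only counts input tokens since the reply is a single word.

```python
def estimate_monthly_cost(
    posts_per_day: int,
    avg_post_tokens: int,
    prompt_tokens: int,
    price_per_million_input_tokens: float,
    days: int = 30,
) -> float:
    """Rough input-token cost of classifying every post for `days` days.

    Each call sends the system prompt plus one post, so the per-call size
    is prompt_tokens + avg_post_tokens. Output tokens are ignored.
    """
    tokens_per_call = prompt_tokens + avg_post_tokens
    total_tokens = posts_per_day * days * tokens_per_call
    return total_tokens / 1_000_000 * price_per_million_input_tokens


# Placeholder figures only; check your provider's current pricing.
cost = estimate_monthly_cost(
    posts_per_day=1000,
    avg_post_tokens=200,
    prompt_tokens=500,
    price_per_million_input_tokens=15.0,
)
print(f"Estimated monthly cost: ${cost:.2f}")
```

Plugging in a smaller `posts_per_day` (for example, only the posts you actually plan to classify) shows how much restricting the classifier to a subset reduces the bill.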
