This is a guide for setting up NSFW content detection in your community using Discourse AI automation to identify and moderate inappropriate images and text.
Required user level: Administrator
Setting up NSFW detection in your community
Automatically detect and moderate NSFW (Not Safe for Work) content in your Discourse community using AI-powered automation. This guide will help you configure automated detection for both inappropriate images and text content, allowing you to maintain community standards with minimal manual intervention.
Summary
This documentation covers configuring the Discourse AI Post Classifier automation to:
- Detect NSFW images using vision-enabled AI models
- Identify inappropriate text content and language
- Automatically flag, categorize, and moderate problematic posts
- Set up custom responses and moderation actions
The automation uses large language models (LLMs) to analyze post content and takes predefined actions when NSFW material is detected.
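Conceptually, the flow is: the automation sends each new or edited post to the LLM along with your persona's system prompt, then acts only when the model's reply matches a configured search text. The Python sketch below is purely illustrative of that loop; the classifier stub and action names are hypothetical placeholders, not Discourse's internal API.

```python
# Conceptual sketch of the triage loop (not Discourse's internal code).
# classify() stands in for the LLM call; the returned action names are
# hypothetical placeholders for the moderation actions you configure.

def classify(system_prompt: str, post_text: str) -> str:
    """Stand-in for the LLM call; a real call would send both strings
    to the model and return its one-word reply (SAFE or NSFW)."""
    banned = ("explicit", "gore")  # toy heuristic, for illustration only
    return "NSFW" if any(w in post_text.lower() for w in banned) else "SAFE"

def triage(post_text: str, search_text: str = "NSFW") -> list[str]:
    """Return the list of configured actions to apply to the post."""
    if classify("You are a content classifier...", post_text) == search_text:
        return ["flag_for_review", "add_tag:nsfw", "hide_topic"]
    return []

print(triage("Totally safe gardening tips"))  # -> []
print(triage("Contains explicit material"))   # -> the configured actions
```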
Prerequisites
Before setting up NSFW detection, ensure you have the following enabled:
- Discourse AI plugin - The core AI functionality plugin
- Discourse Automation plugin - Required for creating automated rules
- Vision-enabled LLM - Required only for image detection; standard LLMs work for text-only detection
- Persona - A persona with a system prompt that defines what constitutes NSFW content. Use distinct language for the positive and negative classifications to avoid confusion.
Example prompts:
For image detection:
You are a bot specializing in image classification. Respond only with either NSFW or SAFE, and nothing else. NSFW is porn or gore; SAFE is everything else. When in doubt, reply with SAFE.
For text detection:
You are an advanced AI content moderation system designed to triage user-generated posts. Your task is to detect and flag any content that includes bad language, inappropriate terms, or NSFW (Not Safe for Work) content.
NSFW content includes explicit sexual content, violence, hate speech, graphic language, discrimination, self-harm references, or illegal activity.
Respond with exactly one word:
* "SAFE": The post is appropriate and doesn't contain bad or NSFW content
* "NSFW": If bad, inappropriate, or NSFW content is detected
Be context-aware and avoid false positives.
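Before wiring a prompt into the automation, it can help to sanity-check it against your model directly. The sketch below assumes the `openai` Python package, an `OPENAI_API_KEY` environment variable, and an example model name; the system prompt is an abridged version of the text-detection prompt above.

```python
# Quick sanity check of a classification prompt outside Discourse.
# Assumes the `openai` package and an OPENAI_API_KEY environment
# variable; the model name is an example, not a requirement.
from openai import OpenAI

SYSTEM_PROMPT = (
    "You are an advanced AI content moderation system designed to triage "
    "user-generated posts. Respond with exactly one word: SAFE or NSFW."
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def classify(post_text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any chat model works for text-only checks
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": post_text},
        ],
    )
    return response.choices[0].message.content.strip()

print(classify("What a lovely day for a picnic!"))  # expected: SAFE
```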
For hosted customers
Discourse Business and Enterprise plan customers can access hosted CDCK LLMs by enabling the experimental settings on the site’s Admin > What’s New page.
For self-hosted sites
Configure a third-party vision-enabled LLM through the Discourse AI - Large Language Model settings page.
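If you plan to detect images, you can verify that your chosen model actually accepts image input before configuring it. This sketch assumes an OpenAI-compatible, vision-enabled endpoint; the model name and image URL are placeholders.

```python
# Optional check that the configured model can classify images.
# Assumes an OpenAI-compatible, vision-enabled endpoint; the model
# name and image URL below are placeholders.
from openai import OpenAI

IMAGE_PROMPT = (
    "You are a bot specializing in image classification. Respond only "
    "with either NSFW or SAFE, and nothing else."
)

client = OpenAI()

def classify_image(image_url: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # must be a vision-enabled model
        messages=[
            {"role": "system", "content": IMAGE_PROMPT},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
    )
    return response.choices[0].message.content.strip()

print(classify_image("https://example.com/uploaded-image.png"))
```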
Configuration steps
Enable required plugins
- Navigate to your site’s admin panel
- Go to Plugins > Installed Plugins
- Enable both the Discourse AI and Automation plugins
Create automation rule
- In the admin panel, navigate to Plugins > Automation
- Click + Create to begin creating a new automation rule
- Select Triage Posts Using AI
- Set a descriptive name (e.g., “NSFW Content Detection”)
Configure triggers and restrictions
Set the trigger:
- Choose Post created/edited as the trigger
- Optionally specify Action type, Category, Tags, Groups, or Trust Levels to restrict automation scope
- Leave fields blank to apply automation site-wide
Optional restrictions:
Configure additional settings in the What/When section to further limit automation scope, such as targeting only first posts from new users.
Configure AI classification
The system prompt field has been deprecated in favor of Personas. If you had an AI automation prior to this change, a new Persona with the associated system prompt will be automatically created.
Persona:
Select the Persona defined for the NSFW detection automation.
Search text:
Enter the exact output from your prompt that triggers automation actions. Using the examples above, enter NSFW.
Set moderation actions
Categorization and tagging:
- Define the category where flagged posts should be moved
- Specify tags to be added to identified NSFW content
Flagging options:
- Choose flag type: spam (auto-hide) or review queue (manual review)
- Enable “Hide Topic” to automatically hide flagged content
Automated responses:
- Set a reply user for system responses
- Create a custom message explaining why the post was flagged
- Optionally use AI Persona for dynamic responses
Caveats
- Keep in mind that LLM calls can be expensive. When applying a classifier, monitor costs carefully and consider running it only on small subsets of posts.
- While better-performing models such as GPT-4o yield better results, they can come at a higher cost. That said, we have seen costs decrease over time as LLMs become better and cheaper.
Other uses
The prompt can be customized to perform all sorts of detection, such as PII exposure and spam detection. We’d love to hear how you are putting this automation to work to benefit your community!