Setting up NSFW detection in your community

:bookmark: This is a guide for setting up NSFW content detection in your community using Discourse AI automation to identify and moderate inappropriate images and text.

:person_raising_hand: Required user level: Administrator


Automatically detect and moderate NSFW (Not Safe for Work) content in your Discourse community using AI-powered automation. This guide will help you configure automated detection for both inappropriate images and text content, allowing you to maintain community standards with minimal manual intervention.

Summary

This documentation covers configuring the Triage Posts Using AI automation to:

  • Detect NSFW images using vision-enabled AI models
  • Identify inappropriate text content and language
  • Automatically flag, categorize, and moderate problematic posts
  • Set up custom responses and moderation actions

The automation uses large language models (LLMs) to analyze post content and takes predefined actions when NSFW material is detected.
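Conceptually, the triage flow is simple: the post is sent to the LLM along with your system prompt, and if the model's reply contains the configured search text, the automation's actions fire. The following Python sketch is purely illustrative (it is not Discourse's implementation); `classify` stands in for whatever LLM call you configure:

```python
def triage_post(post_text, classify, search_text="NSFW"):
    """Send a post to an LLM classifier and decide whether to act.

    `classify` is any callable returning the model's raw output
    (e.g. "NSFW" or "SAFE" with the prompts in this guide).
    """
    verdict = classify(post_text).strip()
    if search_text in verdict:
        # In Discourse this would flag, hide, categorize, or reply;
        # here we just report that the automation would fire.
        return {"flagged": True, "verdict": verdict}
    return {"flagged": False, "verdict": verdict}

# Stub classifier standing in for the real LLM:
stub = lambda text: "NSFW" if "explicit" in text.lower() else "SAFE"
print(triage_post("An explicit image", stub))   # flagged
print(triage_post("A recipe for bread", stub))  # not flagged
```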

Prerequisites

Before setting up NSFW detection, ensure you have the following enabled:

  • Discourse AI plugin: the core AI functionality plugin
  • Discourse Automation plugin: required for creating automated rules
  • Agent: an AI agent with a system prompt that defines what constitutes NSFW content. Use distinct language for the positive and negative classifications to avoid ambiguity.
  • Vision-enabled LLM: required only for image detection; standard LLMs work for text-only detection. Make sure “Vision enabled” is turned on for both the LLM model and the Agent.
    • Discourse hosted customers can select our CDCK Hosted Small LLM when configuring Agents.
    • Self-hosted Discourse users will need to configure a third-party LLM.

Example prompts:

For image detection:

You are a bot specializing in image classification. Respond only with either NSFW or SAFE, and nothing else. NSFW is porn or gore, and SAFE is everything else. When in doubt reply with SAFE.
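For illustration, with an OpenAI-compatible chat API the image classification request might be assembled as below. This is a sketch only: the model name is a placeholder, and Discourse builds the equivalent call for you — you never write this yourself.

```python
SYSTEM_PROMPT = (
    "You are a bot specializing in image classification. Respond only with "
    "either NSFW or SAFE, and nothing else. NSFW is porn or gore, and SAFE "
    "is everything else. When in doubt reply with SAFE."
)

def build_image_request(image_url, model="gpt-4o"):
    """Assemble a vision request payload (illustrative; Discourse
    performs this step internally for vision-enabled LLMs)."""
    return {
        "model": model,  # placeholder: any vision-enabled model
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            },
        ],
        "max_tokens": 5,  # the expected answer is a single word
    }

payload = build_image_request("https://example.com/upload.png")
```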

For text detection:

You are an advanced AI content moderation system designed to triage user-generated posts. Your task is to detect and flag any content that includes bad language, inappropriate terms, or NSFW (Not Safe for Work) content.

NSFW content includes explicit sexual content, violence, hate speech, graphic language, discrimination, self-harm references, or illegal activity.

Respond with exactly one word:
* "SAFE": if the post is appropriate and contains no bad or NSFW content
* "NSFW": if bad, inappropriate, or NSFW content is detected

Be context-aware and avoid false positives.

Configuration steps

Enable required plugins

  1. Navigate to your site’s admin panel
  2. Go to Plugins > Installed Plugins
  3. Enable both the Discourse AI and Automation plugins

Create automation rule

  1. In the admin panel, navigate to Plugins > Automation
  2. Click + Create to begin creating a new automation rule
  3. Select Triage Posts Using AI
  4. Set a descriptive name (e.g., “NSFW Content Detection”)

Configure triggers and restrictions

Set the trigger:

  • Choose Post created/edited as the trigger for scanning new or edited posts
  • Alternatively, choose Stalled topic to triage topics that have gone without replies for a specified duration
  • Optionally specify Action type, Categories, Tags, Groups, Trust Levels, or Post features to restrict automation scope
  • Leave fields blank to apply automation site-wide

Optional restrictions (Post created/edited trigger):
Configure additional settings to further limit automation scope:

  • First post only or Original post only to target only new topics
  • First topic only to target only a user’s first topic
  • Post features to restrict to posts with images, links, code, or uploads — useful for image-based NSFW detection
  • Restricted archetype to limit to regular topics, public topics, or personal messages

Configure AI classification

:spiral_notepad: The system prompt field has been deprecated in favor of Agents. If you had an AI automation prior to this change, a new Agent with the associated system prompt will be automatically created.

Agent:
Select the Agent defined for the NSFW detection automation.

Search text:
Enter the exact output from your prompt that triggers automation actions. Using the examples above, enter NSFW.

Advanced options:

  • Max Post Tokens: Limit how many tokens of the post are sent to the LLM
  • Max output tokens: Set an upper bound on the number of tokens the model can generate
  • Stop Sequences: Instruct the model to halt generation when it encounters specific values
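To see why Max Post Tokens matters, here is a rough sketch of truncating a post before classification. Real tokenizers are model-specific, so this whitespace-word approximation is illustrative only:

```python
def truncate_to_tokens(text, max_tokens):
    """Approximate a token limit by whitespace-separated words.

    Illustrative only: real LLM tokenizers split text differently,
    and Discourse applies its own truncation internally.
    """
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])

post = "word " * 5000  # a very long post
clipped = truncate_to_tokens(post, 500)  # only the first 500 "tokens" are sent
```

Capping the post length keeps per-call cost bounded, at the price of the model not seeing the full post.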

Set moderation actions

Categorization and tagging:

  • Define the category where flagged posts should be moved
  • Specify tags to be added to identified NSFW content

Flagging options:

  • Enable Flag post to activate flagging, then choose a flag type:
    • Add post to review queue — sends the post to the review queue for manual moderator review
    • Add post to review queue and hide post — review queue + immediately hides the post
    • Add post to review queue and delete post — review queue + soft-deletes the post
    • Add post to review queue, delete post and silence user — review queue + soft-deletes the post + silences the author
    • Flag as spam and hide post — flags the post as spam (auto-hides it)
    • Flag as spam, hide post and silence user — spam flag + silences the author
  • Enable Hide Topic to automatically hide the entire topic

Automated responses:

  • Set a Reply User and Reply (canned reply) to post a fixed message explaining why the post was flagged
  • Select a Reply Agent to use a separate AI agent for generating dynamic responses (this takes priority over a canned reply)
  • Enable Reply as Whisper to make the reply visible only to staff

Author notifications:

  • Enable Notify author via PM to send a personal message to the post author when their content is flagged
  • Set a PM sender user (defaults to system) and optionally provide a custom PM content

Other options:

  • Enable Include personal messages to also scan and triage personal messages

Caveats

  • LLM calls can be expensive. When applying a classifier, monitor costs carefully and consider running it only on small subsets of posts.
  • Better-performing models such as GPT-4o yield better results, but at a higher cost. That said, costs have tended to decrease over time as LLMs become more capable and cheaper.

Other uses

The prompt can be customized to perform all sorts of detection, such as PII exposure and spam detection. We’d love to hear how you are putting this automation to work in your community!

