Setting up NSFW detection in your community

:bookmark: This is a how-to guide for setting up NSFW image and text detection in your community using the Discourse AI Post Classifier.

:person_raising_hand: Required user level: Administrator

Overview

In this topic we will use the Discourse AI Post Classifier to detect NSFW images and text in your community. By following this guide, admins will be made aware of such posts and can take action accordingly.

Note that the instructions here can be customized to your preference.

Below is an example setup of the Post Classifier:

Why should I use this?

  • You have previously tried the NSFW feature of Discourse AI but were not happy with its detection results in your community
  • You want automated help sifting through all the content posted in the community

Pre-requisites

In order for this to work you will need the following enabled

:information_source: Note that a vision-enabled LLM is only required if you are trying to detect images; otherwise, a standard LLM will work fine for text detection.

Configuration

The following would still apply while creating this automation…

Prompt

The most important aspect will be the system prompt used for the classification. In the following example I have used AI Bot to author the prompt.

:warning: When authoring the prompt, avoid similar wording for the two possible results (NSFW and not NSFW). In this example, we use NSFW and Safe (for not NSFW).
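To see why the two labels should not share wording, here is a minimal sketch. It assumes the automation decides by checking whether a trigger string appears anywhere in the model's reply; the helper `reply_triggers` is hypothetical and is not part of Discourse AI.

```python
def reply_triggers(reply: str, trigger: str) -> bool:
    """Hypothetical check: does the model's reply contain the trigger text?"""
    return trigger.lower() in reply.lower()


# Overlapping labels: "Not NSFW" still contains "NSFW", so a harmless post matches.
print(reply_triggers("Not NSFW\nThe image shows a landscape.", "NSFW"))  # True

# Distinct labels: "Safe" does not contain "NSFW", so nothing matches.
print(reply_triggers("Safe\nThe image shows a landscape.", "NSFW"))  # False
```

With distinct labels such as NSFW and Safe, a simple text match of this kind cannot confuse one result for the other.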

The classifier will not be 100% accurate, so watch for incorrect results and customize the prompt to the needs of your community. The narrower the focus, the better.

Edited LLM prompts for NSFW content detection AI

You are a bot specializing in image classification. Respond with either NSFW or SAFE, and then on the next line follow with a step by step explanation justifying your decision. NSFW is porn or gore, and SAFE is everything else. When in doubt reply with SAFE.
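The prompt above asks for the label on the first line and the explanation after it. As a rough illustration of that reply shape, the sketch below splits such a reply into its two parts. The function `parse_classification` is hypothetical, and falling back to SAFE for unexpected labels is an assumption of this sketch that mirrors the prompt's "when in doubt" rule, not documented Discourse AI behaviour.

```python
from typing import Tuple


def parse_classification(reply: str) -> Tuple[str, str]:
    """Split a reply into (label, explanation). Hypothetical helper."""
    first_line, _, rest = reply.strip().partition("\n")
    label = first_line.strip().upper()
    # Assumption: anything other than the two expected labels is treated as SAFE,
    # mirroring the prompt's "when in doubt reply with SAFE" instruction.
    if label not in ("NSFW", "SAFE"):
        label = "SAFE"
    return label, rest.strip()


label, explanation = parse_classification("NSFW\n1. The image contains graphic gore.")
print(label)        # NSFW
print(explanation)  # 1. The image contains graphic gore.
```

Keeping the label on its own line makes the result easy to read at a glance, while the explanation helps when reviewing why a post was classified a certain way.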

Caveats

  • Keep in mind that LLM calls can be expensive. When applying a classifier, monitor costs carefully and consider running it only on small subsets of content.
  • Better-performing models, e.g. GPT-4o, will yield better results but can come at a higher cost. However, we have seen costs decrease over time as LLMs get better and cheaper
  • The prompt could be customized to do all sorts of detection, like PII exposure, spam detection, etc.
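For the cost caveat, a back-of-the-envelope estimate can help decide how large a subset to classify. The sketch below is only an illustration: the function name, the 30-day month, and every number in the example (post volume, tokens per post, price per million tokens) are placeholder assumptions, not real pricing or figures from Discourse.

```python
def estimate_monthly_cost(posts_per_day: int,
                          tokens_per_post: int,
                          price_per_million_tokens: float,
                          sample_rate: float = 1.0) -> float:
    """Rough monthly LLM cost, assuming a 30-day month (hypothetical helper)."""
    tokens_per_month = posts_per_day * 30 * tokens_per_post * sample_rate
    return tokens_per_month / 1_000_000 * price_per_million_tokens


# Placeholder numbers only; substitute your own traffic and your provider's pricing.
full = estimate_monthly_cost(1000, 800, 5.0)
subset = estimate_monthly_cost(1000, 800, 5.0, sample_rate=0.1)
print(f"All posts: ${full:.2f}")
print(f"10% subset: ${subset:.2f}")
```

Under these made-up numbers, classifying a tenth of the posts costs a tenth as much, which is why restricting the automation to a small subset is an effective way to keep spending in check.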

Last edited by @Saif 2024-11-07T19:42:37Z
