Discourse AI - Toxicity

This topic covers the configuration of the Toxicity module of the Discourse AI plugin.

Feature set

The Toxicity module can automatically classify the toxicity score of every new post and chat message in your Discourse instance. Optionally, you can also enable automatic flagging of content that crosses a threshold.

Classifications are stored in the database, so once you enable the plugin you can immediately use Data Explorer to report on how new content in Discourse is being classified. We will soon ship some default Data Explorer queries with the plugin to make this easier.

Settings

  • ai_toxicity_enabled: Enables or disables the module

  • ai_toxicity_inference_service_api_endpoint: URL where the API for the toxicity module is running. If you are using CDCK hosting, this is handled for you automatically. If you are self-hosting, check the self-hosting guide.

  • ai_toxicity_inference_service_api_key: API key for the toxicity API configured above. If you are using CDCK hosting, this is handled for you automatically. If you are self-hosting, check the self-hosting guide.

  • ai_toxicity_inference_service_api_model: We offer three different models: original, unbiased, and multilingual. unbiased is recommended over original because it tries not to carry biases introduced by the training material over into the classification. For multilingual communities, the multilingual model supports Italian, French, Russian, Portuguese, Spanish, and Turkish.

  • ai_toxicity_flag_automatically: Automatically flag posts/chat messages when the classification for a specific category surpasses the configured threshold. Available categories are toxicity, severe_toxicity, obscene, identity_attack, insult, threat, and sexual_explicit. There’s an ai_toxicity_flag_threshold_${category} setting for each one.

  • ai_toxicity_groups_bypass: Users in these groups will not have their posts classified by the toxicity module. By default, this includes staff users.
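
As a rough illustration of how the per-category thresholds interact with ai_toxicity_flag_automatically, here is a minimal Python sketch. It is not the plugin's actual code: the function names, the sample scores, and the 0-100 score scale are all assumptions made for the example. Only the category names come from the settings above.

```python
# Hypothetical sketch, NOT the plugin's implementation.
# Assumptions: scores and thresholds share a 0-100 scale, and a category
# triggers a flag only when its score strictly surpasses its threshold.
CATEGORIES = [
    "toxicity", "severe_toxicity", "obscene", "identity_attack",
    "insult", "threat", "sexual_explicit",
]


def categories_over_threshold(scores, thresholds):
    """Return the categories whose score surpasses its configured threshold."""
    return [
        category
        for category in CATEGORIES
        if category in scores
        and category in thresholds
        and scores[category] > thresholds[category]
    ]


def should_flag(scores, thresholds, flag_automatically=True):
    """Flag only if auto-flagging is on and at least one category is over."""
    return flag_automatically and bool(categories_over_threshold(scores, thresholds))


# One threshold per category, mirroring ai_toxicity_flag_threshold_${category}.
thresholds = {category: 80 for category in CATEGORIES}
scores = {"toxicity": 91, "insult": 85, "threat": 3}  # made-up classification

print(categories_over_threshold(scores, thresholds))  # ['toxicity', 'insult']
print(should_flag(scores, thresholds))                # True
```

In this sketch a single category over its threshold is enough to flag the content, which matches the "for a specific category" wording of the setting.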

Tuning this a bit right now, am I correct in assuming that a higher threshold is more stringent and a lower one more lenient?

I would say the higher the threshold, the more lenient it would be. A lower threshold would be more apt to flag a post as being toxic since it would take less to trigger a flag, thus a higher threshold would require more to trigger a flag.
Low threshold = easy to cross
High threshold = harder to cross
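
To make that concrete, here is a small Python sketch that applies three different thresholds to the same set of scores. The scores and the 0-100 scale are invented for illustration and the helper name is hypothetical. The point is only that raising the threshold shrinks the set of flagged posts.

```python
# Hypothetical scores for five posts; the 0-100 scale is an assumption.
sample_scores = [12, 35, 58, 76, 93]


def flagged(scores, threshold):
    """Return the scores that cross (strictly exceed) the threshold."""
    return [score for score in scores if score > threshold]


for threshold in (30, 60, 90):
    print(threshold, flagged(sample_scores, threshold))
# 30 [35, 58, 76, 93]
# 60 [76, 93]
# 90 [93]
```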

I want to have a mechanism to catch attempts at commercial activity on our site - not toxicity per se, but very damaging to our community.

This is close, but it doesn't quite look for the thing we are interested in.

Have you considered this dimension?

That’s covered by Discourse AI Post Classifier - Automation rule. Let me know how it goes.
