This feature is now part of Discourse AI. This plugin is deprecated.
| Summary | Disorder helps moderators by automatically flagging potentially toxic content on your Discourse forum. |
| --- | --- |
| Repository Link | https://github.com/xfalcox/disorder |
| Install Guide | How to install plugins in Discourse |
If you are using our official hosting service, please contact our support to register your interest for this plugin.
Toxicity
As @hawk said in Dealing with Toxicity in Online Communities, managing toxicity in your community is fundamental.
While Discourse ships out of the box with many different tools to help manage toxicity in your community, we are always researching ways to improve it further. In particular, I’ve been researching possible applications of Machine Learning and AI in online forums. That research has now become an experimental plugin, available to all communities.
The plugin
Disorder leverages Artificial Intelligence and Machine Learning to help you moderate your community, making it easier for your moderation team to stay on top of potentially problematic content, and even optionally nudging your users to revise toxic posts before posting.
This is a first foray into using self-hosted ML models in Discourse, and while it’s a simple model it sets a pattern that can be reused to apply more complex models down the road.
Features
Background Flagging
This is Disorder’s main mode of operation, as it’s completely transparent to your users, who will not be aware of any changes.
Whenever a new post (or chat message, when using Discourse Chat) is created, it is put in a classification queue asynchronously. If the classification score comes back above a configurable threshold, the post or chat message is flagged, so your moderation team is warned about it and can make the final decision on the flag.
New post intervention
If you think that prevention is the best medicine, you may be interested in this more active option.
You can enable synchronous classification of every new post. If a post scores above a configurable toxicity threshold, the plugin will intervene in the new-post flow, asking the user to review and amend a message that may fall outside the boundaries set by your community rules.
This will only happen once, and after closing the modal the user will be able to post normally.
How does it work?
This plugin integrates the open source models from Detoxify via a remote API call, allowing admins to properly scale the inference rate to each community’s needs.
We provide a simple container image exposing a thin HTTP API that Discourse calls to perform content classification. It can be run either on the same server as Discourse or on a different server altogether.
The Discourse plugin listens for new post and new chat message events, and enqueues a classification job in the background queue. Results are stored in the database so you can extract reports, and content is flagged by a separate bot user so its flag accuracy can be tracked over time.
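The flow above can be sketched in a few lines. This is a minimal, self-contained illustration, not the plugin's actual code: the threshold values are made up, and `classify` stands in for the real HTTP call to the Detoxify-based service.

```python
# Per-category thresholds (illustrative values, not the plugin defaults).
THRESHOLDS = {
    "toxicity": 0.7,
    "severe_toxicity": 0.5,
    "identity_attack": 0.6,
    "insult": 0.6,
    "threat": 0.5,
    "sexual_explicit": 0.6,
}

def classify(text: str) -> dict:
    """Stand-in for the HTTP call to the classification service.

    The real plugin POSTs the content to the Detoxify-based API and gets
    back one score per category; fixed scores keep this sketch runnable.
    """
    return {
        "toxicity": 0.91,
        "severe_toxicity": 0.12,
        "identity_attack": 0.05,
        "insult": 0.64,
        "threat": 0.02,
        "sexual_explicit": 0.01,
    }

def should_flag(text: str) -> bool:
    """Flag the content if any category score meets or exceeds its threshold."""
    scores = classify(text)
    return any(scores[cat] >= THRESHOLDS[cat] for cat in THRESHOLDS)
```

In the plugin itself this check runs in a background job, so a slow classification never blocks the user's request.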
Options
First, the plugin ships working out of the box, so it’s not necessary to change any settings right away. However, if you want to change the plugin behavior, there are a few knobs you can use.
We provide 3 different classification models that you can pick from in the plugin options:

- unbiased (default): a model that tries to reduce unintended model bias in toxicity classification
- multilingual: a model that can also classify Italian, French, Russian, Portuguese, Spanish, and Turkish
- original: the simplest model
You can also control whether the plugin will:
- automatically flag
- enable sync intervention on toxic posts with warning (experimental)
- enable sync intervention on toxic posts (not recommended)
All of the above only happen when the content is classified above the threshold for one of the classification types:
- toxicity
- severe_toxicity
- identity_attack
- insult
- threat
- sexual_explicit
You can tweak each of the classification thresholds for automatic actions.
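To make the per-category thresholds concrete, here is a hedged sketch of how scores and thresholds interact. The category names mirror the list above; the score and threshold values are invented for illustration.

```python
def triggered_categories(scores: dict, thresholds: dict) -> list:
    """Return the classification categories whose score meets or
    exceeds the configured threshold for automatic action."""
    return [cat for cat, limit in thresholds.items()
            if scores.get(cat, 0.0) >= limit]

# Example: three categories scored, each with its own threshold.
scores = {"toxicity": 0.82, "insult": 0.30, "threat": 0.75}
thresholds = {"toxicity": 0.7, "insult": 0.6, "threat": 0.9}
print(triggered_categories(scores, thresholds))
```

Here only `toxicity` crosses its limit: `insult` scores below its threshold, and `threat` scores high but its threshold is set higher still. Raising a category's threshold makes automatic action rarer for that category; lowering it makes flagging more aggressive.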
Classification Service
The plugin comes pre-configured and working out of the box. To do that, it contacts a service run by Discourse (CDCK) to classify the user content. That classifier API service is open source, and you can run your own copy of the service if necessary.