Custom AI Content Moderation

Hi everyone!

I’m a machine learning PhD and I’ve been thinking about building a tool that lets anyone train a custom AI to help with content moderation on Discourse forums. Is this something people here would want?

I know there is Google’s Perspective API, but it’s very generic. My idea is a plugin where you moderate posts as usual and it automatically trains a text classifier on your moderation decisions. For example, if you mark a post as having toxic language or being off-topic, it learns from that and can flag similar posts or even pre-moderate them.

What do you think?


This, or auto-categorization of uncategorized topics based on a custom-trained model, would be nice too.


Text classifiers (used in forums, for example) have been around for a long time.

Typically this type of text classification and scoring is performed with Bayesian classifiers.

If you do a Google search with keywords:

bayesian classifier ruby

and/or

bayesian classifier javascript

You will find myriad libraries and examples of text classification using Bayesian classifiers.

We have used various Bayesian classifiers for forum post moderation, spam detection, and more over the years, and have implemented custom code to train the classifier whenever moderators perform mod actions.
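
For anyone who wants to see the idea in code, here is a minimal sketch of a naive Bayes classifier that is trained one post at a time, the way you would from a moderation hook. It is not our production code: the class name `NaiveBayesModerator`, the labels, and the whitespace-style tokenizer are all assumptions made for illustration, and the hook that would call `train` is hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


class NaiveBayesModerator:
    """Multinomial naive Bayes with add-one smoothing, trained incrementally."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter()            # label -> number of posts seen
        self.vocab = set()                       # every word seen in training

    @staticmethod
    def tokenize(text):
        # Very crude tokenizer (an assumption): lowercase runs of letters.
        return re.findall(r"[a-z']+", text.lower())

    def train(self, text, label):
        """Record one moderated post; call this from a (hypothetical) mod-action hook."""
        words = self.tokenize(text)
        self.word_counts[label].update(words)
        self.label_counts[label] += 1
        self.vocab.update(words)

    def classify(self, text):
        """Return the most probable label, or None if nothing has been trained yet."""
        total_posts = sum(self.label_counts.values())
        if total_posts == 0:
            return None
        words = self.tokenize(text)
        best_label, best_score = None, -math.inf
        for label, n_posts in self.label_counts.items():
            # Log prior plus smoothed log likelihood of each word.
            score = math.log(n_posts / total_posts)
            # The extra +1 reserves one slot for words never seen in training.
            denom = sum(self.word_counts[label].values()) + len(self.vocab) + 1
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    model = NaiveBayesModerator()
    model.train("buy cheap pills now", "spam")
    model.train("how do I configure the plugin", "ok")
    print(model.classify("cheap pills for sale"))
```

In practice you would want a better tokenizer and a confidence threshold before acting on a prediction, but the training loop really is this simple: every mod action becomes one `train` call.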

Hope this helps.


One potential issue here is that many moderation tasks aren’t a simple punishment but something more complex, such as “close the topic for 12 hours”, “this needs to be a wiki post”, or “this needs to be moved to a different category”.
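
To make that concrete, here is a rough sketch of how such actions might be turned into training data: the action kind becomes the class label, and parameters like the duration are tracked separately so a suggestion can carry a sensible default. Everything here is hypothetical, including the `ModEvent` record, the action names, and the `build_training_set` helper; none of it is an existing Discourse API, and it assumes parameter values are hashable.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class ModEvent:
    """One moderation action taken on a post (hypothetical record layout)."""
    post_text: str
    action: str                                  # e.g. "close_topic", "make_wiki"
    params: dict = field(default_factory=dict)   # e.g. {"hours": 12}


def build_training_set(events):
    """Return (text, action) pairs plus the most common parameters per action."""
    pairs = [(e.post_text, e.action) for e in events]

    # Count each distinct parameter combination seen for an action.
    seen = defaultdict(Counter)
    for e in events:
        if e.params:
            seen[e.action][tuple(sorted(e.params.items()))] += 1

    # The most frequent combination becomes the suggested default.
    defaults = {
        action: dict(counts.most_common(1)[0][0])
        for action, counts in seen.items()
    }
    return pairs, defaults


if __name__ == "__main__":
    log = [
        ModEvent("this thread is getting heated", "close_topic", {"hours": 12}),
        ModEvent("collected FAQ answers", "make_wiki"),
        ModEvent("login bug report", "move_category", {"category": "support"}),
    ]
    pairs, defaults = build_training_set(log)
    print(pairs)
    print(defaults)
```

The pairs can feed any multi-class text classifier, so the model predicts an action rather than a yes/no flag.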

Good luck!