Pre-emptively warning a contributor about the toxicity of their post

I was perusing the Discourse feature and moderation lists while thinking about various enhancements that could be made to the platform.

I stumbled across a post describing how adding various moderation features to the League of Legends chat dropped the number of offenders by 40-50% and generally improved the chat space. This got me thinking: could you use machine learning to proactively moderate a post?

As a user is typing, their text could be analyzed for toxicity, and a warning could let them know that what they are typing may be flagged after they post, perhaps prompting them to reword it.
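As a rough illustration of the idea (all names and thresholds here are hypothetical, not an existing Discourse feature), the warning gate could be as simple as:

```python
# Purely illustrative sketch: score the draft as the user types and
# surface a warning above some toxicity threshold, rather than
# blocking the post outright.

WARN_THRESHOLD = 0.8  # assumption: something a community would tune


def toxicity_warning(score, threshold=WARN_THRESHOLD):
    """Return a warning string if the draft's toxicity score is high."""
    if score >= threshold:
        return ("Your draft may be flagged after posting. "
                "Consider rewording before you submit.")
    return None
```

The scoring model itself is the hard part, of course; this just shows that the UI side of the idea is cheap.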

Has a feature like this been proposed already? Is machine learning being used anywhere else on Discourse?


I know @angus has been playing around in this exact area (a machine learning plugin), but nothing is released quite yet.


It is unlikely to work any better than a simple word blacklist. There are a billion ways to be rude, many of which involve completely innocuous language.

See also

Read the comments and examples…


Yes, I am working on a machine learning plugin that will provide the infrastructure if you want to run a model to detect this kind of thing. If you want to take a look, the unfinished plugin is here:

But as Jeff points out, I’m not sure there are any decent machine learning models for this use case that would work better than a simple word blacklist.

That said, here’s a model that’s in the ballpark:


I’m going to request access to Google’s Perspective API and see what I can get out of it.

I think there is huge potential in a Clippy 2.0 plugin for improving the quality of online discussions :yum:


Except this plugin won’t give you an option; it will forcefully do it for you. Or it will just employ Mechanical Turk workers to check your every post for threats against Trump! :scream:

Take a look at this!

“Perspective is an API that makes it easier to host better conversations. The API uses machine learning models to score the perceived impact a comment might have on a conversation. Developers and publishers can use this score to give realtime feedback to commenters or help moderators do their job, or allow readers to more easily find relevant information, as illustrated in two experiments below. We’ll be releasing more machine learning models later in the year, but our first model identifies whether a comment could be perceived as ‘toxic’ to a discussion.”
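For reference, a request to the Comment Analyzer endpoint could look roughly like this (a sketch in Python based on the public docs; the key placeholder and the exact response handling are assumptions):

```python
import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # assumption: granted after whitelisting
ANALYZE_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
               "comments:analyze?key=" + API_KEY)


def build_payload(text):
    """Request body asking Perspective for a TOXICITY score."""
    return {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }


def extract_score(response):
    """Pull the 0..1 summary score out of an analyze response."""
    return response["attributeScores"]["TOXICITY"]["summaryScore"]["value"]


def toxicity_score(text):
    """Round-trip a draft comment through the API (needs a real key)."""
    req = urllib.request.Request(
        ANALYZE_URL,
        data=json.dumps(build_payload(text)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_score(json.load(resp))
```

The summary score is a probability-like value between 0 and 1, so it would slot straight into a threshold-based warning in the composer.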

Yes, that was already covered in my link above.

I applied for an invite to the Perspective API and they shared some links that aren’t very easily accessible to the public yet, but they are public knowledge nonetheless:

@deevolution and @angus definitely keep us in the loop on your progress! It only took me a couple of days to get whitelisted for API access. If you’ve waited more than a week, let me know and I could provide a reference on behalf of Discourse.


Jealous! I applied a few days ago but haven’t heard anything back yet :frowning: I’ll comb through these links tonight. Thanks @erlend_sh!

Just saw this post on Hacker News:

Their neural network is on GitHub:
Looks like they’ve trained a high-accuracy sentiment neural network model.

Maybe something can be pulled from this.


They have an open Discourse forum, so you might be able to pitch Discourse to them as a suitable test subject for that library :wink:


This is a very interesting idea. I’ve been reading Meta for some time, since our community moved to Discourse. One of the issues our moderators run into is a handful of people who are often OK contributors but can’t seem to stop themselves from adding the occasional toxic comment. They are typically already at least trust level 2, and some have even made it to 3, but they just have a tendency to type first and think later. It keeps some from ever being considered for trust level 3, and keeps the moderators busy watching this small group.

I finally signed up today; I keep reading but never post or comment, so I thought it was time to at least be able to get involved in the discussion.


Awesome! It’s great to hear an actual customer use case. I definitely want to move ahead with development soon (likely around May/June, after I finish up my thesis). Thanks for chiming in. When we have a working version, I think it would help a lot if we could use your site for testing and feedback. I would definitely be interested in getting some help/guidance on the project from

Would be happy to help out. I don’t have a lot of time right now, but hopefully will have a few spare moments here and there.

The plugin I’m building is purely infrastructure to support training, testing, and using any model with any Discourse instance. It’s meant to be model/framework agnostic (although it’s quite biased towards TensorFlow at the moment). The biggest remaining to-do for an MVP is to implement a Docker network to properly retrieve the output from the container running the model. Currently the output is parsed from a log stream.

Although, when I talked with @sam about this, he suggested putting it all in the same container, so perhaps a network is not necessary.
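For what it’s worth, the interim “parse the log stream” step might be sketched like this (hypothetical container image and output convention; it assumes the model container reads the text on stdin and prints a single JSON result line to stdout):

```python
import json
import subprocess


def parse_prediction(log_stream):
    """Take the last non-empty log line and decode it as JSON."""
    last_line = log_stream.strip().splitlines()[-1]
    return json.loads(last_line)


def predict(container_image, text):
    """Run the model container on some text and parse its output.

    Everything here is illustrative: the image name, the stdin/stdout
    convention, and the single-JSON-line output are all assumptions.
    """
    proc = subprocess.run(
        ["docker", "run", "--rm", "-i", container_image],
        input=text.encode("utf-8"),
        stdout=subprocess.PIPE,
        check=True,
    )
    return parse_prediction(proc.stdout.decode("utf-8"))
```

A proper Docker network (or running everything in one container, as Sam suggested) would replace this log scraping with a real request/response channel.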

This is what the UI looks like:


I just posted under their Request for Research topic about considering Discourse as a training set.


Not to steer things too far from this excellent discussion of tools in progress, but I wonder if Mozilla’s Coral Project might be interested in doing some work together, perhaps via a plugin for their Talk community commenting product? They’re already using Discourse and are clearly interested in this space:


Continued here: