Disorder - Automatic toxicity detection for your community

Do I understand correctly that a Disorder API instance should be launched to accompany the plugin? There is a pre-filled setting, disorder inference service api endpoint, pre-set to https://disorder-testing.demo-by-discourse.com, yet disorder inference service api key is empty by default.

We are interested in giving this plugin a test, as we face a lot of toxic behavior between users. It eventually gets resolved by flagging and the help of Leaders, but we would like to proactively prevent users from spreading negative posts if possible, and this plugin seems to fit that role.

Can we use any ready endpoint to give it a try? Fair warning: we have ~150k page views daily, and that might clog up an unprepared server.

We are a standalone (self-hosted) install.

4 Likes

While you can run your own API server, the plugin comes pre-configured to point at https://disorder-testing.demo-by-discourse.com/ so it works out of the box.

Please do use this endpoint for your instance; it’s provided exactly for your use case of a self-hosted site wanting to give this plugin a try. In the default configuration, all the API calls happen in the background, so even if the API is down it won’t impact your site in any way. It’s safe to use.

The api key setting is optional, and only needed if your API server has it enabled. The public instance at https://disorder-testing.demo-by-discourse.com/ doesn’t have it enabled.

6 Likes

Thank you! Sounds perfect and will give it a try in upcoming days :heart:

4 Likes

Are there other ML applications planned for the future?

2 Likes

I tried this for a week, and it was absurdly aggressive at flagging posts. I recommend using this only if you have a huge site without enough mods. Hope the AI gets better, but it’s just not there yet.

6 Likes

This is great feedback! Would you be willing to share some debugging stats to help me understand exactly what went down?

Something like the result of

SELECT
  pcf.value,
  p.raw
FROM
  post_custom_fields AS pcf
INNER JOIN
  posts AS p ON p.id = pcf.post_id
WHERE
  pcf.name = 'disorder'

here or in a PM would be immensely helpful.

5 Likes

Ahh yes, I forgot all about that! Here you go. There really weren’t that many, but they were just unnecessary, and members and mods found them annoying. I’m also unsure about it scanning DMs. I know there could be value there if someone is harassing someone via DM, but most of the time it’s just going to trigger people to know that we’re looking at their DMs.

1 Like

Do you use chat? Were all the annoying flags on posts / PMs?

We do use chat, but I’m pretty sure all the flags were on posts and PMs.

1 Like

First of all, I’m very grateful for both the feedback and the data you shared that allowed me to debug this further.

Now to my findings!

During this week, you had 1942 new posts from non-staff users. Quite an active community! However, I would not say that the AI is “absurdly aggressive at flagging posts”, as only 7 posts were flagged.

That said, of those 7, half are clearly false positives triggered by default thresholds that are too low, the other half are cases where it’s trickier for the AI to understand the context (calling your interlocutor a jerk vs. telling a story about how someone was a jerk to you today while you were shopping), and one is, IMO, a correct hit.

If you are willing to give it another try, moving all the thresholds to 85 and switching to the original model should solve almost all of the trigger-happy flagging you’ve had so far. I’ll also add a site setting to allow skipping PMs, as I can see how that can be annoying for some communities too.
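If you want to sanity-check that before turning flagging back on, a quick count along these lines can show how many already-classified posts would still exceed an 85 threshold. This is a rough, untested sketch: it assumes the plugin’s custom field JSON holds a classification object with per-category scores on a 0–100 scale.

SELECT COUNT(*)
FROM post_custom_fields AS pcf
WHERE pcf.name = 'disorder'
  AND (
    CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT) >= 85
    OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'severe_toxicity') AS FLOAT) >= 85
    OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT) >= 85
    OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'threat') AS FLOAT) >= 85
  )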

8 Likes

Thanks Falco, I apologize for saying it was absurdly aggressive. I had a lot of drama happening on the site already, the flagging just added to that, and I was quite annoyed at the time.

I appreciate the suggestions and will give it another try. Question: what happens when you disable disorder flag automatically? Will I still be notified somehow if a post is deemed disorderly? That would be a nice way to test it out and figure out which settings work without having posts flagged.

4 Likes

Without that setting it will still run the posts against the AI but won’t take any action. You can leave it like that and then run that Data Explorer query to do some analysis of the false positive / false negative rates.

There is also another setting that lets you add groups to a skip list, where you could, for example, skip posts from TL3/TL4 users from being classified. That may also help.
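To compare the Disorder scores against what the flags turned into, something like this could be a starting point. It’s an untested sketch that assumes the automatic flags end up as regular reviewables with target_type = 'Post', where status 1 means the flag was agreed with and 2 means it was rejected:

SELECT
  r.status,
  json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'insult') AS insult,
  p.id AS post_id,
  p.raw
FROM reviewables AS r
INNER JOIN posts AS p ON p.id = r.target_id AND r.target_type = 'Post'
INNER JOIN post_custom_fields AS pcf ON pcf.post_id = p.id AND pcf.name = 'disorder'
ORDER BY r.created_at DESC

Rows with high scores but a rejected status are the false positives to count.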

Dear @Falco,

We started testing Disorder out. The overall feedback is positive: it really does detect inappropriate things, while also flagging a lot of things our community accepts. Due to the nature of the forum where we’re testing this plugin (adult), the communication involves several aspects which trigger Disorder to flag many, many posts. Your SQL query really does help with checking which thresholds to adjust, but may I suggest adding those scores to the Reviewable Scoring table for each flagged post?

This one: (screenshot of the Reviewable Scoring table)

I don’t know if it’s possible for a plugin to introduce its own data into this view, but it would help staff a lot to understand which criteria to adjust to reduce false positives for us. The way I see it is adding a dropdown with a breakdown per triggered criterion within this view. There’s no need to include criteria equal to 0; those above 0 should be present, but only those which exceed the currently configured thresholds should be marked bold/red.

Disorder Scoring example
  • Toxicity 65% (exceeding threshold, red font)
  • Insult 73% (exceeding threshold, red font)
  • Threat 12% (normal, normal font)
  • Sexual explicit 2% (normal, normal font)

If needed, I can provide you with the SQL query results. We are far from finished reviewing the flag queue…
We are using the multilingual model and haven’t tried the others; it seemed a good one to start with, considering we have some users who prefer posting in their original language.



1 Like

Hi again,

Wanted to let you know that we get errors in the logs related to Disorder when using the “original” model. I just switched it back to multilingual to see if it makes a difference.

Job exception: undefined method `>=' for nil:NilClass

@classification[label] >= SiteSetting.send("disorder_flag_threshold_#{label}")
^^

Details

/var/www/discourse/plugins/disorder/lib/classifier.rb:39:in `block in consider_flagging’

/var/www/discourse/plugins/disorder/lib/classifier.rb:38:in `filter’

/var/www/discourse/plugins/disorder/lib/classifier.rb:38:in `consider_flagging’

/var/www/discourse/plugins/disorder/lib/classifier.rb:25:in `classify!’

/var/www/discourse/plugins/disorder/app/jobs/regular/classify_post.rb:14:in `execute’

/var/www/discourse/app/jobs/base.rb:249:in `block (2 levels) in perform’

rails_multisite-4.0.1/lib/rails_multisite/connection_management.rb:80:in `with_connection'

/var/www/discourse/app/jobs/base.rb:236:in `block in perform'

/var/www/discourse/app/jobs/base.rb:232:in `each’

/var/www/discourse/app/jobs/base.rb:232:in `perform’

sidekiq-6.5.8/lib/sidekiq/processor.rb:202:in `execute_job’

sidekiq-6.5.8/lib/sidekiq/processor.rb:170:in `block (2 levels) in process’

sidekiq-6.5.8/lib/sidekiq/middleware/chain.rb:177:in `block in invoke’

/var/www/discourse/lib/sidekiq/pausable.rb:134:in `call’

sidekiq-6.5.8/lib/sidekiq/middleware/chain.rb:179:in `block in invoke’

sidekiq-6.5.8/lib/sidekiq/middleware/chain.rb:182:in `invoke’

sidekiq-6.5.8/lib/sidekiq/processor.rb:169:in `block in process’

sidekiq-6.5.8/lib/sidekiq/processor.rb:136:in `block (6 levels) in dispatch’

sidekiq-6.5.8/lib/sidekiq/job_retry.rb:113:in `local’

sidekiq-6.5.8/lib/sidekiq/processor.rb:135:in `block (5 levels) in dispatch’

sidekiq-6.5.8/lib/sidekiq.rb:44:in `block in <module:Sidekiq>'

sidekiq-6.5.8/lib/sidekiq/processor.rb:131:in `block (4 levels) in dispatch’

sidekiq-6.5.8/lib/sidekiq/processor.rb:263:in `stats’

sidekiq-6.5.8/lib/sidekiq/processor.rb:126:in `block (3 levels) in dispatch’

sidekiq-6.5.8/lib/sidekiq/job_logger.rb:13:in `call’

sidekiq-6.5.8/lib/sidekiq/processor.rb:125:in `block (2 levels) in dispatch’

sidekiq-6.5.8/lib/sidekiq/job_retry.rb:80:in `global’

sidekiq-6.5.8/lib/sidekiq/processor.rb:124:in `block in dispatch’

sidekiq-6.5.8/lib/sidekiq/job_logger.rb:39:in `prepare’

sidekiq-6.5.8/lib/sidekiq/processor.rb:123:in `dispatch’

sidekiq-6.5.8/lib/sidekiq/processor.rb:168:in `process’

sidekiq-6.5.8/lib/sidekiq/processor.rb:78:in `process_one’

sidekiq-6.5.8/lib/sidekiq/processor.rb:68:in `run’

sidekiq-6.5.8/lib/sidekiq/component.rb:8:in `watchdog’

sidekiq-6.5.8/lib/sidekiq/component.rb:17:in `block in safe_thread’

Details 2
hostname
process_id 65460
application_version 2f8ad17aed81bbfa2fd20b6cc9210be92779bd74
current_db default
current_hostname
job Jobs::ClassifyPost
problem_db default
time 1:52 pm
opts
post_id 604063
current_site_id default

P.S. Yes, the multilingual model does not produce these errors. The unbiased model does not produce errors either.

1 Like

I have also modified your query to display the scoring in a more convenient way using Data Explorer.
Credits go to ChatGPT and PostgreSQL clues from Leonardo:

SELECT
  json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'severe_toxicity') AS severe_toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'obscene') AS obscene,
  json_extract_path_text(pcf.value::json, 'classification', 'identity_attack') AS identity_attack,
  json_extract_path_text(pcf.value::json, 'classification', 'insult') AS insult,
  json_extract_path_text(pcf.value::json, 'classification', 'threat') AS threat,
  json_extract_path_text(pcf.value::json, 'classification', 'sexual_explicit') AS sexual_explicit,
  json_extract_path_text(pcf.value::json, 'model') AS model,
  pcf.created_at,
  p.raw
FROM
  post_custom_fields AS pcf
INNER JOIN
  posts AS p ON p.id = pcf.post_id
INNER JOIN
  topics AS t ON t.id = p.topic_id
WHERE
  pcf.name = 'disorder' 
  AND t.archetype = 'regular'
ORDER BY created_at DESC

And this modification returns only the rows where any of the classification values is greater than 50 (or whatever threshold you set):
-- [params]
-- int :threshold = 50
SELECT DISTINCT ON (p.id, pcf.created_at)
  json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'severe_toxicity') AS severe_toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'obscene') AS obscene,
  json_extract_path_text(pcf.value::json, 'classification', 'identity_attack') AS identity_attack,
  json_extract_path_text(pcf.value::json, 'classification', 'insult') AS insult,
  json_extract_path_text(pcf.value::json, 'classification', 'threat') AS threat,
  json_extract_path_text(pcf.value::json, 'classification', 'sexual_explicit') AS sexual_explicit,
  json_extract_path_text(pcf.value::json, 'model') AS model,
  p.id as post_id,
  pcf.created_at,
  p.raw
FROM
  post_custom_fields AS pcf
INNER JOIN
  posts AS p ON p.id = pcf.post_id
INNER JOIN
  topics AS t ON t.id = p.topic_id
WHERE
  pcf.name = 'disorder' 
  AND t.archetype = 'regular'
GROUP BY p.id, pcf.value, pcf.created_at
HAVING 
  CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'severe_toxicity') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'obscene') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'identity_attack') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'threat') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'sexual_explicit') AS FLOAT) > :threshold
ORDER BY pcf.created_at DESC, p.id

You can also modify it by introducing several more parameters, so you can set a different threshold per category when reporting in Data Explorer.
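For example, something like this (an untested variation of the query above, with one parameter per category; adjust the defaults to taste):

-- [params]
-- int :toxicity_threshold = 80
-- int :insult_threshold = 80
-- int :threat_threshold = 60
SELECT
  json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'insult') AS insult,
  json_extract_path_text(pcf.value::json, 'classification', 'threat') AS threat,
  p.id AS post_id,
  pcf.created_at,
  p.raw
FROM
  post_custom_fields AS pcf
INNER JOIN
  posts AS p ON p.id = pcf.post_id
INNER JOIN
  topics AS t ON t.id = p.topic_id
WHERE
  pcf.name = 'disorder'
  AND t.archetype = 'regular'
  AND (
    CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT) > :toxicity_threshold
    OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT) > :insult_threshold
    OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'threat') AS FLOAT) > :threat_threshold
  )
ORDER BY pcf.created_at DESC

Since there is no aggregation here, plain WHERE conditions replace the GROUP BY / HAVING from the version above.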

Please note: this will return Public posts only, without accessing private messages.

3 Likes

We are working on this exact feature right now!

We are also planning on using the false positive / negative rates to run an optimizer that can suggest the best thresholds for each option, so keep that data around as it will be useful in the near future.
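Until that lands, a rough manual approximation (an untested sketch over the same post_custom_fields data as the queries above) is to look at where the top few percent of scores sit for each category; a threshold near the 99th percentile would only flag roughly the top 1% of classified posts:

SELECT
  percentile_cont(0.95) WITHIN GROUP (ORDER BY CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT)) AS toxicity_p95,
  percentile_cont(0.99) WITHIN GROUP (ORDER BY CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT)) AS toxicity_p99,
  percentile_cont(0.95) WITHIN GROUP (ORDER BY CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT)) AS insult_p95,
  percentile_cont(0.99) WITHIN GROUP (ORDER BY CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT)) AS insult_p99
FROM
  post_custom_fields AS pcf
WHERE
  pcf.name = 'disorder'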

5 Likes

Sounds great. Glad to hear that.
So far, I tend to decline/ignore all the flags Disorderbot makes, even with thresholds raised up to the maximum of 90–100. Due to the nature of the forum we’re testing it on (NSFW), the AI is easily confused about whether communication is really toxic or not. Since it is not that reliable for our use case, we will continue using it, but will use its reports only to “reinforce” other reports on really toxic posts.

As soon as we find better thresholds to use long-term, we will be able to enable precautionary warnings when a user tries to post something really toxic.

That’s what I suspect will happen when AI becomes mainstream. It will allow censorship and limit the genuine questioning of the status quo that’s necessary for the health of every community in the world.

Don’t limit or ban; educate and discuss. Perhaps there is a way to use these tools without that side effect (though my concern is that it’s the intended effect), but I see that’s not possible at the moment.

Thanks for your feedback; it’s valuable to me. And of course, thanks to the team for keeping Discourse updated and improving, as always :slight_smile:

Setting all the thresholds to 100 and relying only on the more extreme categories, like “severe toxicity” and “threat”, is something I can see being adopted in communities like that.

3 Likes

Thanks. It is currently set like this and is still too sensitive. I will raise some thresholds even further and see how it goes.

1 Like