Disorder - Automatic toxicity detection for your community

I have also modified your query to display the scoring in a more convenient way using Data Explorer.
Credit goes to ChatGPT, with PostgreSQL clues from Leonardo:

SELECT
  json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'severe_toxicity') AS severe_toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'obscene') AS obscene,
  json_extract_path_text(pcf.value::json, 'classification', 'identity_attack') AS identity_attack,
  json_extract_path_text(pcf.value::json, 'classification', 'insult') AS insult,
  json_extract_path_text(pcf.value::json, 'classification', 'threat') AS threat,
  json_extract_path_text(pcf.value::json, 'classification', 'sexual_explicit') AS sexual_explicit,
  json_extract_path_text(pcf.value::json, 'model') AS model,
  pcf.created_at,
  p.raw
FROM
  post_custom_fields AS pcf
INNER JOIN
  posts AS p ON p.id = pcf.post_id
INNER JOIN
  topics AS t ON t.id = p.topic_id
WHERE
  pcf.name = 'disorder' 
  AND t.archetype = 'regular'
ORDER BY pcf.created_at DESC

This modification will return only those rows where any of the classification values is greater than 50 (or whatever threshold you set):

-- [params]
-- int :threshold = 50
SELECT DISTINCT ON (p.id, pcf.created_at)
  json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'severe_toxicity') AS severe_toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'obscene') AS obscene,
  json_extract_path_text(pcf.value::json, 'classification', 'identity_attack') AS identity_attack,
  json_extract_path_text(pcf.value::json, 'classification', 'insult') AS insult,
  json_extract_path_text(pcf.value::json, 'classification', 'threat') AS threat,
  json_extract_path_text(pcf.value::json, 'classification', 'sexual_explicit') AS sexual_explicit,
  json_extract_path_text(pcf.value::json, 'model') AS model,
  p.id AS post_id,
  pcf.created_at,
  p.raw
FROM
  post_custom_fields AS pcf
INNER JOIN
  posts AS p ON p.id = pcf.post_id
INNER JOIN
  topics AS t ON t.id = p.topic_id
WHERE
  pcf.name = 'disorder' 
  AND t.archetype = 'regular'
GROUP BY p.id, pcf.value, pcf.created_at
HAVING 
  CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'severe_toxicity') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'obscene') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'identity_attack') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'threat') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'sexual_explicit') AS FLOAT) > :threshold
ORDER BY pcf.created_at DESC, p.id

You can also modify it by introducing several more parameters, so that each classification gets its own threshold when reporting in Data Explorer.
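
For example, here is a minimal sketch with separate thresholds for just two of the classifications; the parameter names (:toxicity_threshold, :insult_threshold) are my own invention, and the remaining classifications can be added the same way. Since nothing is aggregated here, plain WHERE conditions do the same job as HAVING:

-- [params]
-- int :toxicity_threshold = 80
-- int :insult_threshold = 60

SELECT
  json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'insult') AS insult,
  p.id AS post_id,
  pcf.created_at,
  p.raw
FROM
  post_custom_fields AS pcf
INNER JOIN
  posts AS p ON p.id = pcf.post_id
INNER JOIN
  topics AS t ON t.id = p.topic_id
WHERE
  pcf.name = 'disorder'
  AND t.archetype = 'regular'
  -- each classification is compared against its own parameter
  AND (
    CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT) > :toxicity_threshold
    OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT) > :insult_threshold
  )
ORDER BY pcf.created_at DESC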

Please note: this will return public posts only, without accessing private messages, since the join keeps only topics with archetype = 'regular' (private messages use the 'private_message' archetype).

3 Likes

We are working on this exact feature right now!

We are also planning on using the false positive / negative rates to run an optimizer that can suggest the best thresholds for each option, so keep that information, as it will be useful in the near future.

5 Likes

Sounds great. Glad to hear that.
So far, I tend to decline/ignore all the flags Disorderbot makes, even having thresholds raised up to maximum of 90-100. But, due to the nature of the forum we’re testing it on (NSFW), AI is confused easily if communication is really toxic or not. As long as it is not that reliable to our use case, we will continue using it, but will use it’s reports only to “re-inforce” other reports to really toxic posts.

As soon as we find better thresholds to use long-term, we will be able to enable precautionary warnings when a user tries to post something really toxic.

That’s what I suspect will happen as AI becomes mainstream: it will enable censorship and limit the genuine questioning of the status quo that is necessary for the health of every community in the world.

Not limit or ban; educate and discuss. Perhaps there is a way to use the tools without that side effect (my concern is that it is the intended effect), but I see that’s not possible at the moment.

Thanks for your feedback; it is valuable to me. And of course, thanks to the team for keeping Discourse updated and improving, like always :slight_smile:

Setting all thresholds to 100 and relying only on the more extreme ones, like “severe toxicity” and “threat”, is something that I can see being adopted in communities like that.

3 Likes

Thanks. It is currently set like this, and it is still too sensitive. I will raise some thresholds even further and see how it goes.

1 Like

I’d have to see the raw classifications, but I’d increase the insult one first too.

I’d better keep you away from reading those :smiley: They may be really NSFW, even in text form.
I’ve raised the first threshold to 100 too; we’ll see how it goes now :smiley:

1 Like

I really hope future versions make it possible for Disorder not to check (or at least not to report on) private messages. We do not access them, and an AI checking private conversations feels highly unethical.

4 Likes

Yeah, that is the same thing @davidkingham asked; we will put it on our roadmap.

3 Likes

…and English? :sweat_smile:

Also, I’m wondering to what degree this can replace Akismet. We’re at a 97% disagree rate on Akismet’s flags right now. It seems to simply react to posts with a lot of digits in them, so if you’re posting job logs, where every line starts with a timestamp…

1 Like

The arms race between spam and spam detection just went nuclear with the advent of widely available LLMs. We are hard at work on features using a wide range of models, and while spam isn’t our priority right now, it’s something we will investigate.

4 Likes

Okay, so: I turned it on. How do I know it’s working?

Other than turning the thresholds down really low to catch everything, I mean.

Is there a diagnostic mode or log where I can see what a given post has scored?

2 Likes

The easiest way is to provoke it by posting something insulting. Make sure your user’s group is not bypassed in the plugin settings.

The better way is to query Data Explorer. Please refer to one of my queries earlier in this topic.
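
If you want the scores for one specific post, here is a minimal sketch along the same lines (the :post_id parameter is my own addition; it assumes the same post_custom_fields storage as the queries above):

-- [params]
-- int :post_id

SELECT
  pcf.value,      -- the raw JSON the plugin stores, classification scores included
  pcf.created_at
FROM post_custom_fields AS pcf
WHERE
  pcf.name = 'disorder'
  AND pcf.post_id = :post_id
ORDER BY pcf.created_at DESC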

1 Like

Thanks. That’s returning 0s across the board for all posts so far… is that to be expected?

1 Like

The majority of our posts have 0s across all the criteria too. This is normal for a forum with healthy communication.
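
If you want to sanity-check that, here is a rough sketch that counts how many classified posts have any nonzero score. For brevity it checks only three of the seven classifications; extend the FILTER clause with the rest as needed:

SELECT
  COUNT(*) AS classified_posts,
  -- posts where at least one of the checked classifications scored above zero
  COUNT(*) FILTER (
    WHERE CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT) > 0
       OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT) > 0
       OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'threat') AS FLOAT) > 0
  ) AS posts_with_nonzero_scores
FROM post_custom_fields AS pcf
WHERE pcf.name = 'disorder'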

2 Likes

Cool — I wasn’t sure how trigger-happy the model is. :slight_smile:

1 Like

I installed the plugin, but it is not working. Do I have to do some extra configuration?

I’m seeing a large number of the following errors from the plugin:
Job exception: uninitialized constant Jobs::ClassifyChatMessage::ChatMessage

The issue appears to occur when one of my plugins creates a chat message using the following command:
Chat::MessageCreator.create(chat_channel: matching_channel, user: message_user, content: raw).chat_message

Thanks

1 Like

Ohhh, this must have broken with the new Chat reorganization. We are on the verge of launching a new plugin that will incorporate this one’s functionality in the next few days, so stay tuned.

5 Likes