Disorder - Automatic toxicity detection for your community

I have also modified your query to display the scoring in a more convenient way using Data Explorer.
Credit goes to ChatGPT, with PostgreSQL clues from Leonardo:

SELECT
  json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'severe_toxicity') AS severe_toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'obscene') AS obscene,
  json_extract_path_text(pcf.value::json, 'classification', 'identity_attack') AS identity_attack,
  json_extract_path_text(pcf.value::json, 'classification', 'insult') AS insult,
  json_extract_path_text(pcf.value::json, 'classification', 'threat') AS threat,
  json_extract_path_text(pcf.value::json, 'classification', 'sexual_explicit') AS sexual_explicit,
  json_extract_path_text(pcf.value::json, 'model') AS model,
  pcf.created_at,
  p.raw
FROM
  post_custom_fields AS pcf
INNER JOIN
  posts AS p ON p.id = pcf.post_id
INNER JOIN
  topics AS t ON t.id = p.topic_id
WHERE
  pcf.name = 'disorder' 
  AND t.archetype = 'regular'
ORDER BY pcf.created_at DESC

This modification will return only those rows where any of the classification values is greater than 50 (or whatever threshold you set):

-- [params]
-- int :threshold = 50
SELECT DISTINCT ON (p.id, pcf.created_at)
  json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'severe_toxicity') AS severe_toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'obscene') AS obscene,
  json_extract_path_text(pcf.value::json, 'classification', 'identity_attack') AS identity_attack,
  json_extract_path_text(pcf.value::json, 'classification', 'insult') AS insult,
  json_extract_path_text(pcf.value::json, 'classification', 'threat') AS threat,
  json_extract_path_text(pcf.value::json, 'classification', 'sexual_explicit') AS sexual_explicit,
  json_extract_path_text(pcf.value::json, 'model') AS model,
  p.id AS post_id,
  pcf.created_at,
  p.raw
FROM
  post_custom_fields AS pcf
INNER JOIN
  posts AS p ON p.id = pcf.post_id
INNER JOIN
  topics AS t ON t.id = p.topic_id
WHERE
  pcf.name = 'disorder' 
  AND t.archetype = 'regular'
GROUP BY p.id, pcf.value, pcf.created_at
HAVING 
  CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'severe_toxicity') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'obscene') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'identity_attack') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'threat') AS FLOAT) > :threshold 
  OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'sexual_explicit') AS FLOAT) > :threshold
ORDER BY pcf.created_at DESC, p.id

You can also modify it by introducing several more parameters, so that each classification gets its own threshold when reporting in Data Explorer.
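
For example, here is a minimal sketch with separate thresholds for just two of the classifications; the parameter names (:toxicity_threshold, :insult_threshold) are my own invention, and the remaining classifications can be added the same way. Since nothing is aggregated here, plain WHERE conditions do the same job as HAVING:

-- [params]
-- int :toxicity_threshold = 80
-- int :insult_threshold = 60

SELECT
  json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS toxicity,
  json_extract_path_text(pcf.value::json, 'classification', 'insult') AS insult,
  p.id AS post_id,
  pcf.created_at,
  p.raw
FROM
  post_custom_fields AS pcf
INNER JOIN
  posts AS p ON p.id = pcf.post_id
INNER JOIN
  topics AS t ON t.id = p.topic_id
WHERE
  pcf.name = 'disorder'
  AND t.archetype = 'regular'
  -- each classification is compared against its own parameter
  AND (
    CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT) > :toxicity_threshold
    OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT) > :insult_threshold
  )
ORDER BY pcf.created_at DESC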

Please note: this will return public posts only, without accessing private messages, since the join keeps only topics with archetype = 'regular' (private messages use the 'private_message' archetype).

3 Likes

We are working on this exact feature right now!

We are also planning on using the false positive / negative rates to run an optimizer that can suggest the best thresholds for each option, so keep that information, as it will be useful in the near future.

5 Likes

Sounds great. Glad to hear that.
So far, I tend to decline/ignore all the flags Disorderbot makes, even having thresholds raised up to maximum of 90-100. But, due to the nature of the forum we’re testing it on (NSFW), AI is confused easily if communication is really toxic or not. As long as it is not that reliable to our use case, we will continue using it, but will use it’s reports only to “re-inforce” other reports to really toxic posts.

As soon as we find better thresholds to use long-term, we will be able to enable precautionary warnings when a user tries to post something really toxic.

That’s what I suspect will happen as AI becomes mainstream: it will enable censorship and limit the genuine questioning of the status quo that is necessary for the health of every community in the world.

Not limit or ban; educate and discuss. Perhaps there is a way to use the tools without that side effect (my concern is that it is the intended effect), but I see that’s not possible at the moment.

Thanks for your feedback; it is valuable to me. And of course, thanks to the team for keeping Discourse updated and improving, like always :slight_smile:

Setting all thresholds to 100 and relying only on the more extreme ones, like “severe toxicity” and “threat”, is something that I can see being adopted in communities like that.

3 Likes

Thanks. It is currently set like this, and it is still too sensitive. I will raise some thresholds even further and see how it goes.

1 Like

I’d have to see the raw classifications, but I’d increase the insult one first too.

I’d better keep you away from reading those :smiley: They may be really NSFW, even in text form.
I’ve raised the first threshold to 100 too; we’ll see how it goes now :smiley:

1 Like

I really hope future versions make it possible for Disorder not to check (or at least not to report on) private messages. We do not access them, and an AI checking private conversations feels highly unethical.

4 Likes

Yeah, that is the same thing @davidkingham asked; we will put it on our roadmap.

3 Likes

…and English? :sweat_smile:

Also, I’m wondering to what degree this can replace Akismet. We’re at a 97% disagree rate on Akismet’s flags right now. It seems to simply react to posts with a lot of digits in them, so if you’re posting job logs, where every line starts with a timestamp…

1 Like

The arms race between spam and spam detection just went nuclear with the advent of widely available LLMs. We are hard at work on features using a wide range of models, and while spam isn’t our priority right now, it’s something we will investigate.

4 Likes

Okay, so: I turned it on. How do I know it’s working?

Other than turning the thresholds down really low to catch everything, I mean.

Is there a diagnostic mode or log where I can see what a given post has scored?

2 Likes

The easiest way is to provoke it by posting something insulting. Make sure your user’s group is not bypassed in the plugin settings.

The better way is to query Data Explorer. Please refer to one of my queries earlier in this topic.
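
If you want the scores for one specific post, here is a minimal sketch along the same lines (the :post_id parameter is my own addition; it assumes the same post_custom_fields storage as the queries above):

-- [params]
-- int :post_id

SELECT
  pcf.value,      -- the raw JSON the plugin stores, classification scores included
  pcf.created_at
FROM post_custom_fields AS pcf
WHERE
  pcf.name = 'disorder'
  AND pcf.post_id = :post_id
ORDER BY pcf.created_at DESC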

1 Like

Thanks. That’s returning 0s across the board for all posts so far… is that to be expected?

1 Like

The majority of our posts have 0s across all the criteria too. This is normal for a forum with healthy communication.
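
If you want to sanity-check that, here is a rough sketch that counts how many classified posts have any nonzero score. For brevity it checks only three of the seven classifications; extend the FILTER clause with the rest as needed:

SELECT
  COUNT(*) AS classified_posts,
  -- posts where at least one of the checked classifications scored above zero
  COUNT(*) FILTER (
    WHERE CAST(json_extract_path_text(pcf.value::json, 'classification', 'toxicity') AS FLOAT) > 0
       OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'insult') AS FLOAT) > 0
       OR CAST(json_extract_path_text(pcf.value::json, 'classification', 'threat') AS FLOAT) > 0
  ) AS posts_with_nonzero_scores
FROM post_custom_fields AS pcf
WHERE pcf.name = 'disorder'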

2 Likes

Cool — I wasn’t sure how trigger-happy the model is. :slight_smile:

1 Like

I installed the plugin, but it is not working. Do I have to do some extra configuration?

I’m seeing a large number of the following errors from the plugin:
Job exception: uninitialized constant Jobs::ClassifyChatMessage::ChatMessage

The issue appears to occur when one of my plugins creates a chat message using the following command:
Chat::MessageCreator.create(chat_channel: matching_channel, user: message_user, content: raw).chat_message

Thanks

1 Like

Ohhh, this must have broken with the new Chat reorganization. We are on the verge of launching a new plugin that will incorporate this one’s functionality in the next few days, so stay tuned.

5 Likes