Pre-emptive Striker plugin development log

(Gentry Demchak) #1

Hey, all

As suggested by @erlend_sh, I have created a development log for a plugin I’ve begun developing for Discourse.

I have started work on building what I call the pre-emptive striker plugin (if anyone has a better suggestion for the name, throw it at me!). It was proposed at the beginning of the summer, and I’m now officially developing the plugin. The link to the original proposal is here.
Basically, the plugin checks what the user is writing for toxicity as they type, using Google’s Perspective API, and sends JIT (just-in-time) notifications if it detects particularly toxic language.

I would like some help, however, in understanding how this plugin will interact with the rest of Discourse. I created a quick diagram to illustrate the architecture of the plugin as I understand it in the context of Discourse. Discourse is pretty big to dive into as my first Ruby app; I’ve gone through all of the Discourse beginner guides to familiarize myself a bit. Link to the file. Please download it and make edits or point out anything that looks wrong! This is a fairly high-level diagram.

The basic user flow goes like this:

  1. The user begins composing.
  2. They type “You’re a dumb person for making that comment, why do you even exist?”
  3. As they type, requests are sent to the Perspective API to analyze the comment. It returns a high toxicity score.
  4. The high toxicity score is over a threshold (probably a threshold set in the plugin settings admin panel?) and thus triggers a JIT notification.
  5. The user sees the warning, but ignores it and posts anyway! (Or they reflect on what they just wrote and proceed to cry because they’ve realized what a terrible person they are…)
  6. If they post it, the post is automatically flagged for moderator follow-up.
  7. A moderator will review the post and do whatever moderators do best.
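The decision in steps 4 and 6 boils down to comparing the TOXICITY score against one configurable threshold, with the outcome depending only on whether the text has been posted yet. A minimal Python sketch of that rule, assuming a single threshold setting; the function name and the returned action strings are hypothetical, not Discourse APIs:

```python
from typing import Optional


def action_for(score: float, threshold: float, posted: bool) -> Optional[str]:
    """Return the plugin's reaction to a Perspective TOXICITY score, or None.

    Hypothetical helper: the action names are illustrative placeholders.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("TOXICITY scores are probabilities in [0, 1]")
    if score < threshold:
        return None
    # While composing we only warn (step 4); posting is never blocked.
    # Once the text is actually posted, it is flagged for a moderator (steps 6-7).
    return "flag_for_moderation" if posted else "show_jit_warning"
```

For example, `action_for(0.95, 0.8, posted=False)` yields `"show_jit_warning"`, and the same score with `posted=True` yields `"flag_for_moderation"`.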

Is there documentation on the APIs for JIT notifications and moderation flagging?
Any pointers/guidance there would be greatly appreciated and would speed development up.

I wrote a quick Node app that interfaces with the Perspective API (PAPI). The next step is to begin writing it in Ruby.
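For reference, a PAPI call is a single JSON POST to the `comments:analyze` method requesting the `TOXICITY` attribute, and the score comes back under `attributeScores.TOXICITY.summaryScore.value`. A minimal Python sketch (standard library only) of what the Ruby port would need to do, assuming the `v1alpha1` endpoint and an API key passed as the `key` query parameter; the helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request

# Perspective API analyze endpoint (assumed v1alpha1); the key is supplied by the caller.
ANALYZE_URL = "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"


def build_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request asking Perspective for a TOXICITY score."""
    body = {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {"TOXICITY": {}},
    }
    query = urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        f"{ANALYZE_URL}?{query}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def toxicity_from_response(payload: dict) -> float:
    """Extract the summary TOXICITY probability from a decoded response."""
    return float(payload["attributeScores"]["TOXICITY"]["summaryScore"]["value"])


def analyze(text: str, api_key: str, timeout: float = 5.0) -> float:
    """Send the request and return the score (performs a network call)."""
    with urllib.request.urlopen(build_request(text, api_key), timeout=timeout) as resp:
        return toxicity_from_response(json.load(resp))
```

Keeping request building and response parsing separate from the network call makes both easy to test without a real key.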

The plugin repo is here for anyone who wants to have a look. It’s pretty bare-bones at the moment.


(Kane York) #2

There are significant problems with the Perspective API. Punctuation and several key words can be sprinkled through your message to lower the rating to near zero.

I would not feel comfortable using it with Discourse.

EDIT: Examples from earlier this month: Maik Macho on Twitter: "@0xabad1dea I've had a Discord bot using the Perspective API for fun... and we found out that currently, "lively" is the most un-toxic word."

This has since changed, but “I am a gay black woman” would still be filtered.

(Erlend Sogge Heggen) #3

Keep in mind:

  1. It will be a plugin

  2. Its only purpose is to act as a signalling helper for users as well as moderators. It should never block regular use in any way. The worst possible outcome of a false positive is that you receive an unwarranted warning; on the flip side, with a false negative, you don’t receive a warning when you should have.

  3. Perspective API is still in very active development and the team at Google welcomes any feedback that can help improve it. I’m sure they’ll appreciate it if you drop a note about your findings to

I for one find this to be a very exciting area of development for us. Our company is called “Civilized Discourse Construction Kit” for a reason.

Toxic behaviour on the internet is a huge problem. Machine learning and clever programming in general isn’t going to solve all of our problems, but every little bit helps. As long as we’re not negatively impacting the experience of good actors (CAPTCHAs are a good example of a spam-deterrent that brings regular users a lot of grief) I welcome any kind of experiment in automagic moderation.

(Régis Hanol) #4

I would first write a plugin which listens to the post_created event, sends a request to the Perspective API and flags the post if it’s toxic.

Once that is done and working, I would add support for a new JIT message.

(Sam Saffron) #5

I would be super careful, though, not to block the post creation pipeline with a remote call. My recommendation would be to look at the Akismet plugin and follow the same pattern it follows.
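The concern here is that the post-creation hook should return immediately and leave the slow remote call to something running in the background. A minimal Python sketch of that general enqueue-then-process pattern, using a thread and an in-memory queue as a stand-in for a real job system; it does not reproduce the Akismet plugin's actual code, and all names (`on_post_created`, `flag_fn`, `score_fn`) are hypothetical:

```python
import queue
import threading
from typing import Callable, Optional, Tuple

Job = Optional[Tuple[int, str]]  # (post_id, raw text); None tells the worker to stop


class BackgroundToxicityChecker:
    """Hypothetical sketch: the creation hook only enqueues; a worker does the remote call."""

    def __init__(self, score_fn: Callable[[str], float], threshold: float,
                 flag_fn: Callable[[int, float], None]) -> None:
        self._score_fn = score_fn
        self._threshold = threshold
        self._flag_fn = flag_fn
        self._jobs: "queue.Queue[Job]" = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def on_post_created(self, post_id: int, raw: str) -> None:
        # Called from the creation pipeline: constant time, no network I/O here.
        self._jobs.put((post_id, raw))

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            try:
                if job is None:
                    return
                post_id, raw = job
                try:
                    score = self._score_fn(raw)  # the slow remote call (e.g. PAPI)
                except Exception:
                    continue  # a failed API call must never affect posting
                if score >= self._threshold:
                    self._flag_fn(post_id, score)
            finally:
                self._jobs.task_done()

    def shutdown(self) -> None:
        """Process everything already queued, then stop the worker."""
        self._jobs.put(None)
        self._worker.join()
```

Because failures are swallowed in the worker, an outage of the remote API degrades to "no flag" rather than a broken post creation, which matches the "never block regular use" goal.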

(Gentry Demchak) #6

I totally understand that concern; it is completely valid. I think PAPI has a long way to go as well, but I believe it will get better over time. I don’t see this plugin as something that replaces moderators; rather, I see it as augmenting their capabilities and reducing parts of their workload.

(Angus McLeod) #7

When reading the OP, my thought was that it would be better to use the Perspective API to warn the user, rather than to auto-flag. Given @riking’s points about the current state of the API, using it to auto-flag could result in unnecessary work for the moderators. Using it to warn the user, on the other hand, could help prevent the user from posting something toxic in the first place. Prevention is better than a cure, as they say.

One way to do that without adding any new server calls would be to use the existing ‘draft’ mechanism. The composer is already sending the raw text of a post being composed to the server every 2 seconds. Find a way to hook into that process on the server and run the PAPI on the draft text in a separate process. Only if the score meets the threshold would you send a message to the client to display a warning message.
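Since the composer re-sends the draft every 2 seconds, a server-side hook would likely want to skip unchanged text and avoid re-sending the same warning on every save. A minimal Python sketch of that bookkeeping, assuming a hypothetical `on_draft_saved` hook and `send_warning` callback (neither is a real Discourse API); the scoring call is shown inline for clarity, though in a real plugin it would run in a separate process or background job, as suggested in this thread:

```python
import hashlib
from typing import Callable, Dict, Tuple

DraftKey = Tuple[int, str]  # (user_id, draft_key)


class DraftToxicityMonitor:
    """Hypothetical sketch: score drafts only when they change, warn once per crossing."""

    def __init__(self, score_fn: Callable[[str], float], threshold: float,
                 send_warning: Callable[[int, str, float], None]) -> None:
        self._score_fn = score_fn
        self._threshold = threshold
        self._send_warning = send_warning
        self._last_digest: Dict[DraftKey, str] = {}
        self._warned: Dict[DraftKey, bool] = {}

    def on_draft_saved(self, user_id: int, draft_key: str, text: str) -> None:
        key = (user_id, draft_key)
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if self._last_digest.get(key) == digest:
            return  # composer re-sent identical text; skip the remote call
        self._last_digest[key] = digest
        score = self._score_fn(text)
        over = score >= self._threshold
        # Only message the client when the draft newly crosses the threshold.
        if over and not self._warned.get(key, False):
            self._send_warning(user_id, draft_key, score)
        self._warned[key] = over  # re-arms once the draft drops below the threshold

    def on_draft_discarded(self, user_id: int, draft_key: str) -> None:
        self._last_digest.pop((user_id, draft_key), None)
        self._warned.pop((user_id, draft_key), None)
```

Warning once per threshold crossing keeps the notification from reappearing every 2 seconds while the user is still editing the same sentence.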

(Gentry Demchak) #8

That’s perfect! Thanks for pointing that out. I believe I have found the draft mechanism under jsapp/models/draft.js.es6. It looks like there are get and get_local methods, both of which require a key. My question now is: should I call that method every 2 seconds, or is there an event broadcaster that I can listen to? I guess that would probably require WebSockets.

I’ll download and install the Akismet plugin today, have a look around, and see how it gets posts from the composer and processes them.

I need to:

  1. Determine when the user has started a new composition or opened a saved draft.
  2. Poll their draft every 2 seconds.
  3. Analyze the draft with PAPI.
  4. Use JIT notifications to warn the user about toxicity.
  5. Determine when the user has deleted or published their draft, so that the plugin can stop polling the composer.
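Steps 1, 2 and 5 amount to a small lifecycle: start polling when a composition begins or a draft is reopened, poll on a fixed interval, and stop when the draft is deleted or published. A minimal Python sketch of that bookkeeping, with hypothetical event names and the current time passed in explicitly so it is testable (a real client would read a monotonic clock, and in Discourse this logic would live in the composer's JavaScript):

```python
from enum import Enum, auto
from typing import Dict, List


class ComposerEvent(Enum):
    NEW_COMPOSITION = auto()
    DRAFT_OPENED = auto()
    DRAFT_DELETED = auto()
    PUBLISHED = auto()


START_EVENTS = {ComposerEvent.NEW_COMPOSITION, ComposerEvent.DRAFT_OPENED}
STOP_EVENTS = {ComposerEvent.DRAFT_DELETED, ComposerEvent.PUBLISHED}


class DraftPoller:
    """Tracks which drafts are due for a toxicity check (hypothetical sketch)."""

    def __init__(self, interval: float = 2.0) -> None:
        self.interval = interval
        self._next_poll: Dict[str, float] = {}  # draft key -> next poll time

    def handle(self, draft_key: str, event: ComposerEvent, now: float) -> None:
        if event in START_EVENTS:
            self._next_poll[draft_key] = now  # check right away
        elif event in STOP_EVENTS:
            self._next_poll.pop(draft_key, None)  # stop polling this draft

    def due(self, now: float) -> List[str]:
        """Return drafts to analyze now and schedule their next poll."""
        ready = sorted(k for k, t in self._next_poll.items() if t <= now)
        for key in ready:
            self._next_poll[key] = now + self.interval
        return ready
```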

(Erick Guan) #9

I’m working on a plugin for this. PAPI returns a confidence value; if the threshold is set correctly, the result should be trustworthy. As a result, the auto-flag feature should produce few false negatives.

The poster gets flagged upon posting a toxic comment

System calls it out as “Highly likely a toxic post”

The source code is available on GitHub: fantasticfears/discourse-etiquette

You can try this here.

username: toxic-user
password: ThisPersonIsToxic

You can cross-check results against the writing demo on the Perspective API site.
Also note that auto-flagging is set to a toxicity confidence of 0.7 (quite low) for experimentation, given that the model treats scores above 0.8 as likely to be perceived as toxic.

Now I’m working on JIT notifications.

(Stephen Chung) #10

If it is a topic, will it automatically queue it up for moderation when toxicity is detected?

(Erick Guan) #11

No, it simply gets flagged.

The plugin has been updated. Here is the notification shown while editing.

Several limitations:

  • It looks like there is a deadlock problem when requesting the API. I’d be happy if you manage to break the test site while testing.
  • It’s only available for English sites.
  • It doesn’t say which part of the user’s post has the problem. I think this feature will take some more work.
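On the last point, the Perspective API can return per-span scores when the request sets `"spanAnnotations": true`, which could be used to point at the offending part of a post. A minimal Python sketch, assuming the response lists them under `attributeScores.TOXICITY.spanScores` with `begin`/`end` character offsets into the submitted text (worth verifying against the API docs, especially for non-ASCII text); the function name is hypothetical:

```python
from typing import List, Tuple


def toxic_spans(text: str, response: dict, threshold: float) -> List[Tuple[str, float]]:
    """Return (substring, score) pairs for spans scoring at or above the threshold."""
    spans = (
        response.get("attributeScores", {})
        .get("TOXICITY", {})
        .get("spanScores", [])
    )
    result: List[Tuple[str, float]] = []
    for span in spans:
        value = float(span["score"]["value"])
        if value >= threshold:
            # Assumes begin/end are character offsets into the analyzed text.
            begin = span.get("begin", 0)
            end = span.get("end", len(text))
            result.append((text[begin:end], value))
    return result
```

The returned substrings could then be highlighted in the composer preview instead of only showing a generic warning.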

(Stephen Chung) #12

Would it be difficult to add an option to put it into the moderation queue instead of, or in addition to, flagging?

(Joffrey Jaffeux) #13

Maybe use the same validation UI as the title length check, for example? I’m not sure this one will catch the eye.

(Erlend Sogge Heggen) #14

Probably not difficult, we can keep it in mind for future development. The Perspective API is still a WIP so flagging is the safest moderation action to start with.
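If such an option were added later, it could be a single setting that maps to one or more moderation actions, with flagging kept as the default. A minimal Python sketch under that assumption; the setting values and action names are hypothetical, not existing plugin settings:

```python
from enum import Enum
from typing import List


class ToxicPostAction(str, Enum):
    FLAG = "flag"
    QUEUE = "queue"
    FLAG_AND_QUEUE = "flag_and_queue"


def moderation_actions(setting: str = "flag") -> List[str]:
    """Translate a (hypothetical) site setting into the actions to take on a toxic post."""
    action = ToxicPostAction(setting)  # raises ValueError for an unknown setting value
    if action is ToxicPostAction.FLAG:
        return ["flag"]
    if action is ToxicPostAction.QUEUE:
        return ["enqueue_for_review"]
    return ["flag", "enqueue_for_review"]
```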