Discourse Etiquette: take action when users intend to post inappropriate remarks

official

(Erick Guan) #1

A Discourse plugin that helps you combat inappropriate content. It requires a Google Perspective API key (experimental and with limited availability). The plugin can:

  • flag a user’s post if it is toxic.
  • show a notification when they type something toxic.

Install the plugin by following the howto.

Get a Google Perspective API key

You have to apply here. They will send you further instructions.

Settings

The plugin sends users’ posts to Google’s API, which decides whether to take action based on a “confidence level”. You can choose the model and thresholds in site settings. The thresholds you should use depend on the model.

  1. Choose etiquette_toxicity_model. The standard model is the default, and the other settings are tuned for it. Google also offers a severe toxicity (experimental) model, which targets more severe content. For example, with a threshold of 0.9 the standard model produces few false positives, while the severe toxicity model needs a lower threshold, around 0.7, to catch most inappropriate content.

  2. etiquette_flag_post_min_toxicity_confidence: if the API returns a score above this threshold, the post is flagged. It can be lowered to 0.7 if you use the severe toxicity (experimental) model.

  3. etiquette_notify_posting_min_toxicity_confidence: same as above, but applied while composing. It defaults to a lower value so users have a chance to revise their post before submitting. It can be lowered to 0.65 if you use the severe toxicity (experimental) model.
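As a rough sketch, the two settings are simply two cut-offs applied to the same confidence score: the lower one triggers the composer notification, the higher one triggers a flag. The threshold values below are illustrative, not the plugin's actual defaults:

```ruby
# Illustrative values; the real ones come from the site settings
# etiquette_notify_posting_min_toxicity_confidence and
# etiquette_flag_post_min_toxicity_confidence.
NOTIFY_THRESHOLD = 0.80
FLAG_THRESHOLD   = 0.85

# Map a Perspective confidence score to the actions the plugin would take.
def actions_for(score)
  actions = []
  actions << :notify_composer if score >= NOTIFY_THRESHOLD
  actions << :flag_post       if score >= FLAG_THRESHOLD
  actions
end

actions_for(0.95) # => [:notify_composer, :flag_post]
actions_for(0.82) # => [:notify_composer]
actions_for(0.30) # => []
```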

For more information about confidence levels and other technical details, see perspectiveapi/api_reference.md at master · conversationai/perspectiveapi · GitHub.
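For reference, a single scoring call is a POST to the `comments:analyze` endpoint with the text and the requested attribute (`TOXICITY` or `SEVERE_TOXICITY`), and the overall confidence comes back under `attributeScores`. A minimal Ruby sketch, with the request and response shapes taken from the public API reference and the actual network call left commented out:

```ruby
require "json"
require "uri"
require "net/http"

API_KEY  = ENV.fetch("PERSPECTIVE_API_KEY", "test-key")
ENDPOINT = URI("https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=#{API_KEY}")

# Build the AnalyzeComment request body for one attribute.
def analyze_body(text, attribute: "TOXICITY")
  {
    comment: { text: text },
    requestedAttributes: { attribute => {} },
    doNotStore: true
  }
end

# Extract the overall confidence score from a parsed response hash.
def summary_score(response, attribute: "TOXICITY")
  response.dig("attributeScores", attribute, "summaryScore", "value")
end

body = analyze_body("you are great")
# res   = Net::HTTP.post(ENDPOINT, body.to_json, "Content-Type" => "application/json")
# score = summary_score(JSON.parse(res.body))
```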

The plugin is still experimental and subject to change, and it currently supports English only.

Old discussion can be found here.


Live Demo

If you are interested in the plugin, try it on a test site with:

username: toxic-user
password: ThisPersonIsToxic

Evaluating Google's Perspective API on your Discourse forum
(Sam Saffron) #2

Can you expand a bit on the technical side:

  • how often and when is stuff sent to Google?
  • does this eat up a unicorn while it is talking to Google?

Also can you add screenshots of this in action?


(Chris Beach) #3

Very interesting concept. Just got my API key (minutes after requesting it), and I look forward to trialing this plugin.


(Erick Guan) #4

When a user types something, the plugin checks about one second after each change. It also sends the post to Google whenever a post is created.

And unfortunately it ties up a unicorn process for now.
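The "checks about one second after each change" behaviour is a classic debounce: every keystroke resets a timer, and only the last edit actually triggers a request. In the plugin this lives in client-side JavaScript; the same idea sketched in Ruby with threads (class name and timings are illustrative):

```ruby
# A new call cancels the pending timer, so only the final edit fires.
class Debouncer
  def initialize(delay)
    @delay = delay
    @mutex = Mutex.new
    @timer = nil
  end

  def call(&block)
    @mutex.synchronize do
      @timer.kill if @timer          # cancel the previously scheduled run
      @timer = Thread.new do
        sleep @delay                 # wait for the quiet period
        block.call
      end
    end
  end

  # Wait for the last scheduled run to finish (for the demo below).
  def flush
    @timer.join if @timer
  end
end

calls = []
debouncer = Debouncer.new(0.3)
3.times do |i|
  debouncer.call { calls << i }      # simulate three quick edits
  sleep 0.05
end
debouncer.flush
calls # => [2]
```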



(Sam Saffron) #5

What rate limits does Google have? You've got to start there.

I would strongly recommend winding this way down. Send it to Google at most every 10 seconds.

I almost feel like this should only happen at the end of the process, prior to posting, because this is a lot of traffic to Google.

When you send to Google, always use hijack like uploads_controller does; it's a 2-line change. Also add a rate limit server-side so a user can never do this more than 6 times a minute or so.
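Discourse ships a Redis-backed RateLimiter class for exactly this kind of server-side limit; as a standalone illustration of the "6 times a minute" idea, a sliding-window limiter could look like the following sketch (in-memory only, not the plugin's actual code):

```ruby
# Allow at most +max+ calls per +per+ seconds, tracked per user.
class SimpleRateLimiter
  def initialize(max:, per:)
    @max  = max
    @per  = per
    @hits = Hash.new { |h, k| h[k] = [] }   # user_id => timestamps
  end

  # Returns true and records the call if +user_id+ is under the limit.
  def allow?(user_id, now = Time.now)
    recent = @hits[user_id].select { |t| now - t < @per }
    return false if recent.size >= @max
    @hits[user_id] = recent << now
    true
  end
end

limiter = SimpleRateLimiter.new(max: 6, per: 60)
results = (1..7).map { limiter.allow?(:toxic_user) }
results # => [true, true, true, true, true, true, false]
```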


(Erick Guan) #6

The quota is around 1,000 requests per 100 seconds.

I think there are two approaches here. The first is to query at a slower rate; the other is to block for a few seconds and stop the post if something is found. But the latter doesn’t sound nice.


(Kane York) #7

But that’s per API key; we’re talking one request per 10 seconds per user. Client-side throttling needs to be multiplied by the number of concurrent clients you have.
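Putting that in numbers: the quota is shared across the whole API key, so client-side throttling only bounds per-user traffic. Assuming the 1,000-per-100-seconds quota mentioned above and one check every 10 seconds per composing user:

```ruby
quota_rps      = 1000.0 / 100   # whole-key budget: 10 requests/second
per_client_rps = 1.0 / 10       # one check every 10 s per typing user

# Roughly this many users typing at once saturates the key, before even
# counting the extra check made when each post is created.
max_concurrent_typists = (quota_rps / per_client_rps).round
max_concurrent_typists # => 100
```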


(Erick Guan) #8

The rate limit and rack hijack are now in place.

That’s right, but it still depends on the scale of the community. We can afford to wait a few seconds for this API call. Quota aside, I believe it’s OK to let the just-in-time notifications fail, or run them at a much slower rate. In particular, I think the score from Google only changes significantly when a strong word appears or much of the content has been changed.


(Erlend Sogge Heggen) #9

A third option would be to let the post through without blocking it, but if the post has a high toxicity score, send the user a follow-up PM urging them to edit it.

@eviltrout or @neil could you speak to how we minimise API calls in the Akismet Plugin? Do we send all new posts to Akismet without exception, or do we stop screening posts of TL1 or TL2 users?

p.s. Erick is away for a week.


(Neil Lalonde) #10

@erlend_sh We don’t send the post contents as you’re typing. It happens after the post is submitted, same as discourse-etiquette is doing to flag posts. Akismet can be limited by trust level.

The “block” action of watched words will happen when submitting a post, returning a message to the composer and giving them a chance to edit the post and submit again. Maybe etiquette can do the same?

It will use a unicorn to do that though.


(Kris) #11

I think going this way could have more of an impact too… it’s a little harder to ignore. We could say something like:


(Sam Saffron) #12

Cross post from other topic:


(Erick Guan) #13

Watched words are limited, and the Perspective API can definitely tackle more cases. I’d prefer this approach, which also works on mobile.

That’s also a good idea to try. The Perspective API is evolving rapidly, and we should anticipate that when we plan this. It returns a summary score for the whole content and individual scores for paragraphs, and it might eventually give the reasons for the individual scores.


(Chris Beach) #14

I’ve installed it and am keen to see how the plugin performs with user posts. With the thresholds set to the defaults, I’m getting some odd false positives when creating new topics:

I noticed some errors popping up in the logs:

Error logs:

I’ll disable for now - please let me know if you get to the bottom of the problems, or if I can help with further debugging information.


(Erick Guan) #15

Could you send me the log privately? Also, did you manage to hit Google’s quota?


(Chris Beach) #16

I don’t think I hit the quota, no. I’ll PM you what I can from the logs.


(Diego Barreiro) #17

Any idea how long Google takes to reply to an API key request? :sweat_smile:


(Erlend Sogge Heggen) #18

We’ll be going with @awesomerobot’s suggestion for the next iteration since continuously checking for toxic content whilst writing incurs too many API calls.

Just as food for thought, I also think more specific feedback on what exactly was wrong with your comment could be helpful (it shouldn’t actually be part of the JIT interface; I just wanted to demonstrate how only the relevant sentences are brought in for the “review”). I wonder if the API would let us make a more detailed report like this:

Are you still waiting? It seems @ChrisBeach got his accepted pretty quickly. I believe the queue is human-curated and processed in bursts, so it can sometimes stall for quite a while.


(Chris Beach) #19

With the latest version from @fantasticfears, it looks like auto-flagging is working correctly. I’ll test the user feedback mechanism on various platforms and let you know if I have any issues. Thanks @fantasticfears :+1:


(Chris Beach) #20

@fantasticfears - unfortunately even with the following settings:

I’m still getting false positives: