Evaluating Google's Perspective API on your Discourse forum


(Chris Beach) #1

Continuing the discussion from Discourse Etiquette: take actions when users intend to post inappropriate remark:

Google’s Perspective API is an experimental machine-learning project, trained on moderation data from The New York Times.

It’s designed to analyse online comments and highlight anything toxic.

I built an app that loads all posts from a Discourse forum backup and processes them with the Perspective API (instructions are in the repo’s README.md).

The results are stored in a DB for analysis.
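For anyone curious what the app actually sends over the wire, a Perspective `comments:analyze` request/response pair can be sketched as follows. This is a minimal Python sketch (the app itself is Scala); `YOUR_API_KEY` is a placeholder for the key Google issues, and the sample response is abridged to the fields the app cares about:

```python
import json

# Perspective's analyze endpoint; the key is passed as a query parameter.
API_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=YOUR_API_KEY")

def build_request(text, attributes=("TOXICITY",)):
    """Build the JSON body for a comments:analyze call."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {attr: {} for attr in attributes},
    }

def extract_score(response, attribute="TOXICITY"):
    """Pull the 0..1 summary score for one attribute out of a response."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]

# Abridged response shape returned by the API:
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.92, "type": "PROBABILITY"}}
    }
}

print(json.dumps(build_request("Thanks! Dick")))
print(extract_score(sample_response))  # 0.92
```

POSTing that body to `API_URL` (with a real key) returns one summary score per requested attribute, which is what gets written to the DB.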

When run on my forum, the “most toxic posters” list corresponded well with the members our moderators were already keeping an eye on. Aside from “toxicity”, the measures “inflammatory”, “attack on author” and “attack on commenter” were also reasonably accurate - at least accurate enough to highlight the most problematic behaviour on the forum.
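The “most toxic posters” ranking is just an aggregate over the stored scores. A Python sketch of the idea, over hypothetical `(username, toxicity)` rows pulled from the DB (the real app queries Postgres via Doobie):

```python
from collections import defaultdict

def most_toxic_posters(rows, min_posts=1):
    """rows: iterable of (username, toxicity) pairs.
    Returns [(username, mean_toxicity, post_count)], worst first."""
    totals = defaultdict(lambda: [0.0, 0])
    for user, score in rows:
        totals[user][0] += score
        totals[user][1] += 1
    ranking = [
        (user, total / count, count)
        for user, (total, count) in totals.items()
        if count >= min_posts  # ignore users with too few posts to judge
    ]
    ranking.sort(key=lambda r: r[1], reverse=True)
    return ranking

rows = [("alice", 0.1), ("bob", 0.8), ("alice", 0.3), ("bob", 0.6)]
print(most_toxic_posters(rows))  # bob first, mean toxicity ~0.7
</antml>```

A `min_posts` floor is worth having in practice, since a single mis-scored sign-off (see below) can otherwise dominate a small sample.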

There were one or two anomalies. For example, one of our “most toxic” members turned out to be a wonderfully kind chap who was one of our forum’s earliest joiners and freely donated money to help with our running costs. Why so toxic? It turns out that he signs off posts with his nickname, “Dick,” and as far as the Perspective API is concerned, that’s rather uncivil. :slight_smile:

Please give my app a try on your forum and let me know how you get on. I’d be grateful for any PRs and/or feedback on the code as I’m new to Postgres, Akka-HTTP, Doobie, Docker and Spray-JSON. Cheers!


(Kane York) #2

That’s a fairly major false positive there!


(Mittineague) #3

Is the word list biased towards American cultural slang? It is not uncommon that a word that is perfectly innocent in one country has a not-so-innocent connotation in another.

Then there’s context too. “Dick” as a given name vs. as an insult is a perfect example of this.

There’s also “privilege”: if a gay person wants to identify themselves as gay, that’s self-descriptive, not derogatory.

That said, as long as it’s more of a “this should be reviewed” flag and not an “auto-ban”, it sounds promising.


(Matt Palmer) #4

Given that the system was trained on moderation data from New York Times comment sections, I’d say it’s extremely likely that it is strongly biased towards American idioms (and spelling), and towards the kinds of things you get in an online news comments section, as opposed to any other form of online communication.


(Chris Beach) #5

Yes it is.

I used the app to help decide whether it was yet a good idea to install the discourse-etiquette plugin. I concluded that the plugin’s flagging feature would probably be useful to mods, but that automated feedback to users might be risky.

Here are the stats by category on my forum - it seems mods urgently need to check out “2016 Parents” (an opt-in category) and work out what’s going wrong there!

As I imagined, most of the toxicity happens in our ethics and politics categories.

Here are the overall scores for SE23.life:

Attack on Author 0.1
Attack on Commenter 0.19
Incoherent 0.505
Inflammatory 0.175
Likely to Reject 0.412
Obscene 0.165
Severe Toxicity 0.045
Spam 0.382
Toxicity 0.116
Unsubstantial 0.41

I’d be interested to hear how other forums compare.


#6

Fantastically interesting subject, thank you for the hard work on the app.

I think this is surely one of the biggest issues in social networking.

Being from the UK I’ve seen a tremendously poor level of debate around Brexit. We need to encourage people to debate with civility and not constantly resort to personal attacks. People need to focus on the subject of debate and stop being cheap.

I wonder if this is something that could lead to ‘self reflection’ if available online within Discourse:

  • in your own profile or
  • as part of the submission preview or
  • by a flag next to your post viewable by just you or a moderator …

all as a means of judging your own ‘toxicity’ metric, giving you early warning that your debating style is becoming too heated. Even as a moderator/admin, I find myself going back to edit posts to lower the temperature of some of my statements. Moderators and admins clearly have a responsibility to set a good example, so this is even more critical (though of course one usually knows when one has truly crossed the line).


(Erlend Sogge Heggen) #7

Haha, wonderful find! And great work all around. I’ve already forwarded this to the Google team, who can be emailed at conversationai-questions@google.com.

For those interested in reading more about Perspective’s models and biases, here’s some recommended reading:

https://blog.coralproject.net/toxic-avenging/

So far, Perspective API has received a mixed response. The first release, while intended to be an early test version of their approach, seemed to have several serious deficits. Experiments by the interaction designer Caroline Sinders (who has also done work with us) suggested that it was missing some key areas of focus, while Violet Blue, writing for Engadget, used Jessamyn West’s investigation among others to show that the system was returning some truly troubling false positives. For their part, Jigsaw says it is aware of these issues, and recently wrote a blogpost about the limitations of models such as theirs.


(Erlend Sogge Heggen) #8

So, will you be running with it? Keep in mind you can soft-disable the JIT notifications by setting a crazy high certainty score so that it’s never triggered. In the near future we will add the ability to disable JIT notifications altogether.


(Chris Beach) #9

I’ll install it on my forum soon. A ./launcher rebuild means downtime for my community, so I try to do rebuilds infrequently.


(Sam Saffron) #10

I love the approach of being able to scan the history of a forum and surface historical issues.

I think we should introduce a new model to store all this information and just have the official plugin backfill the model, then you can run queries about history in data explorer.

I would say a great first move here @erlend_sh with the perspective plugin is focusing on validating this historically.

  • Plugin runs in a “no warnings, no JIT notifications, only update model” mode (site setting)

  • Plugin has a job that backfills N posts every M minutes.

  • We then have a few data explorer queries to report on:

    • Most toxic categories
    • Most toxic users
    • Most toxic posts
    • Most toxic posts today

I’d feel far more at ease about enabling this if I could see how it performed historically.

Working through history can allow forum admins to adjust all the params according to historic behavior in the forum.
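The batched backfill job described above can be sketched as a simple loop over not-yet-scored posts. This is a Python sketch under stated assumptions: `score_post`, the in-memory `posts`/`scores` dicts and the batch size are all hypothetical stand-ins, not the plugin’s actual API (the real plugin would persist scores in its own model and run on a schedule):

```python
def backfill(posts, scores, score_post, batch_size=50):
    """Score up to batch_size not-yet-scored posts per run.

    posts:      {post_id: raw_text} for the whole forum
    scores:     {post_id: toxicity} filled in across runs
    score_post: callable(text) -> toxicity in [0, 1]
    Returns how many posts were scored this run (0 = backlog done).
    """
    pending = [pid for pid in posts if pid not in scores]
    for pid in pending[:batch_size]:
        scores[pid] = score_post(posts[pid])
    return min(len(pending), batch_size)

# Each scheduled run chips away at the backlog until nothing is pending:
posts = {1: "hello", 2: "you utter walnut", 3: "nice post"}
scores = {}
fake_scorer = lambda text: 0.9 if "walnut" in text else 0.1
while backfill(posts, scores, fake_scorer, batch_size=2):
    pass
print(scores)  # {1: 0.1, 2: 0.9, 3: 0.1}
```

Once the model is fully backfilled like this, the “most toxic categories/users/posts” reports are plain group-by queries over the score table in Data Explorer.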


(Ciler Ay) #11

Hi Chris, Thanks for sharing your experiences with Perspective API.
You might also check out www.smartmoderation.com - curious about your comparison. :slight_smile:


(Mittineague) #12

I was looking forward to trying this out, but I got stuck in a loop trying to get an API key, got frustrated, and gave up. I’m not sure what Google wants or dislikes about my login details, so I’ll try again after digging around a bit.


(John Li) #13

Hi, I work on Perspective, sorry you’re running into issues.

Did you get approved for the API already (by applying on https://perspectiveapi.com/)? If so, did you have issues following our guide? Feel free to message me directly and I can try to help you out.