Evaluating Google's Perspective API on your Discourse forum


(Chris Beach) #1

Continuing the discussion from Discourse Etiquette: take actions when users intend to post inappropriate remark:

Google’s Perspective API is an experimental machine-learning project, trained on moderation data from The New York Times.

It’s designed to analyse online comments and highlight anything toxic.

I built an app that loads all posts from a Discourse forum backup and processes them with the Perspective API (instructions are in the repo’s README.md).

The results are stored in a DB for analysis.
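For anyone curious what the app actually sends over the wire, a Perspective `comments:analyze` request/response pair can be sketched as follows. This is a minimal Python sketch (the app itself is Scala); `YOUR_API_KEY` is a placeholder for the key Google issues, and the sample response is abridged to the fields the app cares about:

```python
import json

# Perspective's analyze endpoint; the key is passed as a query parameter.
API_URL = ("https://commentanalyzer.googleapis.com/v1alpha1/"
           "comments:analyze?key=YOUR_API_KEY")

def build_request(text, attributes=("TOXICITY",)):
    """Build the JSON body for a comments:analyze call."""
    return {
        "comment": {"text": text},
        "languages": ["en"],
        "requestedAttributes": {attr: {} for attr in attributes},
    }

def extract_score(response, attribute="TOXICITY"):
    """Pull the 0..1 summary score for one attribute out of a response."""
    return response["attributeScores"][attribute]["summaryScore"]["value"]

# Abridged response shape returned by the API:
sample_response = {
    "attributeScores": {
        "TOXICITY": {"summaryScore": {"value": 0.92, "type": "PROBABILITY"}}
    }
}

print(json.dumps(build_request("Thanks! Dick")))
print(extract_score(sample_response))  # 0.92
```

POSTing that body to `API_URL` (with a real key) returns one summary score per requested attribute, which is what gets written to the DB.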

When run on my forum, the “most toxic posters” list corresponded well with the members our moderators were already keeping an eye on. Aside from “toxicity”, the measures “inflammatory”, “attack on author” and “attack on commenter” were also reasonably accurate - at least accurate enough to highlight the most problematic behaviour on the forum.
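The “most toxic posters” ranking is just an aggregate over the stored scores. A Python sketch of the idea, over hypothetical `(username, toxicity)` rows pulled from the DB (the real app queries Postgres via Doobie):

```python
from collections import defaultdict

def most_toxic_posters(rows, min_posts=1):
    """rows: iterable of (username, toxicity) pairs.
    Returns [(username, mean_toxicity, post_count)], worst first."""
    totals = defaultdict(lambda: [0.0, 0])
    for user, score in rows:
        totals[user][0] += score
        totals[user][1] += 1
    ranking = [
        (user, total / count, count)
        for user, (total, count) in totals.items()
        if count >= min_posts  # ignore users with too few posts to judge
    ]
    ranking.sort(key=lambda r: r[1], reverse=True)
    return ranking

rows = [("alice", 0.1), ("bob", 0.8), ("alice", 0.3), ("bob", 0.6)]
print(most_toxic_posters(rows))  # bob first, mean toxicity ~0.7
</antml>```

A `min_posts` floor is worth having in practice, since a single mis-scored sign-off (see below) can otherwise dominate a small sample.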

There were one or two anomalies. For example, one of our “most toxic” members turned out to be a wonderfully kind chap who was one of our forum’s earliest joiners and freely donated money to help with our running costs. Why so toxic? It turns out that he signs off posts with his nickname, “Dick,” and as far as the Perspective API is concerned, that’s rather uncivil. :slight_smile:

Please give my app a try on your forum and let me know how you get on. I’d be grateful for any PRs and/or feedback on the code as I’m new to Postgres, Akka-HTTP, Doobie, Docker and Spray-JSON. Cheers!


(Kane York) #2

That’s a fairly major false positive there!


(Mittineague) #3

Is the word list biased towards American cultural slang? It is not uncommon that a word that is perfectly innocent in one country has a not-so-innocent connotation in another.

Then there’s context too. “Dick” as a given name vs. as an insult is a perfect example of this.

There’s also “privilege”: if a gay person wants to identify themselves as gay, that’s self-descriptive, not derogatory.

That said, as long as it’s more of a “this should be reviewed” flag and not an “auto-ban”, it sounds promising.


(Matt Palmer) #4

Given that the system was trained on moderation data from New York Times comment sections, I’d say it’s extremely likely that it is strongly biased towards American idioms (and spelling), and towards the kinds of things you get in an online news comments section, as opposed to any other form of online communication.


(Chris Beach) #5

Yes it is.

I used the app to help decide whether it was yet a good idea to install the discourse-etiquette plugin. I concluded that the plugin’s flagging feature would probably be useful to mods, but that automated feedback to users might be risky.

Here are the stats by category on my forum - it seems mods urgently need to check out “2016 Parents” (an opt-in category) and work out what’s going wrong there!

As I imagined, most of the toxicity happens in our ethics and politics categories.

Here are the overall scores for SE23.life:

Attack on Author 0.1
Attack on Commenter 0.19
Incoherent 0.505
Inflammatory 0.175
Likely to Reject 0.412
Obscene 0.165
Severe Toxicity 0.045
Spam 0.382
Toxicity 0.116
Unsubstantial 0.41

I’d be interested to hear how other forums compare.


#6

Fantastically interesting subject, thank you for the hard work on the app.

I think this is surely one of the biggest issues in social networking.

Being from the UK I’ve seen a tremendously poor level of debate around Brexit. We need to encourage people to debate with civility and not constantly resort to personal attacks. People need to focus on the subject of debate and stop being cheap.

I wonder if this is something that could lead to ‘self reflection’ if available online within Discourse:

  • in your own profile or
  • as part of the submission preview or
  • by a flag next to your post viewable by just you or a moderator …

all as a means of judging your own ‘toxicity’ metric, giving you early warning that your debating style is becoming too heated. Even as a moderator/admin, I find myself going back to edit posts to lower the temperature of some of my statements. Moderators and admins clearly have a responsibility to set a good example, so this is even more critical (though of course one usually knows when one has truly crossed the line).


(Erlend Sogge Heggen) #7

Haha, wonderful find! And great work all around. I’ve already forwarded this to the Google team, who can be emailed at conversationai-questions@google.com.

For those interested in reading more about Perspective’s models and biases, here’s some recommended reading:

https://blog.coralproject.net/toxic-avenging/

So far, Perspective API has received a mixed response. The first release, while intended to be an early test version of their approach, seemed to have several serious deficits. Experiments by the interaction designer Caroline Sinders (who has also done work with us) suggested that it was missing some key areas of focus, while Violet Blue, writing for Engadget, used Jessamyn West’s investigation among others to show that the system was returning some truly troubling false positives. For their part, Jigsaw says it is aware of these issues, and recently wrote a blogpost about the limitations of models such as theirs.


(Erlend Sogge Heggen) #8

So, will you be running with it? Keep in mind you can soft-disable the JIT notifications by setting a crazy high certainty score so that it’s never triggered. In the near future we will add the ability to disable JIT notifications altogether.


(Chris Beach) #9

I’ll install it on my forum soon. A ./launcher rebuild means downtime for my community, so I try to do rebuilds infrequently.


(Sam Saffron) #10

I love the approach of being able to scan the history of a forum and surface historical issues.

I think we should introduce a new model to store all this information and just have the official plugin backfill the model, then you can run queries about history in data explorer.

I would say a great first move here @erlend_sh with the perspective plugin is focusing on validating this historically.

  • Plugin runs in a “no warnings, no JIT notifications, only update model” mode (site setting)

  • Plugin has a job that backfills N posts every M minutes.

  • We then have a few data explorer queries to report on:

    • Most toxic categories
    • Most toxic users
    • Most toxic posts
    • Most toxic posts today

I’d feel far more at ease about enabling this if I could see how it performed historically.

Working through history can allow forum admins to adjust all the params according to historic behavior in the forum.
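The batched backfill job described above can be sketched as a simple loop over not-yet-scored posts. This is a Python sketch under stated assumptions: `score_post`, the in-memory `posts`/`scores` dicts and the batch size are all hypothetical stand-ins, not the plugin’s actual API (the real plugin would persist scores in its own model and run on a schedule):

```python
def backfill(posts, scores, score_post, batch_size=50):
    """Score up to batch_size not-yet-scored posts per run.

    posts:      {post_id: raw_text} for the whole forum
    scores:     {post_id: toxicity} filled in across runs
    score_post: callable(text) -> toxicity in [0, 1]
    Returns how many posts were scored this run (0 = backlog done).
    """
    pending = [pid for pid in posts if pid not in scores]
    for pid in pending[:batch_size]:
        scores[pid] = score_post(posts[pid])
    return min(len(pending), batch_size)

# Each scheduled run chips away at the backlog until nothing is pending:
posts = {1: "hello", 2: "you utter walnut", 3: "nice post"}
scores = {}
fake_scorer = lambda text: 0.9 if "walnut" in text else 0.1
while backfill(posts, scores, fake_scorer, batch_size=2):
    pass
print(scores)  # {1: 0.1, 2: 0.9, 3: 0.1}
```

Once the model is fully backfilled like this, the “most toxic categories/users/posts” reports are plain group-by queries over the score table in Data Explorer.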


(Ciler Ay) #11

Hi Chris, Thanks for sharing your experiences with Perspective API.
You might also check out www.smartmoderation.com - curious about your comparison. :slight_smile:


(Mittineague) #12

I was looking forward to trying this out, but I got stuck in a loop trying to get an API key, got frustrated, and gave up. I’m not sure what Google wants or dislikes about my login details, so I’ll try again after digging around a bit.


(John Li) #13

Hi, I work on Perspective, sorry you’re running into issues.

Did you get approved for the API already (by applying on https://perspectiveapi.com/)? If so, did you have issues following our guide? Feel free to message me directly and I can try to help you out.