Official Google Perspective API Plugin for Discourse
What is the Perspective API?
From the official site, “Perspective is an API that makes it easier to host better conversations. The API uses machine learning models to score the perceived impact a comment might have on a conversation. This model was trained by asking people to rate internet comments on a scale from very toxic to very healthy contribution. Toxic is defined as… a rude, disrespectful, or unreasonable comment that is likely to make you leave a discussion.”
What can the discourse-perspective-api plugin do?
- Prompt users if they are sure about submitting a potentially toxic post, before submit.
- Automatically flag toxic posts for moderators and admins to review.
- Optionally scan private categories and PMs for toxic content.
Follow the instructions at Install a Plugin using https://github.com/discourse/discourse-perspective-api.git as the repository URL.
Where do I get a Perspective API key?
Head over to https://www.perspectiveapi.com/ and click the button to request API access. Fill out the form with your details and wait. Google can take anywhere from a few hours to 1–2 days to send you an API key, as keys are distributed on a rolling basis. The API can be used free of cost; see the API Reference docs for details.
Site Settings Walkthrough
(Admin -> Type ‘perspective’ in the Filter text field)
The API is currently only available for the English language.
The default thresholds are set to be reasonably high but these settings offer some customizability for fine-tuning how this plugin works. Play around with the live demo on the official docs linked above to get a sense of how the thresholds will behave.
Enable the plugin for filtering potentially toxic posts.
Choose the toxicity model for Google’s Perspective API. Read more about how these models are developed in the API Reference docs.
toxicity (standard)
This model classifies rude, disrespectful, or unreasonable comments that are likely to make people leave a discussion. Curse words and insults used in a friendly way can still cross the threshold, so the standard model flags posts more easily. Choosing a high threshold such as 0.9 makes it flag fewer posts and take fewer incorrect actions.
severe toxicity (experimental)
This model uses the same algorithm as the standard model, but is trained to recognise examples that were considered to be ‘very toxic’. This makes it much less sensitive to comments that include positive uses of curse words, for example. Posts are flagged only when extreme cases of toxicity are detected, so the threshold for this model can reasonably be lowered to 0.7.
For example, a post containing
"I f*****g love you man" would get flagged under the standard model (using the default thresholds) but not with the severe toxicity model.
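Under the hood, both models are queried through Perspective's AnalyzeComment endpoint. As a rough sketch (outside the plugin, using only Ruby's standard library; the API key and helper names here are placeholders, not the plugin's internals), a request for either model looks like:

```ruby
require "json"

# Build an AnalyzeComment request body for the Perspective API.
# `model` is "TOXICITY" (the standard model) or "SEVERE_TOXICITY".
def analyze_comment_body(text, model: "TOXICITY")
  {
    comment: { text: text },
    requestedAttributes: { model => {} }
  }.to_json
end

# The body is POSTed to the AnalyzeComment endpoint (key is a placeholder):
#   https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze?key=YOUR_API_KEY
# The 0..1 score comes back nested under attributeScores:
def extract_score(response_json, model: "TOXICITY")
  JSON.parse(response_json).dig("attributeScores", model, "summaryScore", "value")
end

puts analyze_comment_body("I f*****g love you man", model: "SEVERE_TOXICITY")
```

Swapping the requested attribute is all that distinguishes the two models on the wire; the thresholds below are then compared against the returned `summaryScore.value`.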
Enable checking of potentially toxic content while a user is composing a post, and show a notification in the composer when the content appears toxic.
If the API returns a score higher than this threshold, the user is asked whether they are sure they want to post potentially toxic content. This is the confidence level of post toxicity, between 0 and 1, used while a user is composing a post, where a score of 1 means extremely toxic. A value above 0.9 should catch only highly toxic posts, depending on the model used. Since the user is notified before posting, a slightly lower threshold such as 0.85 works well here.
Flag potentially toxic posts that have already been submitted and notify moderators and admins about the flagged posts.
If the API returns a score higher than this threshold, the post is flagged for admins/moderators to review. This is the confidence level of post toxicity, between 0 and 1, used after a user has posted, where a score of 1 means extremely toxic. A value above 0.9 should flag only highly toxic posts, depending on the model used.
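The two thresholds above amount to a two-stage check: a lower bar that warns the author in the composer, and a higher bar that flags a submitted post for staff. A minimal illustrative sketch (the constant names and values mirror the defaults discussed above, but are not the plugin's actual setting names):

```ruby
# Illustrative two-stage threshold check; NOT the plugin's exact internals.
NOTIFY_THRESHOLD = 0.85 # warn the author in the composer before posting
FLAG_THRESHOLD   = 0.90 # flag an already-submitted post for staff review

def action_for(score)
  return :flag_for_review if score >= FLAG_THRESHOLD
  return :warn_author     if score >= NOTIFY_THRESHOLD
  :allow
end

puts action_for(0.95) # => flag_for_review
puts action_for(0.87) # => warn_author
puts action_for(0.20) # => allow
```

Keeping the notify threshold slightly below the flag threshold gives authors a chance to rephrase before a post ever reaches the moderation queue.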
API key for the Perspective API that you have received after completing the registration process mentioned above.
Check and flag private messages if toxic.
Note: The content of the PM will be sent to moderators/admins.
Also applies to backfill mode.
Additionally check private categories for toxic content by enabling this setting.
Query toxicity for existing posts and record the results in post custom fields.
Enabling this mode disables online checking for posts.
The period in days after which a new query iteration starts once the last iteration finishes. Used only if the backfill setting above is enabled.
What a user sees when trying to submit a toxic post:
What admins/moderators see when a toxic post is submitted:
To run the plugin's tests, execute `bundle exec rake plugin:spec["discourse-perspective-api"]` in your Discourse root folder.
Big thanks to @fantasticfears for creating this plugin!