Discourse Etiquette: take action when users intend to post inappropriate remarks

official

(Jeff Atwood) #42

I would like us to test this here on meta (as I think it still needs a fair bit of work), but other stuff keeps pre-empting it.


(Sam Saffron) #43

Note :mega:

This plugin is now #official ! Thanks for all your help @fantasticfears!


(Nichalas Petranek) #44

@sam Is this plugin available in the current beta release or will it not be available until the next beta release?


(Sam Saffron) #45

It is official and released; feel free to use the link in the OP.


(Nichalas Petranek) #46

I’m going to blame a lapse in memory for my above comment. I completely forgot that I can just install the plugin on my own. I was thinking that it was going to ship with core and that I might have to wait for it…


(Sam Saffron) #47

There are a handful of plugins we bundle, but the vast majority of official plugins have their own homes.


(Danny Goodall) #48

@fantasticfears, great work on the plugin!

Can I ask if there is logic in the plugin to stop/pause issuing warnings to a poster once an initial warning has been given but ignored?

Detail

I run a small UK football team forum so the language used is often very industrial and emotions often run high - especially when a team has lost and someone is posting from the pub!

We’re not looking to stamp out profanity as, rightly or wrongly, that is an integral part of supporting a UK football team.

I am, however, interested to see if the warnings may moderate these extreme emotional outbursts.

Anyway, I’ve created a test thread and have asked some users to experiment and see what gets flagged and what doesn’t.

What we’re seeing is that the first post with profanity / aggression / toxicity is warned but then subsequent posts in the same thread with similar levels of toxicity are ignored.

The behaviour looks like the warnings are given (but then the user chooses to ignore them), and then a time period is allowed to pass before more warnings are then given - despite toxic language having been used between those warnings.

I’ve checked the logs and haven’t seen anything that suggests we’re tripping over rate limits - as was discussed higher up the thread.

Any advice welcome.


(Erick Guan) #49

Thanks for your detailed description. It would help to see such a thread so that I can assess the situation.

First, some context. According to Google’s documentation, this AI model was trained on a US corpus, which might contribute to it failing to identify toxic posts. Getting a computer to actually understand complicated human profanity is still a challenge in this research field. After all, English people are polite in their language :stuck_out_tongue:

I tried a few keywords, and it’s usually the curse words that trigger the warning easily. Beyond that, I don’t think Google’s AI is intelligent enough to catch all toxic content, and we can’t assume that it is. As for the plugin, it doesn’t take context into consideration. It scans every post people submit, every time. It works like different people each reading one post at a time, rather than a single person reading all of the content.
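
To illustrate the point about context, here is a minimal Python sketch. It is not the plugin's actual code: `score_toxicity` is a hypothetical stand-in for the call to Google's model, and the 0.8 threshold is an assumed value rather than a documented setting. Each post is scored on its own, so nothing written earlier in the thread affects the result.

```python
from typing import Callable, List


def flag_posts(posts: List[str],
               score_toxicity: Callable[[str], float],
               threshold: float = 0.8) -> List[bool]:
    """Score every post in isolation; no thread or author history is used.

    score_toxicity is a hypothetical scorer returning a value in [0, 1].
    threshold is an assumed cut-off, not a documented plugin setting.
    """
    return [score_toxicity(post) >= threshold for post in posts]


def toy_scorer(text: str) -> float:
    """Toy scorer for demonstration only: reacts to a single keyword."""
    return 0.9 if "idiot" in text.lower() else 0.1


thread = ["Great goal!", "You idiot", "As I said before, the same goes for you"]
flags = flag_posts(thread, toy_scorer)  # [False, True, False]
```

The third post only reads as hostile in light of the second one, and a per-post check like this has no way to see that.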


(Danny Goodall) #50

Thanks for the reply @fantasticfears.

We’ve turned the plugin off for the time being as the identification of toxic content seemed to be unpredictable - even for the same text posted by the same poster.

For example, someone posted “F*** off you stupid c***” (it was a test thread and I did ask them to be creative to see what text garnered a warning!).

As expected, they got a warning. However, if they then tried to post the same text a little later no warning was given. A little later still and the warning returned for the same text.

So, if, as you’ve described, every post is ‘read’ and run through the trained set in isolation and context isn’t considered, then I’d have expected the plugin to warn the user every time they posted the same toxic text.

This wasn’t happening.

If I get the time I’ll turn it back on and get you some more concrete conditions for reproducing what I was seeing.

Keep up the good work!


(Jeff Atwood) #51

@sam will be renaming the plugin and restarting this topic soon.


(Erick Guan) #53

The warning is shown once, but you can submit the post anyway. The fundamental difficulty, though, is that the model does not understand the content, which is hard to resolve by any means.


(Danny Goodall) #54

So, just so I understand the logic here Erick, are you saying that if the poster dismisses the warning then there is logic in the plugin that will not warn them again for that topic / that toxic text / or for a specific time period?

Because that was the behaviour we were observing.

Let me use an example which might be clearer.

If I post the following toxic text

you f* stupid c*

and Etiquette determines that this is toxic and gives me a warning, but I choose to ignore the warning and post the text anyway, and I then re-post the same text

you f* stupid c*

Would you expect me to see another warning straight away?

Because we did not receive a second warning for the same text. We were able to post the same toxic text without warning. A little later the warnings seem to re-appear.

There was also nothing in the log to suggest we had tripped a rate limit or that something had gone wrong contacting Google.

As I say, when I have time, I will re-enable the plugin and take some real-time screenshots to illustrate the problem we saw.


(Erick Guan) #55

It won’t warn again in the same editor, regardless of whether the post is saved as a draft. It doesn’t have any logic based on the topic or on a time period.

In this case, it will not warn again because the warning has already been given. It will warn the user again if the content is changed.
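
As a rough sketch of the behaviour described above (an illustration only, not the plugin's real implementation), the rule can be modelled as "warn once per editor for a given piece of content". The names `ComposerSession`, `should_warn` and `is_toxic` are hypothetical, and `is_toxic` stands in for whatever verdict comes back from Google.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ComposerSession:
    """Hypothetical stand-in for one open editor (composer)."""
    last_warned_text: Optional[str] = None


def should_warn(session: ComposerSession,
                text: str,
                is_toxic: Callable[[str], bool]) -> bool:
    """Warn only if the text is judged toxic and differs from the text
    that already triggered a warning in this same editor.

    There is deliberately no topic-based or time-based logic here.
    """
    if not is_toxic(text):
        return False
    if session.last_warned_text == text:
        return False  # warning already shown for this exact content
    session.last_warned_text = text
    return True


def toy_is_toxic(text: str) -> bool:
    """Toy classifier for demonstration only."""
    return "stupid" in text.lower()


editor = ComposerSession()
first = should_warn(editor, "you stupid fool", toy_is_toxic)          # True
repeat = should_warn(editor, "you stupid fool", toy_is_toxic)         # False
changed = should_warn(editor, "you really stupid fool", toy_is_toxic)  # True
```

Under this model, re-submitting unchanged text in the same editor stays silent, while any edit to the text makes it eligible for a new warning.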


(Ultra Noob) #56

This is interesting. However, for privacy reasons I feel Discourse’s built-in options ‘Require approval’ and ‘Flag’ are good.
I would be happy to see something similar to the first option that can work without sending data to a third party.