Inappropriate / Obscenity / Profanity Language Filter

(Daniel) #1

Hi there,

Manual moderation (whether by assigned moderators or trusted users) is important, but I think some moderation in Discourse can be automated.

For example, we could restrict usernames that contain swear words, or block comments and topics that contain them. The list of swear words could be editable so that admins can adjust it for the specific community at hand.

Posts, users, or topics that do not pass the filter could be blocked altogether, marked for moderator review, etc.

URL filtering is a built-in feature in all decent forum software; why not have the same for swear words?

What do you guys think?

(Sjors) #2

I think this is also part of the long-term goal the team has set for Discourse: (semi-)automatic moderation. They assume a moderator can’t see everything, so moderation should be crowdsourced. Check Jeff’s latest presentation: Forums Are Dead, Long Live Forums

(Daniel) #3

@sjors I actually watched that video yesterday :slight_smile: On that note, a language filter is a step towards (semi-)automatic moderation, isn’t it?

(Sam Saffron) #4

Auto-flagging based on swear words etc. may be appropriate for some forums and is totally on the longer-term roadmap. I do not think the core team plans to work on it in the near future, but if a developer wants to contribute this, we would be happy to work with them to specify how it works, etc.

I think there are 2 options you can take.

  1. Totally block certain words … not that effective, because people will simply type fuckk you.
  2. Auto-flag posts containing certain words; this should work quite effectively because the flag is hidden.

Clearly it gets tricky, because you need a bunch of UIs to manage the lists, etc.
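The two options above could be sketched roughly as follows. This is a minimal illustration, not Discourse's actual API; the word list, method names, and return symbols are all hypothetical. It also demonstrates Sam's point: a trivial misspelling slips past an exact word list.

```ruby
# Hypothetical admin-editable word list.
WATCHED_WORDS = %w[badword swearword].freeze

# Returns the watched words found as whole words in the text.
def matched_words(text)
  WATCHED_WORDS.select { |w| text =~ /\b#{Regexp.escape(w)}\b/i }
end

# Option 2 from above: silently flag for review instead of blocking.
def action_for(text)
  matched_words(text).empty? ? :allow : :flag_for_review
end

action_for("You badword!")   # => :flag_for_review
action_for("You badwordd!")  # => :allow (misspelling evades the list)
```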

(Daniel) #5

@sam Thanks for your feedback! I actually want to contribute to the project, so this topic is more of a feature discussion than a feature request.

(Sam Saffron) #6

Awesome, more than happy to discuss any mocks, workflows, etc.

(Sjors) #7

Yeah, it’s a great idea. I would also help with it where I can.

(Alex R) #8

Relatedly, would a more general-purpose trainable content-filtering system be on the roadmap (kinda like Reddit’s spam filter or AutoModeratorBot)?

(F. Randall Farmer) #9

Food for thought:
“I want to stick my long necked giraffe up your fluffy white bunny!”

Come to CanCUN To have a good time! Beware the PEN15!

Certainly, suspicious word lists can be used to educate users on the TOS/Community Guidelines for forums, but they are not appropriate for “auto-moderation.”

If the algorithm detects a suspicious phrase, why not detect it during the post-composition phase and correct the problem then (“Hey, we don’t use language like that around here! Please rephrase.”)? And if the detection is ambiguous, you can add: “If you think this warning is incorrect, go ahead and post, but you’ve been warned!”

The purpose should be to educate the community as to standards, not pretend you can stamp out bad words.
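The compose-time flow described here could be sketched as below. All names (`Warning`, `compose_check`) are hypothetical, not part of Discourse's composer; the idea is just that detection produces an overridable warning rather than a hard rejection.

```ruby
# A warning the composer UI could display; overridable means the user
# may post anyway after being warned.
Warning = Struct.new(:message, :overridable)

# Check the draft text against a set of suspicious-phrase patterns.
# Returns nil when the text is clean.
def compose_check(text, patterns)
  hit = patterns.find { |p| text =~ p }
  return nil unless hit
  Warning.new(
    "Hey, we don't use language like that around here! Please rephrase.",
    true # detection is ambiguous, so let the user post anyway, warned
  )
end

compose_check("that darn thing", [/\bdarn\b/i]) # returns an overridable Warning
compose_check("a polite post", [/\bdarn\b/i])   # => nil
```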

(Alex R) #10

I like it!

Train on posts flagged as “Inappropriate content” or something like that (with negative data points weighted by how many people scroll past a post before it gets flagged, if ever). Warn a user if their post is predicted to be flagged as inappropriate. Punish people who repeatedly post content that is later flagged as inappropriate despite being warned (the penalty for being flagged may suffice, but I think it would be useful to distinguish people who blatantly disregard standards from people who just run into them often).

Experiment with what features to use to identify bad content; different algorithms may suit different forums better.

Possibly, notify mods if someone with a history of inappropriate content chooses to post against the filter’s judgement.

(Luke S) #11

And if you are having problems with tons of people ignoring this warning, or intentionally dumping junk on a forum anyway, could you have the system automatically add, say, half of the flag points necessary to hide a post?

This would:

  • Make community based moderation quicker and easier for really objectionable content.
  • Acknowledge the fact that automatic algorithms, especially fuzzier ones, can be wrong.
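The "half the flag points" idea could look something like this. The threshold value and names are hypothetical, not Discourse's real flag-scoring internals; the point is that an automatic detector contributes only part of the score needed to hide a post, so at least one human still has to agree.

```ruby
# Hypothetical score at which a post is hidden.
HIDE_THRESHOLD = 4.0

# Auto-detection contributes half the hide threshold; humans supply the rest.
def flag_score(human_flags, auto_flagged)
  human_flags + (auto_flagged ? HIDE_THRESHOLD / 2 : 0)
end

def hidden?(human_flags, auto_flagged)
  flag_score(human_flags, auto_flagged) >= HIDE_THRESHOLD
end

hidden?(2, true)   # => true  (auto half + two human flags)
hidden?(2, false)  # => false (humans alone haven't reached the threshold)
```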

(F. Randall Farmer) #12

Individuals who repeatedly violate these warnings are dealt with the same way as other violations (off topic, illegal, etc.): flags, hidden content, loss of trust, bans.

Perhaps you’d get the effect you want if, instead, you just required trust level 1 to bypass the warning. Then someone would have to go out of their way to post a substitute bad word (which would get by the filter/warning). If it’s still a problem, the normal flagging mechanism would handle it.

Again, the important thing here is that text filters can’t prevent intentional abuse; we are trying to train users to follow the rules. These are very different goals.

Here’s my problem with any post-with-auto-flag system: Bad guys circumvent it trivially and good guys get caught in the net unintentionally/innocently.

Why implement a bunch of additional software that everyone knows how to bypass? Focus on improving post quality over garbage collection.

(Luke S) #13

Good points. I especially like:

[quote=“frandallfarmer, post:12, topic:7993”]
just require trust level 1 to bypass the warning.
[/quote]

(Jeff Atwood) #14

Fuckk me? No…

I think a list of “trigger words” (or regex strings) that generate a flag might be interesting for a variety of reasons.
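A trigger list of admin-supplied regex strings might be sketched as below. The function names are hypothetical; the sketch just shows the strings being compiled once and a match producing a flag reason.

```ruby
# Compile admin-supplied regex strings into case-insensitive patterns.
def compile_triggers(strings)
  strings.map { |s| Regexp.new(s, Regexp::IGNORECASE) }
end

# Return a flag reason when any trigger matches, nil otherwise.
def trigger_flag(text, triggers)
  hit = triggers.find { |re| text =~ re }
  hit && "Matched trigger #{hit.source}"
end

triggers = compile_triggers(['\bdarn\b', 'he+ck'])
trigger_flag("Oh heck no", triggers)  # => "Matched trigger he+ck"
trigger_flag("all fine here", triggers)  # => nil
```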

(Jeff Atwood) #15

I wonder if it’s better to simply warn immediately vs. block the post?

So, let’s say the user types a defined naughty word in a post, a word we as a community agree is unacceptable here. What’s the correct course of action?

  • Silently let it through and auto-flag the post for moderator attention?

  • When the user attempts to post, warn with a dialog that says exactly which word isn’t allowed, but allow them to post as-is if they really feel they must, consequences be damned?

  • When the user attempts to post, block with a dialog that says exactly which word isn’t allowed, and keep blocking until they remove or change that word?

  • When the user posts, replace that word with another word, a censored version, or nothing?

Also: should a high enough trust level make you immune to this check?

(Luke Larris) #16

I think a combination of option 1 + option 2 would be the best thing, although all those options would be handy for some use cases.

Most forum software simply replaces the words with something else, like asterisks. I particularly liked editing those filters to replace “fuck” with “duck” or “shit” with “holy explosive molten diarrhea”, but I digress.

I think autoflagging the post and warning them would be the best setup.

(Dave McClure) #17

I vote for the combination of these two:

(Jeff Atwood) #18

I am kind of down on auto-flag as I think it will create a bunch of extra ongoing work for moderators, and I’d rather not do that.

If you believe, as I do, that any objectionable language filter just makes people switch to alternative ways of offending people, then personally I’d vote for either

  • outright warn + block
  • auto-switching the content to censored version, e.g. f**k

None of this would be on by default, so unlikely to affect the average Discourse install.
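The censored-version option could be sketched like this: keep the first and last letters and star the middle, giving the "f**k" style mentioned above. A hypothetical sketch, not Discourse's implementation.

```ruby
# Replace each listed word (whole-word, case-insensitive) with a
# first-letter/last-letter censored form, e.g. "heck" -> "h**k".
def censor(text, words)
  words.reduce(text) do |t, w|
    t.gsub(/\b#{Regexp.escape(w)}\b/i) do |m|
      # Words shorter than 3 characters have no middle to star out.
      m.length < 3 ? m : m[0] + '*' * (m.length - 2) + m[-1]
    end
  end
end

censor("what the heck", %w[heck])  # => "what the h**k"
censor("a clean post", %w[heck])   # => "a clean post"
```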


I’d vote to let the community flag it. But I probably wouldn’t be part of a community that could agree on a list like that, so as long as the list is community generated and empty by default…

I definitely believe that.

(Joel Bennett) #20

Yeah, I agree. What you really want is to modify behavior, not spend your life updating regexes to catch the next way to spell fck, so warn them with a polite message and optionally flag the post for moderator review.

I definitely see the point about moderation overload, but I’m sure there are going to be SOME installs of Discourse by schools and churches, etc., where they really want an option to completely block the worst words. I’ve moderated forums in the past where we went as far as replacing words with antonyms, like hug, love, sugar, and such. That sort of thing always ends with bans on users who won’t abide by the rules, so flagging the post or even the account would be worth it for them.

For what it’s worth, for a warning to work I suspect the wording of the warning is crucial and culture-dependent. The point is a reminder that we strive to maintain a courteous and kind atmosphere and a request to rephrase.

It’s like I tell my kids: I’m not trying to get you to say “heck” instead of “hell” and “crap” instead of “shit” … I’m trying to teach you that you can express yourself, and even do so vehemently, without resorting to expletives. :wink: