Support for wildcards in word censoring


(se oli tonnin seteli) #1

I don’t know if anyone actually uses this for anything other than fun, but due to inflection it would be nice to be able to use wildcards when adding censored words.


(Michael Downey) #2

I could see this leading to too many false positives but it could be nice in some cases.


(Kane York) #3

This was intentionally not supported when the feature was added to avoid people making poor ■■■umptions, or talking about a topic’s ■■■le, or the town of S■■■■horpe, or medireview weapons.
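
To make the failure mode concrete, here is a minimal Python sketch comparing substring matching with whole-word matching. It is not how Discourse implements censored words; the helper names `censor_substring` and `censor_whole_word` are made up for illustration.

```python
import re

MASK = "■"


def _mask(match):
    # Replace every matched character with the mask character.
    return MASK * len(match.group(0))


def censor_substring(text, word):
    # Hypothetical helper: censors the word wherever it appears,
    # including inside longer, innocent words.
    return re.sub(re.escape(word), _mask, text, flags=re.IGNORECASE)


def censor_whole_word(text, word):
    # Hypothetical helper: censors the word only when it stands alone,
    # using \b word boundaries on both sides.
    pattern = r"\b" + re.escape(word) + r"\b"
    return re.sub(pattern, _mask, text, flags=re.IGNORECASE)


print(censor_substring("poor assumptions", "ass"))
print(censor_whole_word("poor assumptions", "ass"))
```

The substring version mangles "assumptions", while the whole-word version leaves it alone. A wildcard sits somewhere between the two, which is why it is easy to get wrong.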


(Michael Downey) #4

Wouldn’t happen if someone was thorough with their regex though :slight_smile:


(Jeff Atwood) #5

The potential for disaster is just too high. People aren’t “good” at regex.


(Mittineague) #6

Agreed 300+%

If you’re working with a limited set of possible variations, say, file or Category names, or maybe even Discourse table names, then coming up with a working regex is possible.

But if you’re talking about anything that might possibly be entered by anyone into a post, your working regex is much simpler.

It’s
/(.)*/
:wink:


(Neil Lalonde) #7

Hey look what we added this week:

This is a new setting that works alongside censored words, but is a regular expression. The above example filters 7 digits of a phone number. If you like it, call me at 555-1234!

The value entered in the setting is checked to see if it matches too greedily, like (.)*. Also, if cooking a post ever fails because the JavaScript engine timed out, then the censored pattern setting will be cleared on the assumption that it’s an evil regex like (a|aa)+.
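
For anyone curious what a "matches too greedily" check might look like, here is a rough Python sketch. It is not Discourse's actual code: the function name `matches_too_greedily`, the probe strings, and the rule itself (reject anything that fully matches the empty string or unrelated text) are all assumptions made for illustration.

```python
import re

# Assumed probe strings: ordinary text that no sensible censor pattern
# should match in its entirety, plus the empty string.
PROBES = ("", "abc", "hello world 123")


def matches_too_greedily(pattern):
    # Hypothetical check, not the real Discourse validation.
    try:
        compiled = re.compile(pattern)
    except re.error:
        # Treat an invalid pattern as unusable.
        return True
    # A pattern that fully matches any probe would blank out
    # arbitrary post content, so reject it.
    return any(compiled.fullmatch(probe) for probe in PROBES)


print(matches_too_greedily(r"(.)*"))
print(matches_too_greedily(r"\d{3}-\d{4}"))
```

A check like this only catches patterns that match too much. It does nothing about slow patterns such as (a|aa)+, which is why a timeout is the fallback for those.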


(boxels) #8

Does anyone have any guidance for newbies using regex patterns? Can you do more than one? Does a list like this work, and if so, how do we enter it to make each word work?

http://pastebin.com/tcNWpHjg


(Mittineague) #9

Yes, if you don’t know what you’re doing, don’t do it on a live site.

I find that https://regex101.com can be good for testing. I don’t know if the input takes an array of patterns. But 304 of them would make for one very long pattern which the input may not take either. For that many, I have a feeling a plugin might be best.


(Rafael dos Santos Silva) #10

That list would go into the censored words list and not on the censored pattern.


(Kane York) #11

Also, it’s probably a bad idea to censor the word “admin” @boxels. That probably goes in “disallowed usernames” instead (where it already is).


(richard morris) #12

It can be defeated by typing in a< i>< /i>ss


(Jeff Atwood) #13

Well yes there are dozens of ways to defeat word censoring. That is not the point of it though.


(Jay Pfaffman) #14

Can there be more than one regex? It doesn’t seem to let you have multiples like permalink normalizations, for example.

I’ve a client who’d like phone numbers and emails hidden.

Edit: No, but you don’t need that, just use a |.

\d{3}-\d{4}|[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+ will censor email addresses and xxx-yyyy phone numbers.
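
If you want to try the pattern before putting it on a live site, here is a small Python sketch that applies it to a sample line. The `censor` helper and the choice of ■ as the mask character are assumptions for illustration, and Python's regex engine is not the JavaScript one Discourse uses, so treat the result as a sanity check only.

```python
import re

# The combined pattern from above: xxx-yyyy phone numbers OR email addresses.
PATTERN = re.compile(r"\d{3}-\d{4}|[\w+\-.]+@[a-z\d\-]+(\.[a-z\d\-]+)*\.[a-z]+")


def censor(text):
    # Hypothetical helper: masks each match with one ■ per matched character.
    return PATTERN.sub(lambda m: "■" * len(m.group(0)), text)


print(censor("Call 555-1234 or mail bob@example.com today"))
```

The text after the email address ("today") is left untouched, so the match stops at the end of the address.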


(Dean Taylor) #15

It will also censor text after an email address on the same line:

EDIT: I see an edit in there now :slight_smile: