* wildcards in Watched Words (Censor) don't work

schungx · December 31, 2017, 6:37am

Repro:

Make sure Site Settings > watched words regular expressions is OFF
Go to Logs > Watched Words
Type in somebadword* in Censor
./launcher restart app (this step appears to be necessary when the watched words list is changed, although I can’t understand why)
Try to type in somebadword in the compose window. No effect.
Check admin/watched_words.json and confirm that the word is accurately entered as somebadword*

https://github.com/discourse/discourse/blob/37854299488e47a7eac818e577c65e9431501b46/app/services/word_watcher.rb#L28-L35

This code should be replacing * with \S* (but I’m not sure if it will get lower-cased in the end). Somehow it is not matching.
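For illustration, the wildcard handling described above could be sketched in JavaScript as below. This is not the linked Ruby code: `wildcardToRegExp` and `censor` are hypothetical helpers, the `\b` wrapping is an assumption, and the `i` flag is used so that lower-casing does not matter.

```javascript
// Hypothetical helper (not Discourse's implementation): turn a watched word
// containing `*` wildcards into a case-insensitive regular expression.
function wildcardToRegExp(word) {
  // Escape every regex metacharacter, including `*`.
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Turn each escaped `\*` back into "zero or more non-space characters".
  const pattern = escaped.replace(/\\\*/g, "\\S*");
  // Assumption: the word is wrapped in `\b` so it only matches at word boundaries.
  return new RegExp(`\\b(?:${pattern})\\b`, "gi");
}

// Replace every match with a run of block characters of the same length.
function censor(text, word) {
  return text.replace(wildcardToRegExp(word), (m) => "■".repeat(m.length));
}

// The whole token is masked, regardless of case.
console.log(censor("Please no SomeBadWord123 here", "somebadword*"));
// No word boundary before "somebadword", so nothing is masked.
console.log(censor("notsomebadword", "somebadword*"));
```

If a sketch like this matches but the composer still shows the word, the problem is more likely in where the pattern is applied than in the pattern itself.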

schungx · December 31, 2017, 6:39am

Incidentally:

Site Settings > watched words regular expressions is ON
Go to Logs > Watched Words
Type in somebadword\w* in Censor
Confirm that admin/watched_words.json comes back with the correct somebadword\w*
Try in compose window, no effect.

EDIT: It is Censor that is not working.

neil · January 3, 2018, 8:08pm

Works on my machine…

Restarting the app isn’t necessary.

Are you doing this as a moderator or admin? If so, that explains why you aren’t seeing anything.

codinghorror · January 3, 2018, 10:53pm

Yes staff are immune to this by design. So not a bug.

schungx · January 4, 2018, 3:21am

Nope. Already tried it with a normal account. Everything that is wild-carded won’t get masked out.

Incidentally, if a word is included in the censored pattern (note: this is in Settings, not in Watched Words), then it gets masked even when I’m an admin. But that is beyond the scope of this question.

I’m running v1.9.0.beta17 +78. Should I be trying with the latest?

schungx · January 4, 2018, 3:31am

Site setup:

Normal user:

This site is running Beta15. I’ll upgrade to latest and report back.

EDIT: Sorry for not being clear, it is Censor that is not working for me. Not Block or Flag (haven’t tried those).

EDIT 2: OK, Upgraded to latest. Checked. Block and Flag both work fine. Only Censor is not working. I’ve updated the topic title.

thethirdpudding · January 8, 2018, 11:49pm

Hello!

I’m also having trouble using Watched Words to keep my community from violating the TOS in topics and private messages.

As with @schungx, the censored pattern from Site Settings is working great for me, but I’d rather use Watched Words with Required Approval to prevent users from trying to sneak around the regex I’m using.

However, I was only able to trigger the flagging system when creating topics and writing replies. Private messages just won’t trigger anything (Approval, Censor, Flag or Block).
This was tested on v2.0.0.beta1 +26 with 2 test accounts.

codinghorror · January 9, 2018, 2:14am

Can you repro this @neil?

neil · January 9, 2018, 4:51pm

I can only confirm that censoring a word like somebadword* doesn’t work.

Off topic, but word watching works in PMs for me.

@thethirdpudding Please open a new topic in support to explain what you’re doing.

neil · January 9, 2018, 10:17pm

I fixed this today. Wildcards should work in censored words now.

schungx · January 10, 2018, 4:04am

I’m sorry, but not so fast.

Watched Words Censor now works for words with wildcards.

However, it doesn’t work if watched_words_regular_expressions is true.

I don’t think the censor function even considers it as regular expression at all.

Repro:

Settings > watched words regular expression ==> ON
Add xyz* to Censor
In compose window, type xyz123
See that it is censored, because the * is being treated as a wildcard. If it were treated as a regular expression, xyz* should only match xy followed by zero or more z characters (xy, xyz, xyzzz…).

EDIT: The below is the culprit…

https://github.com/discourse/discourse/blob/ad62f1099cdb8782d20ca1296ea73467bc35fce7/app/assets/javascripts/pretty-text/censored-words.js.es6#L11-L13

Notice that it is not even considering that the pattern may already be a regular expression.

Also,

https://github.com/discourse/discourse/blob/ad62f1099cdb8782d20ca1296ea73467bc35fce7/app/assets/javascripts/pretty-text/censored-words.js.es6#L23

This always assumes that the pattern is a plain word and automatically wraps it in a pair of \b anchors. If the pattern is a regular expression, the \b pair should be left off, because the user can add the anchors wherever they are wanted.
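A rough sketch of the behaviour being asked for, assuming a boolean `regexMode` that stands in for the site setting. `buildCensorRegExp` and `censor` are hypothetical names, not the functions in censored-words.js.es6.

```javascript
// Sketch only: `regexMode` stands in for the site setting discussed above.
function buildCensorRegExp(entry, regexMode) {
  if (regexMode) {
    // Regex mode: use the admin's pattern verbatim, with no added `\b`.
    return new RegExp(entry, "gi");
  }
  // Word mode: escape the entry, expand `*` to `\S*`, and wrap it in `\b`.
  const escaped = entry.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`\\b${escaped.replace(/\\\*/g, "\\S*")}\\b`, "gi");
}

// Replace every match with a run of block characters of the same length.
function censor(text, entry, regexMode) {
  const re = buildCensorRegExp(entry, regexMode);
  return text.replace(re, (m) => "■".repeat(m.length));
}

// Word mode: `*` is a wildcard, so both tokens are masked whole.
console.log(censor("xyz123 xyzzz", "xyz*", false));
// Regex mode: `z*` means "zero or more z", so "123" is left visible.
console.log(censor("xyz123 xyzzz", "xyz*", true));
```

With the same entry, the two modes mask different parts of xyz123, which is a quick way to tell which mode the censor is actually using.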

Tom_Newsom · January 10, 2018, 9:31am

Wildcards in censor lists can be problematic…

schungx · January 10, 2018, 10:55am

Well, if you put in *shit* then obviously that is what you’ll get… Since you are explicitly asking the system to filter out anything containing these words.

Usually you’ll be using shit* for example…

But of course, it won’t be 100% fool-proof if you use any wildcard. For example:

I love shitaki mushrooms!

neil · January 10, 2018, 3:51pm

@eviltrout is this setting meant to force every word to explicitly include the word boundaries in the patterns? This feature was added for a specific customer, so removing the \b around the patterns could have… surprising consequences!

Agreed. Censor wasn’t updated to support watched_words_regular_expressions, so I’ll need to implement it.

eviltrout · January 10, 2018, 4:03pm

Yes, this was intentional. If you write the regular expression yourself you can control whether it’s on a boundary or not. Some of the watched words we imported were not on a boundary, for example! It’s a power feature.

neil · January 10, 2018, 7:24pm

@schungx It should work now. Plz update and try again.

schungx · January 11, 2018, 4:40am

Yup. Wrapping with \b in regular (non-regex) mode makes sense, since it is simpler and works well (at least for English). A small pitfall is that it breaks on non-ASCII letters, but that’s a minor issue by comparison.
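To show the non-ASCII pitfall concretely: in JavaScript, `\b` is defined in terms of `\w`, which only covers `[A-Za-z0-9_]`. The snippet below is a standalone illustration, and the lookaround variant at the end is just one possible workaround, not something Discourse is known to use.

```javascript
// `\b` needs a \w character on exactly one side, and CJK or accented
// letters are not \w, so the wrapped patterns below never match.
const wrapped = /\b你好\b/;
const bare = /你好/;

console.log(wrapped.test("你好")); // false
console.log(bare.test("你好")); // true
console.log(/\bcafé\b/.test("un café noir")); // false: no boundary after "é"

// One possible workaround (an assumption, not Discourse's approach):
// Unicode-aware lookarounds in place of `\b`.
const unicodeWord = /(?<![\p{L}\p{N}_])café(?![\p{L}\p{N}_])/u;
console.log(unicodeWord.test("un café noir")); // true
console.log(unicodeWord.test("cafétéria")); // false
```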

When a site turns on regular expressions, you assume that the admin knows what he/she is doing and writes correct regexps. Then those \b will be an unnecessary limitation.

Mittineague · January 11, 2018, 5:01am

I’m thinking it might be better to not deal with POSIX regex at all and limit it to PostgreSQL wildcards (_ %)

IMHO, assuming that an Admin that wants regex will know regex will in most cases be quite a leap. Even devs that have advanced programming skills in general can have problems getting regex right.

schungx · January 11, 2018, 5:04am

Not quite.

First of all, I believe the markdown processing in Discourse is actually done via JavaScript so it is natural to use JS regex.

Secondly, there are tons of online tools to check regexes.

Thirdly, common regexes are not difficult. The difficult ones are those that try to make regex do what it wasn’t meant to do. Most of the normal scenarios are actually quite simple.

schungx · January 11, 2018, 5:38am

Prelim testing shows that it is working perfectly fine! Good job!

Now finally I can censor Chinese!
