Specific Censored/Blocked words are not being restricted properly


#1

The following words have been added to the Censor list, but aren’t being censored when posted in a topic:

  • $4$
  • @$$

The following word has been added to the Block list, but is allowed to be posted in a topic:

  • P.R.

(Neil Lalonde) #2

Did you enable regexes there? If so, you need to escape the special chars.


#3

Nope, we are not using regex for watched words, the box is unchecked.


#4

Hello again Neil,

I was wondering if there was an update on this ticket? Or if you need any more information from me?


(Jeff Atwood) #5

Can you repro this, @jomaxro?


(Joshua Rosenfeld) #7

Yes, I can repro this on try. I added the 2 words to the censor list and the 1 word to the block list, checked that the watched words regular expressions setting was disabled, and then posted as a non-staff user. The post went through with none of the words censored or blocked.


(Jay Pfaffman) #9

Where is the censored list? I’ve looked everywhere that I can think of and still can’t find it.


(Joshua Rosenfeld) #10

Admin > Logs > Watched Words


(Jay Pfaffman) #11

I know that there has been lots of discussion about this (that I thought I was following), but having this on “logs” is not very intuitive.


(Karl Romanowski) #12

On mobile you can only see the block list. On desktop you can see block, censor, flag, and needs approval lists.


(Neil Lalonde) #13

Thanks @Karl_Romanowski. I added the same button that the site settings UI has so the other words lists can be accessed on mobile.

As for $4$, @$$, and P.R., those are words that contain word boundaries, so aren’t being detected as complete words. We look for word boundaries by default to handle words next to punctuation, in parentheses, etc. “Words” like co(onut can be matched because the boundaries are inside the word (and we don’t look for them inside), but your examples have them at the end(s). You can enable the watched words regular expressions setting to have full control over how your words are matched.
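
For anyone going the regular-expression route, here is a minimal sketch of one way to write such a pattern. It is plain Python, not Discourse code, and it only illustrates the regex idea: instead of \b, it requires that the match is not directly next to a word character. The helper name whole_token_pattern is made up for this example, and whether the same pattern behaves identically when pasted into the watched words setting is an assumption that should be tested on your own site.

```python
import re

def whole_token_pattern(word: str) -> re.Pattern:
    # Hypothetical helper: escape regex metacharacters, then use lookarounds
    # so the match may not touch a word character on either side.
    return re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")

for word in ["$4$", "@$$", "P.R."]:
    pattern = whole_token_pattern(word)
    found = pattern.search(f"they said {word} in a post") is not None
    print(word, found)
```

The lookarounds only check what sits next to the word; they do not care whether the first or last character of the word itself is a word character, which is where \b gets stuck.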


(Jeff Atwood) #14

Could we somehow improve the help text to clarify this misunderstanding in the future?


(Stephen Chung) #15

I vaguely remember that, internally, a regexp is used to match censored words by wrapping each word in a pair of \b’s.

Which means that characters that happen to have regexp meaning may be interpreted as regexp.


(Neil Lalonde) #16

The characters with special meaning in regexp are being escaped, so that’s not the issue. It’s that the periods and dollar signs in P.R. and $4$ have word boundaries around them.

I’m not sure how to express that… We could test the “word” entered against itself, and warn if it doesn’t match. I looked for something we can automatically add to the word so that it works, but couldn’t find anything. I don’t understand why the regexp doesn’t work… If there are word boundaries inside the word, it’s fine. But if the first or last char has word boundaries around it, it’s a problem.

\b(P\.R)\b matches P.R fine.
but
\b(P\.R\.)\b doesn’t match P.R.

I suspect there’s something we can do to make this work…
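
The "test the word against itself" idea could look roughly like the sketch below. It is plain Python rather than the actual Discourse implementation; boundary_pattern is a hypothetical stand-in for the default matcher (escape the word, wrap it in \b), and matches_itself is an illustrative name.

```python
import re

def boundary_pattern(word: str) -> re.Pattern:
    # Assumed stand-in for the default matcher: escape special
    # characters, then wrap the word in a pair of \b anchors.
    return re.compile(r"\b(" + re.escape(word) + r")\b")

def matches_itself(word: str) -> bool:
    # A word that cannot match itself will not be detected in a post
    # by this pattern, so it is worth warning about when it is added.
    return boundary_pattern(word).search(word) is not None

for word in ["P.R", "co(onut", "P.R.", "$4$", "@$$"]:
    if not matches_itself(word):
        print(f"warning: {word!r} will not be detected as a whole word")
```

Under these assumptions P.R and co(onut pass the check, while P.R., $4$ and @$$ trigger the warning, because each of them starts or ends with a non-word character.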


(Stephen Chung) #17

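A dot is not a word character, so when it is followed by white space there is no word boundary after it, and \b will not match there. \b requires a word boundary, meaning the character on one side must be a word char and the one on the other side a non-word char (or the start or end of the text).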

Word char (\w) is defined in regexp as ASCII letters, digits, and the underscore (some engines also count Unicode letters).

And unfortunately no… There isn’t much we can do to make it work.
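
The boundary rule described above can be checked directly. This is a small Python sketch (not Discourse code) that lists every position where \b matches; boundary_positions is a name invented for the example, and re.ASCII is used so that \w means exactly [a-zA-Z0-9_].

```python
import re

def boundary_positions(text: str) -> list:
    # \b is a zero-width match at any position that has a word character
    # on exactly one side; the edges of the string count as non-word.
    return [m.start() for m in re.finditer(r"\b", text, re.ASCII)]

for sample in ["P.R", "P.R. ", "$4$"]:
    print(repr(sample), boundary_positions(sample))
```

In "P.R. " there is no boundary at position 4, between the final dot and the space, and in "$4$" there is none at position 0 or at the end, which is why a pattern wrapped in \b cannot match those words.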