When using Watched Words, accented characters can cause false positives: the filter splits a word at the accented character instead of treating that character as part of the word. It seems that letters with accents or other diacritics are treated like blank spaces.
Repro steps:
1. Add ‘anal’ to blocked Watched Words
2. As a non-admin user, attempt to use analógico in a post
3. Post is blocked
Attempting the same with analog works as intended, and the post is allowed.
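For anyone who wants to see the behaviour outside Discourse, here is a minimal Ruby sketch of what seems to be happening. The helper `ascii_boundary_regexp` and its pattern are my own assumptions for illustration, not Discourse's actual code. The only thing it relies on is that Ruby's `\W` means "anything outside `[a-zA-Z0-9_]`", so an accented letter counts as a non-word character.

```ruby
# Hypothetical approximation of a watched-word pattern (NOT Discourse's real code).
# Ruby's \W matches any character outside [a-zA-Z0-9_], so "ó" is seen as a boundary.
def ascii_boundary_regexp(word)
  /(?:\W|^)(#{Regexp.escape(word)})(?=\W|$)/i
end

blocked = ascii_boundary_regexp("anal")

puts blocked.match?("analógico") # true:  "ó" is treated as a word boundary (false positive)
puts blocked.match?("analog")    # false: "o" is an ASCII word character, so no boundary
```

The accented word matches only because the character right after the watched word is not an ASCII letter, which lines up with the repro steps above.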
9 likes
nizar9
April 24, 2023, 7:43pm
3
I was able to reproduce the same thing on my end. This bug also affects characters with a cedilla, such as ç and ş.
3 likes
nbianca
(Bianca)
May 18, 2023, 3:06pm
10
Support for UTF-8 characters in watched words has been implemented in this PR:
discourse:main ← discourse:fix_utf8
opened 07:17PM - 02 May 23 UTC
Watched words were converted to regular expressions containing \W, which handled only ASCII characters. Using [^[:word:]] instead ensures that UTF-8 characters are also handled correctly.
This should correctly detect word boundaries for all words, including those that contain UTF-8 characters.
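To illustrate the idea behind the change, here is a small Ruby sketch that uses the POSIX bracket expression `[[:word:]]`, which in Ruby covers Unicode letters, marks, numbers, and connector punctuation. The helper `unicode_boundary_regexp` and the exact pattern are assumptions made for this example and may differ from what the PR actually ships.

```ruby
# Hypothetical sketch of a Unicode-aware watched-word pattern (not the PR's exact code).
# [^[:word:]] matches only characters that are not Unicode word characters,
# so accented letters such as "ó" or "ç" no longer count as boundaries.
def unicode_boundary_regexp(word)
  /(?:[^[:word:]]|^)(#{Regexp.escape(word)})(?=[^[:word:]]|$)/i
end

puts unicode_boundary_regexp("anal").match?("analógico")       # false: "ó" is part of the word
puts unicode_boundary_regexp("fa").match?("façade")            # false: "ç" is part of the word
puts unicode_boundary_regexp("spam").match?("no spam, please") # true:  real word boundaries still match
```

With this kind of boundary, a watched word is only matched when it stands alone, regardless of whether the neighbouring letters are ASCII.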
3 likes
nbianca
(Bianca)
Closed
May 22, 2023, 5:00am
11
This topic was automatically closed after 3 days. New replies are no longer allowed.