When using Watched Words, accented characters can cause false positives. The word filter seems to treat letters with accents or other diacritics as blank spaces, so it splits a word at the accented character instead of treating that character as part of the word.
Repro steps:
Add ‘anal’ to blocked Watched Words
As a non-admin user, attempt to use analógico in a post
Post is blocked
Attempting the same with analog works as intended, and the post is allowed.
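The symptom in the repro steps can be imitated outside Discourse with a short Ruby sketch. The method name `blocked_ascii?` and the pattern shape are assumptions made for illustration, not Discourse's actual code. The only fact relied on is that Ruby's `\W` is ASCII-only, so it treats ó as a non-word character.

```ruby
# Hypothetical helper, not Discourse's implementation: builds a
# whole-word pattern whose boundaries are Ruby's ASCII-only \W.
def blocked_ascii?(text, watched_word)
  pattern = /(?:\W|^)#{Regexp.escape(watched_word)}(?:\W|$)/i
  pattern.match?(text)
end

# "\u00F3" is the precomposed letter ó, so this string is "analógico".
puts blocked_ascii?("anal\u00F3gico", "anal") # => true  (false positive)
puts blocked_ascii?("analog", "anal")         # => false (allowed, as expected)
```

Under this assumption, "analógico" is rejected because ó is read as a boundary right after "anal", while "analog" passes because "o" is an ASCII word character.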
9 Likes
nizar9
April 24, 2023, 7:43 PM
3
I was able to reproduce the same thing on my end. This bug also affects other characters, such as those with a cedilla like ç and ş.
3 Likes
nbianca
(Bianca)
May 18, 2023, 3:06 PM
10
Support for UTF-8 characters in watched words has been implemented in this PR:
discourse:main
← discourse:fix_utf8
opened 07:17PM - 02 May 23 UTC
Watched words were converted to regular expressions containing \W, which handled… only ASCII characters. Using [^[:word:]] instead ensures that UTF-8 characters are also handled correctly.
This should correctly detect word boundaries for all words, including those that contain UTF-8 characters.
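A minimal Ruby sketch of the difference the PR describes is below. The constant names and `blocked_unicode?` are hypothetical and only illustrate the idea; the PR's real code may be structured differently. It uses Ruby's POSIX bracket class `[[:word:]]`, which is Unicode-aware, in place of the ASCII-only `\W`.

```ruby
# Hypothetical names for illustration; not taken from the PR.
ASCII_NON_WORD   = /\W/           # ASCII-only: anything outside [a-zA-Z0-9_]
UNICODE_NON_WORD = /[^[:word:]]/  # Unicode-aware negation of [[:word:]]

# ó, ç and ş written as escapes so they are precomposed code points.
["\u00F3", "\u00E7", "\u015F"].each do |ch|
  puts "#{ch}: \\W=#{ASCII_NON_WORD.match?(ch)} [^[:word:]]=#{UNICODE_NON_WORD.match?(ch)}"
end

# Whole-word check with Unicode-aware boundaries (an assumed pattern shape).
def blocked_unicode?(text, watched_word)
  boundary = "[^[:word:]]"
  /(?:#{boundary}|^)#{Regexp.escape(watched_word)}(?:#{boundary}|$)/i.match?(text)
end

puts blocked_unicode?("anal\u00F3gico", "anal")    # "analógico" is one word
puts blocked_unicode?("this is anal.", "anal")     # standalone word
```

With the Unicode-aware class, the accented letters count as word characters, so "analógico" no longer contains a boundary after "anal", while the standalone word is still matched.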
3 Likes
nbianca
(Bianca)
Closed,
May 22, 2023, 5:00 AM
11
This topic was automatically closed after 3 days. New replies are no longer allowed.