I think watched words could be improved if similar Unicode characters also matched.
For example:
abcabcabc
𝘢𝘣𝘤𝘢𝘣𝘤𝘢𝘣𝘤
𝐚𝐛𝐜𝐚𝐛𝐜𝐚𝐛𝐜
ab𝘤𝘢𝘣𝐜𝐚𝐛𝐜
This essentially lets spammers create many variations of the same words to get around the word filter. I've been getting hammered by crafty, motivated spammers, and they've really been pushing Discourse's anti-spam features to the absolute limit. This is one of the techniques they're using.
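For the specific examples above, one narrow piece of this is already well defined: the styled letters are Unicode "mathematical alphanumeric symbols", and Unicode normalization form NFKC folds them back to plain ASCII letters. The sketch below assumes matching happens on NFKC-normalized, casefolded text; `normalize_for_matching` and `matches_watched_word` are hypothetical helpers for illustration, not Discourse code. It also shows the limit: Cyrillic "а" (U+0430) has no compatibility mapping to Latin "a", so true "looks like" matching needs far more than NFKC.

```python
import unicodedata


def normalize_for_matching(text: str) -> str:
    """Fold compatibility variants (e.g. mathematical bold/italic letters)
    to their plain forms, then casefold for case-insensitive matching."""
    return unicodedata.normalize("NFKC", text).casefold()


def matches_watched_word(text: str, watched_word: str) -> bool:
    """Hypothetical helper: substring match after normalizing both sides."""
    return normalize_for_matching(watched_word) in normalize_for_matching(text)


# The styled examples, written as escapes so they survive re-encoding.
sans_italic = "\U0001D622\U0001D623\U0001D624" * 3  # sans-serif italic "abc" x3
bold = "\U0001D41A\U0001D41B\U0001D41C" * 3  # bold "abc" x3
mixed = "ab" + "\U0001D624\U0001D622\U0001D623" + "\U0001D41C\U0001D41A\U0001D41B\U0001D41C"

for sample in (sans_italic, bold, mixed):
    print(normalize_for_matching(sample), matches_watched_word(sample, "abcabcabc"))

# Cyrillic small a (U+0430) is a different letter, not a compatibility variant,
# so NFKC leaves it alone and the match fails.
print(matches_watched_word("\u0430bc", "abc"))
```

Each styled sample normalizes to `abcabcabc`, while the Cyrillic lookalike does not match.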
That's unlikely, as that kind of Unicode “looks like” matching is extremely expensive in CPU time and also very finicky to get right: who decides what “looks like” something else?
I suggest you consider other methods of dealing with these spammers.
In the meantime, just add common variations of the spam terms, written in the different Unicode characters, as needed.
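Typing those variations by hand is tedious. A minimal sketch, assuming you only need lowercase a–z in the mathematical-alphanumeric styles whose a–z runs are contiguous (some styles, such as italic and script, have gaps and are deliberately left out). `STYLES` and `styled_variants` are hypothetical names; the output is just text you could paste into the watched words list.

```python
# First code point (small "a") of mathematical-alphanumeric styles whose
# a..z run has no gaps, so "a" + offset maps directly to each letter.
STYLES = {
    "bold": 0x1D41A,
    "bold italic": 0x1D482,
    "sans-serif": 0x1D5BA,
    "sans-serif bold": 0x1D5EE,
    "sans-serif italic": 0x1D622,
    "sans-serif bold italic": 0x1D656,
    "monospace": 0x1D68A,
}


def styled_variants(word: str) -> dict[str, str]:
    """Hypothetical helper: return the word rendered in each style above.
    Characters outside a..z (after lowercasing) are kept unchanged."""
    variants = {}
    for name, base in STYLES.items():
        variants[name] = "".join(
            chr(base + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
            for ch in word.lower()
        )
    return variants


for name, variant in styled_variants("spam").items():
    print(f"{name}: {variant}")
```

This only covers the styled-letter trick; other lookalikes (other scripts, combining marks, uppercase styled letters) would still need their own entries.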