Watched Words Improvement -- similar looking unicode characters

I think watched words could be improved if similar-looking Unicode characters also matched.

For example:
abcabcabc
𝘢𝘣𝘤𝘢𝘣𝘤𝘢𝘣𝘤
𝒂𝒃𝒄𝒂𝒃𝒄𝒂𝒃𝒄
ab𝘤𝘢𝘣𝒄𝒂𝒃𝒄

This essentially lets spammers produce many variations of the same word to circumvent the word filter. I've been getting hammered by crafty, motivated spammers, so they've really been pushing Discourse's anti-spam features to the absolute limit. This is one of the techniques they're using.

Perhaps this could be useful: https://github.com/janlelis/unicode-confusable
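
To make the idea concrete, here is a minimal Python sketch. It is not how Discourse works and it does not use the gem above. It assumes that folding text with Unicode NFKC normalization before comparing is acceptable. The function names `fold_lookalikes` and `contains_watched_word` are made up for illustration. NFKC covers the styled mathematical letters in my examples, but it does not cover every lookalike (a Cyrillic "а" stays Cyrillic), so treat it as a partial approach.

```python
import unicodedata


def fold_lookalikes(text: str) -> str:
    """Fold compatibility variants (e.g. styled mathematical letters) to plain forms.

    Hypothetical helper: NFKC normalization followed by case folding.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def contains_watched_word(text: str, watched_words: list[str]) -> bool:
    """Return True if any watched word appears in the folded text (substring match)."""
    folded = fold_lookalikes(text)
    return any(fold_lookalikes(word) in folded for word in watched_words)


# The four example strings from this post, written as escapes so the
# source stays ASCII-only.
SANS_ITALIC_ABC = "\U0001D622\U0001D623\U0001D624"  # sans-serif italic "abc"
BOLD_ITALIC_ABC = "\U0001D482\U0001D483\U0001D484"  # bold italic "abc"
samples = [
    "abcabcabc",
    SANS_ITALIC_ABC * 3,
    BOLD_ITALIC_ABC * 3,
    "ab\U0001D624" + SANS_ITALIC_ABC[:2] + "\U0001D484" + BOLD_ITALIC_ABC,
]

for sample in samples:
    print(contains_watched_word(sample, ["abc"]))
```

Under those assumptions, all four samples fold to the same plain string, so a single watched word would cover them.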

2 Likes

That's not a "font"; that is a different set of Unicode characters.

2 Likes

Ah my bad, thanks for the correction. Updated the post.

Unlikely, as that kind of Unicode "looks like" matching is extremely expensive in CPU time and also very finicky to get right, because who decides what "looks like" something else? :thinking:

I suggest you consider other methods of dealing with these spammers.

In the meantime, just add common variations of spam terms as needed in different unicode characters.
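
If typing those variations by hand gets tedious, something like the following Python sketch could generate them for pasting into the watched words list. It is only an illustration: `STYLE_BASES` and `styled_variants` are made-up names, and it assumes lowercase ASCII terms and covers only two styled alphabets (mathematical bold italic and sans-serif italic).

```python
# Code point of the styled lowercase "a" for each alphabet. Both ranges are
# contiguous for a-z, so a fixed offset from "a" is enough.
STYLE_BASES = {
    "bold italic": 0x1D482,        # MATHEMATICAL BOLD ITALIC SMALL A
    "sans-serif italic": 0x1D622,  # MATHEMATICAL SANS-SERIF ITALIC SMALL A
}


def styled_variants(word: str) -> list[str]:
    """Return one fully styled copy of `word` per alphabet in STYLE_BASES.

    Only lowercase ASCII letters are converted; other characters are kept.
    """
    variants = []
    for base in STYLE_BASES.values():
        variants.append("".join(
            chr(base + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
            for ch in word
        ))
    return variants


for variant in styled_variants("abc"):
    print(variant)
```

This only produces uniformly styled copies. Strings that mix alphabets letter by letter would need every combination listed, which grows quickly with word length.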

3 Likes