Watched Words Improvement -- similar-looking Unicode characters

I think watched words could be improved if similar-looking Unicode characters also matched.

For example:
abcabcabc
𝘢𝘣𝘤𝘢𝘣𝘤𝘢𝘣𝘤
𝒂𝒃𝒄𝒂𝒃𝒄𝒂𝒃𝒄
ab𝘤𝘢𝘣𝒄𝒂𝒃𝒄

This essentially lets spammers create many variations of the same word to circumvent the word filter. I’ve been getting hammered by crafty, motivated spammers, so they’ve really been pushing Discourse’s anti-spam features to the absolute limit. This is one of the techniques they’re using.

Perhaps this could be useful: https://github.com/janlelis/unicode-confusable
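To illustrate the idea for the examples above: the styled letters are Mathematical Alphanumeric Symbols, and Unicode compatibility normalization (NFKC) folds those back to plain letters. Below is a minimal sketch in Python using only the standard library. It is not how Discourse matches watched words, and the function names `fold_lookalikes` and `contains_watched_word` are made up for illustration.

```python
import unicodedata


def fold_lookalikes(text: str) -> str:
    """Fold compatibility variants (e.g. mathematical letters) to base characters.

    Hypothetical helper: NFKC handles styled variants of the same letter,
    but not cross-script lookalikes such as Cyrillic "а" vs Latin "a".
    """
    return unicodedata.normalize("NFKC", text).casefold()


def contains_watched_word(text: str, watched_words: list[str]) -> bool:
    """Return True if any watched word occurs in the folded text."""
    folded = fold_lookalikes(text)
    return any(fold_lookalikes(word) in folded for word in watched_words)


samples = ["abcabcabc", "𝘢𝘣𝘤𝘢𝘣𝘤𝘢𝘣𝘤", "𝒂𝒃𝒄𝒂𝒃𝒄𝒂𝒃𝒄", "ab𝘤𝘢𝘣𝒄𝒂𝒃𝒄"]
for sample in samples:
    print(contains_watched_word(sample, ["abcabcabc"]))
```

This only covers characters that Unicode itself declares compatibility-equivalent. Lookalikes from other scripts would need a confusables table like the one the gem above is built around.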

2 likes

That’s not a “font”; it is a different set of Unicode characters.

2 likes

Ah my bad, thanks for the correction. Updated the post.

Unlikely, as that kind of Unicode “looks like” matching is extremely expensive in CPU time and also very finicky to get right, because who decides what “looks like” something else? :thinking:

I suggest you consider other methods of dealing with these spammers.

In the meantime, just add common variations of spam terms in different Unicode characters as needed.
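If typing those variations by hand gets tedious, something like the following could generate them for pasting into the watched words list. This is a rough sketch that assumes the spammers are using the sans-serif italic and bold italic mathematical letters, as in the examples at the top of the topic. `STYLE_BASES` and `styled_variants` are hypothetical names, and only lowercase a-z is handled.

```python
import unicodedata

# Assumption: code point of the lowercase "a" in two styles from the
# Mathematical Alphanumeric Symbols block (both ranges are contiguous a-z).
STYLE_BASES = {
    "sans_serif_italic": 0x1D622,
    "bold_italic": 0x1D482,
}


def styled_variants(term: str) -> dict[str, str]:
    """Return the term rewritten in each style; non a-z characters are kept."""
    variants = {}
    for style, base in STYLE_BASES.items():
        variants[style] = "".join(
            chr(base + ord(ch) - ord("a")) if "a" <= ch <= "z" else ch
            for ch in term
        )
    return variants


for style, variant in styled_variants("abcabcabc").items():
    print(style, variant)
```

Note that this produces one variant per style. Terms that mix styles within a single word, like the last example in the first post, multiply quickly and are not covered.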

3 likes