Watched Words Improvement -- similar-looking Unicode characters

I think watched words could be improved if similar-looking Unicode characters also matched.

For example:
abcabcabc
𝘒𝘣𝘀𝘒𝘣𝘀𝘒𝘣𝘀
𝒂𝒃𝒄𝒂𝒃𝒄𝒂𝒃𝒄
abπ˜€π˜’π˜£π’„π’‚π’ƒπ’„

This essentially gives spammers many variations of the same word to circumvent the word filter. I’ve been getting hammered by crafty, motivated spammers, so they’ve really been pushing Discourse’s anti-spam features to the absolute limit. This is one of the techniques they’re using.

Perhaps this could be useful: GitHub - janlelis/unicode-confusable: Unicode::Confusable.confusable? "ℜսᖯʏ", "Ruby"
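
To make the idea concrete, here is a minimal sketch that uses only Ruby's built-in `String#unicode_normalize`, not the gem above and not anything Discourse actually does. The `fold` and `watched?` helpers and the `WATCHED` list are made up for illustration. NFKC normalization folds the styled mathematical letters from the examples back to plain ASCII before comparing; it does not handle cross-script look-alikes such as Cyrillic letters, which is where a confusables table would be needed.

```ruby
# Hypothetical helper: fold compatibility characters (for example the
# mathematical bold/italic alphabets) to their plain forms, then downcase.
def fold(text)
  text.unicode_normalize(:nfkc).downcase
end

# Hypothetical watched-word check; not Discourse's implementation.
def watched?(text, watched_words)
  folded = fold(text)
  watched_words.any? { |word| folded.include?(fold(word)) }
end

WATCHED = ["abc"].freeze

samples = [
  "abcabcabc",
  "\u{1D482}\u{1D483}\u{1D484}", # bold italic a, b, c
  "ab\u{1D624}",                 # plain a, b followed by sans-serif italic c
  "\u0430bc",                    # Cyrillic a + Latin b, c: NFKC leaves this alone
]

samples.each { |s| puts "#{s.inspect} => #{watched?(s, WATCHED)}" }
```

The last sample shows the limit of this approach: normalization alone only catches characters that Unicode itself defines as compatibility variants of another character.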

That’s not a “font”; it is a different set of Unicode characters.

Ah my bad, thanks for the correction. Updated the post.

Unlikely, as that kind of unicode “looks like” matching is extremely expensive in CPU time and also very finicky to get right, because who decides what “looks like” something else? :thinking:

I suggest you should consider other methods of dealing with these spammers.

In the meantime, just add common variations of spam terms as needed in different unicode characters.
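
If you go the manual route, a small script can generate the styled variants to paste into the watched words list. This is only a rough sketch: `STYLE_BASES`, `restyle` and `variants` are names invented here, and the table covers just three of the Mathematical Alphanumeric Symbols alphabets (ones whose lowercase a to z ranges are contiguous). It only produces whole-word variants in a single style, so mixed-style spellings would still need to be added by hand.

```ruby
# Code point of the lowercase "a" in a few Mathematical Alphanumeric
# Symbols alphabets with contiguous a-z ranges (illustrative subset).
STYLE_BASES = {
  bold: 0x1D41A,
  bold_italic: 0x1D482,
  sans_serif_italic: 0x1D622,
}.freeze

# Hypothetical helper: rewrite the ASCII lowercase letters of `word`
# in the alphabet starting at `base`; other characters pass through.
def restyle(word, base)
  word.each_char.map { |ch|
    ch.match?(/[a-z]/) ? [base + ch.ord - "a".ord].pack("U") : ch
  }.join
end

# One variant of `word` per style in STYLE_BASES.
def variants(word)
  STYLE_BASES.values.map { |base| restyle(word, base) }
end

puts variants("abc")
```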
