Watched Words Improvement -- similar looking unicode characters

markersocial · August 5, 2019, 3:00am

For watched words, I think it could be improved if similar unicode characters also matched.

For example:
abcabcabc
𝘢𝘣𝘤𝘢𝘣𝘤𝘢𝘣𝘤
𝒂𝒃𝒄𝒂𝒃𝒄𝒂𝒃𝒄
ab𝘤𝘢𝘣𝒄𝒂𝒃𝒄

This essentially lets spammers create many variations of the same word to circumvent the word filter. I’ve been getting hammered by crafty, motivated spammers, so they’ve really been pushing Discourse’s anti-spam features to the absolute limit. This is one of the techniques they’re using.

Perhaps this could be useful: janlelis/unicode-confusable on GitHub, e.g. Unicode::Confusable.confusable? "ℜսᖯʏ", "Ruby"
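
For the specific examples in this post, a cheaper approach than full confusable detection may be enough. The sketch below is not how Discourse matches watched words; it only illustrates that Unicode NFKC normalization folds the mathematical italic and bold-italic letters back to plain ASCII. The helper names `fold_compat` and `matches_watched_word` are made up for this illustration.

```python
import unicodedata


def fold_compat(text: str) -> str:
    """Apply NFKC normalization, then casefold.

    NFKC replaces compatibility variants (such as the mathematical
    alphanumeric letters) with their base characters.
    """
    return unicodedata.normalize("NFKC", text).casefold()


def matches_watched_word(text: str, watched_word: str) -> bool:
    """Hypothetical helper: substring match on the folded forms."""
    return fold_compat(watched_word) in fold_compat(text)


samples = [
    "abcabcabc",
    "𝘢𝘣𝘤𝘢𝘣𝘤𝘢𝘣𝘤",
    "𝒂𝒃𝒄𝒂𝒃𝒄𝒂𝒃𝒄",
    "ab𝘤𝘢𝘣𝒄𝒂𝒃𝒄",
]
for sample in samples:
    print(matches_watched_word(sample, "abc"))
```

Each of the four samples should match here. Note that NFKC only covers characters with a compatibility mapping: look-alikes from other scripts (a Cyrillic "а", or most of "ℜսᖯʏ") are left unchanged, which is where a confusables table like the one in the linked gem would be needed.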

codinghorror · August 5, 2019, 3:07am

That’s not a “font”; that is a different set of unicode characters.

markersocial · August 5, 2019, 3:17am

Ah my bad, thanks for the correction. Updated the post.

codinghorror · August 5, 2019, 4:00am

Unlikely, as that kind of unicode “looks like” matching is extremely expensive in CPU time and also very finicky to get right, because who decides what “looks like” something else?

I suggest you should consider other methods of dealing with these spammers.

In the meantime, just add common variations of spam terms as needed in different unicode characters.
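
As a rough illustration of generating such variations, the sketch below maps an ASCII word onto three of the Unicode mathematical alphanumeric styles. The function names and the choice of styles are assumptions made for this example, not part of Discourse; the styles were picked because their code point blocks are contiguous for A-Z followed by a-z.

```python
# Code point of the capital "A" in each style block; the 26 capitals are
# followed directly by the 26 small letters in these three blocks.
STYLE_CAPITAL_A = {
    "bold": 0x1D400,
    "bold_italic": 0x1D468,
    "sans_serif_italic": 0x1D608,
}


def styled(word: str, capital_a: int) -> str:
    """Map ASCII letters into one style block; keep other characters as-is."""
    out = []
    for ch in word:
        if "A" <= ch <= "Z":
            out.append(chr(capital_a + ord(ch) - ord("A")))
        elif "a" <= ch <= "z":
            out.append(chr(capital_a + 26 + ord(ch) - ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def variants(word: str) -> list[str]:
    """Return the plain word plus one variant per style."""
    return [word] + [styled(word, base) for base in STYLE_CAPITAL_A.values()]


for variant in variants("abc"):
    print(variant)
```

This only produces single-style variants. Mixed strings such as "ab𝘤𝘢𝘣𝒄𝒂𝒃𝒄" multiply quickly with word length, so a list built this way will not cover every combination a spammer can produce.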
