Use the profanity filter to block email addresses?

Continuing the discussion from Inappropriate / Obscenity / Profanity Language Filter:

So the profanity filter works well … not that we see it used often.

But is there a way to add regex/patterns so it could block people from putting an email address in a post?

That would be extremely dangerous, though.

“Extremely dangerous” is a strong phrase. Can you say more about why you feel that way?

http://blog.codinghorror.com/obscenity-filters-bad-idea-or-incredibly-intercoursing-bad-idea/

Etc etc etc

Regex is like a language unto itself. Even a lot of seasoned programmers have trouble with it.

Using it requires not only understanding every possible variation you want to match, but also every possible variation you want to not match.

A lot of people do fairly well with the first, but fail with the second.
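One way to keep both halves in view is to write down what should match and what should not, then run every candidate pattern against both lists. The Python sketch below is only an illustration: `PATTERN`, `SLOPPY`, `check` and the sample strings are all hypothetical, and nothing here is part of Discourse.

```python
import re

# Hypothetical goal: catch addresses at gmail.com and nothing else.
PATTERN = re.compile(r"\b[\w.+-]+@gmail\.com\b", re.IGNORECASE)
# A sloppier attempt: case-sensitive, and the "." is left unescaped.
SLOPPY = re.compile(r"\w+@gmail.com")

# Variations the pattern SHOULD match.
should_match = [
    "write to someone@gmail.com",
    "Someone.Else+tag@GMAIL.com",
]

# Variations it should NOT match: the half people tend to forget.
should_not_match = [
    "I use gmail.com for personal mail",
    "ping @gmail on the forum",
    "someone@gmailXcom",
]

def check(pattern, positives, negatives):
    """Return (missed, false_alarms) for a candidate pattern."""
    missed = [s for s in positives if not pattern.search(s)]
    false_alarms = [s for s in negatives if pattern.search(s)]
    return missed, false_alarms

print(check(PATTERN, should_match, should_not_match))  # ([], [])
print(check(SLOPPY, should_match, should_not_match))
# (['Someone.Else+tag@GMAIL.com'], ['someone@gmailXcom'])
```

The sloppy version looks almost the same, yet it misses an uppercase domain and accepts `gmailXcom` because an unescaped `.` matches any character.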

For example, using (.)* matches everything, anything, and nothing.
I see it used way too often as a “shortcut” to get things to match, but it often ends up matching things it shouldn’t.
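A small Python sketch makes this concrete. `shortcut` and `tighter` are hypothetical patterns chosen for illustration, not anything Discourse ships.

```python
import re

greedy = re.compile(r"(.)*")

# It "succeeds" on every input, including the empty string.
for text in ["", "hello", "anything at all"]:
    print(repr(text), "->", greedy.fullmatch(text) is not None)  # always True

# Dropped into a bigger pattern, it swallows far more than intended.
# Hypothetical attempt to catch "name@gmail.com":
shortcut = re.compile(r"(.)*@gmail\.com")
post = "Contact me on the forum, not by mail. Old address: bob@gmail.com"
m = shortcut.search(post)
print(m.group(0))  # the entire post, from the first character onward

# Restricting the local part to word characters keeps the match small.
tighter = re.compile(r"\w+@gmail\.com")
print(tighter.search(post).group(0))  # bob@gmail.com
```

If a filter masked whatever `shortcut` matched, it would black out the whole post rather than just the address.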

I guess if it were under the “developers only” section, that might be enough to scare off admins who shouldn’t mess with it. But human nature being what it is, hand out loaded guns and it’s only a matter of time before someone shoots themselves in the foot.

And as for a valid email regex, it is notoriously difficult to craft a fool-proof one. Many come close and are “good enough” but without additional processing there will likely be problems at some point.
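To illustrate what “good enough” tends to look like in practice, here is a deliberately simple pattern in Python. `PRAGMATIC` and `finds_address` are hypothetical names. The pattern is not the RFC 5322 address grammar; the point is to show one miss and one false alarm that a short pattern like this produces.

```python
import re

# A deliberately simple "good enough" pattern: local-part@domain.tld.
# It does NOT implement the full RFC 5322 address grammar.
PRAGMATIC = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}")

def finds_address(text):
    """True if PRAGMATIC finds something it considers an email address."""
    return PRAGMATIC.search(text) is not None

# Caught, as intended.
print(finds_address("mail jane.doe+forum@example.co.uk"))       # True
# A valid address with a quoted local part slips through (false negative).
print(finds_address('"john doe"@example.com'))                  # False
# A retina image filename gets flagged (false positive).
print(finds_address("see icon@2x.png for the hi-res version"))  # True
```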

Fair enough, but the filter already exists in Discourse, even if you aren’t using it personally. Also, keep in mind (in response to your blog post) that the Discourse filter doesn’t replace strings, it masks them with squares. :slightly_smiling:

Regexes are inherently difficult, so I’m not proposing that everyday users work with them as the main use case. The current system works fine for most cases, but there’s no surefire way to prevent people from posting the most common email addresses. (I’m not interested in debating the “perfect” email regex.)

Meanwhile, I’m simply blocking some of the most common domain names like @gmail.com, @yahoo.com, etc.

I am not interested in letting perfect get in the way of good here. Just trying to prevent the most common occurrences.
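Blocking a handful of common domains can be done fairly safely by escaping each domain literally and requiring an address-like prefix. The Python sketch below assumes a hypothetical `BLOCKED_DOMAINS` list and `build_domain_pattern` helper; it is not how Discourse stores or applies its word list.

```python
import re

# Hypothetical blocklist; extend with whatever providers your community uses.
BLOCKED_DOMAINS = ["gmail.com", "yahoo.com", "hotmail.com", "outlook.com"]

def build_domain_pattern(domains):
    # re.escape turns "gmail.com" into "gmail\.com", so the "." is literal.
    alternatives = "|".join(re.escape(d) for d in domains)
    # Require a plausible local part before "@" and a word boundary after the
    # domain, so "someone@gmail.community" is not caught.
    return re.compile(rf"[\w.+-]+@(?:{alternatives})\b", re.IGNORECASE)

pattern = build_domain_pattern(BLOCKED_DOMAINS)
post = "Email me at Jane_Doe@Yahoo.com or visit gmail.community."
print(pattern.findall(post))  # ['Jane_Doe@Yahoo.com']
```

Listing domains literally gives up coverage (any domain not in the list passes) in exchange for very few false alarms, which matches the “good, not perfect” goal.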

Actually, last I knew, it replaces the characters with the box character’s decimal value.

e.g., after blocking “@gmail.com”, “someone@gmail.com” would look like

someone■■■■■■■■■■

and the source would be

someone&#9632;&#9632;&#9632;&#9632;&#9632;&#9632;&#9632;&#9632;&#9632;&#9632;
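Per-character masking like this can be sketched in a few lines of Python. `mask` is a hypothetical helper, not Discourse’s own code. It relies on one fact: “■” is U+25A0 BLACK SQUARE, which is 9632 in decimal, so its HTML decimal entity is `&#9632;`.

```python
import re

BOX = "\u25a0"          # "■" BLACK SQUARE, code point 9632 in decimal
BOX_ENTITY = "&#9632;"  # the same character as an HTML decimal entity

def mask(text, pattern, replacement=BOX):
    """Replace every character of each match with one replacement unit."""
    return pattern.sub(lambda m: replacement * len(m.group(0)), text)

blocked = re.compile(re.escape("@gmail.com"), re.IGNORECASE)
print(mask("someone@gmail.com", blocked))              # someone■■■■■■■■■■
print(mask("someone@gmail.com", blocked, BOX_ENTITY))  # "someone" + 10 entities
```

Because only the matched span is masked, the local part (“someone”) stays visible when only the domain is blocked.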

That is what I meant when I finished the sentence with:

Forgive my lack of precision. What I meant was that it doesn’t replace the word with other letters, as described in the blog post above. :wink:

Hi,

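I’d like to prevent users from sharing email addresses in our community’s public discussions, to protect their privacy (perhaps, despite our best efforts, some people still don’t realize these discussions are public?).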

Would the following work? Or are there major risks I haven’t noticed?

*@*.com
*@*.org
*@*.net
*@*.edu
*@*.info
*@*.biz

All regexes carry significant risk, and the broader they are, the greater the risk. These regexes are… quite dangerous.
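The thread doesn’t spell out exactly how the filter interprets `*`. Assuming each `*` behaves like the regex `.*`, this Python sketch shows how broad `*@*.com` becomes. `glob_to_regex` is a hypothetical translation written for illustration, not the filter’s real one.

```python
import re

def glob_to_regex(glob):
    # Hypothetical translation: escape everything, then let "*" mean ".*".
    return re.compile(re.escape(glob).replace(r"\*", ".*"), re.IGNORECASE)

broad = glob_to_regex("*@*.com")
post = "Thanks @sam, the fix is on example.com. Great work, everyone!"
m = broad.search(post)
print(m.group(0))  # Thanks @sam, the fix is on example.com
```

The post contains no email address at all, yet an @-mention plus any later “.com” is enough to match, and the match runs from the start of the text.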

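I naively hoped that requiring the “@” sign and including the top-level domain would narrow the scope to email addresses only. Is there no way to filter for just those?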

A pattern like *.?@gmail.com would be much safer. Ideally, I’d suggest allowing only word characters rather than the asterisk (i.e., any character).
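A rough Python sketch of the “word characters only” idea, using a hypothetical `WORD_ONLY` pattern in place of the asterisk:

```python
import re

# Word characters only (letters, digits, underscore) before the literal domain.
WORD_ONLY = re.compile(r"\w+@gmail\.com\b", re.IGNORECASE)

posts = [
    "Thanks @sam, the fix is on example.com.",
    "reach me at jane_doe@gmail.com",
    "or first.last@gmail.com",
]
results = [[m.group(0) for m in WORD_ONLY.finditer(p)] for p in posts]
for found in results:
    print(found)
# []
# ['jane_doe@gmail.com']
# ['last@gmail.com']
```

The trade-off shows in the last case: because `\w` excludes “.”, only `last@gmail.com` is matched, so masking would still leave “first.” visible. That is a partial leak, but it is far less destructive than a pattern that can swallow a whole sentence.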