`levenshtein distance spammer emails` should flag accounts that are similar even if no accounts have been marked as spammers yet

codinghorror · September 9, 2016, 3:18am

I would definitely train your moderators to “delete as spammer” any spam accounts immediately, there’s a bunch of stuff that just can’t happen if that doesn’t happen:

none of the IPs are blacklisted
none of the emails are blacklisted (or tested for Levenshtein distance)
none of the post content is sent to Akismet for Bayesian spam inference

Also, your idea that “hey just look for similar emails at signup, it’s easy!” doesn’t actually help in the cases I’ve seen. It’s quite rare for spammers to use nearly-identical emails; they usually generate 12-16 character random emails. I also simply don’t agree that we should check other new accounts. All of this is predicated on spammers being deleted and correctly tagged as spammers first.
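For readers unfamiliar with the check being discussed, here is a minimal sketch of a Levenshtein comparison against blocked emails. It is an illustration only, not Discourse's actual implementation: the function names, the `max_distance` threshold of 2, and the sample addresses are all assumptions.

```python
import random
import string


def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def resembles_blocked(email: str, blocked: list[str], max_distance: int = 2) -> bool:
    """True if email is within max_distance edits of any blocked address."""
    email = email.lower()
    return any(levenshtein(email, b.lower()) <= max_distance for b in blocked)


# Hypothetical block list, as it might look after a "delete as spammer".
blocked_emails = ["cheapdeals01@example.com"]

# One substitution away from a blocked address, so it is flagged.
print(resembles_blocked("cheapdeals02@example.com", blocked_emails))  # True

# A random 14-character local part, like the ones described above,
# usually differs from a blocked address in most positions.
local = "".join(random.choices(string.ascii_lowercase, k=14))
print(levenshtein(f"{local}@example.com", blocked_emails[0]))
```

The second print varies from run to run, but the distance is normally far above any sensible threshold, which is the point made above about randomly generated addresses.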

(It is interesting, @sam, that these bamwar-esque Koreans did not fall afoul of the fast new user typist check, though. I also think Stonehearth might be missing the “Korean characters are an immediate warning flag” check? They also clearly came from a variety of different IP addresses, otherwise they would have hit the no-more-than-3-new-users-from-the-same-IP check.)

codinghorror · September 9, 2016, 3:27am

Stated another way, here’s all the checks these spammers passed:

Akismet explicitly said “not spam” on each and every post they made
They did not come from the same IP address, or they would have run into our default “no more than 3 new users from the same IP” block
They did not hit the “super fast typist” new user check
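As a rough illustration of why the per-IP check did not trigger here, this is a minimal sketch of a "no more than 3 new users from the same IP" limiter. The class and method names are hypothetical and this is not how Discourse implements it; only the limit of 3 comes from the post above.

```python
from collections import Counter

MAX_NEW_USERS_PER_IP = 3  # the default limit quoted above


class RegistrationLimiter:
    """Counts new-user registrations per IP and refuses any beyond the cap."""

    def __init__(self, limit: int = MAX_NEW_USERS_PER_IP) -> None:
        self.limit = limit
        self.new_users_by_ip = Counter()

    def try_register(self, ip: str) -> bool:
        """Record a registration and return True, or return False if the IP is at its cap."""
        if self.new_users_by_ip[ip] >= self.limit:
            return False
        self.new_users_by_ip[ip] += 1
        return True


limiter = RegistrationLimiter()

# Four signups from one address: the fourth is refused.
same_ip = [limiter.try_register("203.0.113.7") for _ in range(4)]
print(same_ip)  # [True, True, True, False]

# Four signups spread over four addresses: all of them pass.
spread = [limiter.try_register(f"198.51.100.{n}") for n in range(4)]
print(spread)  # [True, True, True, True]
```

A spammer rotating through different addresses never reaches the cap on any single one, which matches what happened in this attack.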

Failing to delete them as spammers also means:

we did not add them to the IP block list
we did not add their email to the email block list
we did not test their emails against recent spam block emails
Akismet did not get the spam content to add to its Bayesian inference engine as a spam signal

Other than the Korean character check, I'm not sure there’s a whole lot else we could really do here. It is critical that staff understand they need to “Delete as spammer” as quickly as possible, though. Until the user is unequivocally marked as “YES THIS IS A SPAMMER” our hands are a bit tied. I am not down with randomly comparing emails of new users until those users have been confirmed as spammers.

jomaxro · September 9, 2016, 3:36am

Thanks for the detailed look into what happened. I will inform staff of the importance of using the “delete as spammer” option as quickly as possible. To check, does deleting a user from the Admin user page (and blocking IP/email) do the same thing? Looking through the logs it seems each moderator deleted people differently.

Understanding all the checks they managed to pass through, I am curious why you are not down with checking emails on registration? Checking emails against the last 100 registered emails shouldn’t be a performance issue, doesn’t seem to be a privacy issue (the check is by the system, not a human looking at an email), and won’t affect 99% of people. I guess what I am asking is what harm would it do to check emails before spammers are confirmed?
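To make the proposal concrete, here is a minimal sketch of comparing each new signup against the last 100 registered emails. Everything in it is an assumption for illustration (the `RecentEmailWindow` class, the distance threshold of 2, the sample addresses); it is not an existing Discourse feature.

```python
from collections import deque


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution or match
            ))
        previous = current
    return previous[-1]


class RecentEmailWindow:
    """Keeps the most recent signups and reports near-duplicates of a new one."""

    def __init__(self, size: int = 100, max_distance: int = 2) -> None:
        self.recent = deque(maxlen=size)  # oldest entry is dropped automatically
        self.max_distance = max_distance

    def register(self, email: str) -> list[str]:
        """Return recent emails within max_distance of this one, then record it."""
        email = email.lower()
        matches = [seen for seen in self.recent
                   if edit_distance(email, seen) <= self.max_distance]
        self.recent.append(email)
        return matches


window = RecentEmailWindow()
print(window.register("anna.lee1@example.com"))       # [] (nothing to compare against yet)
print(window.register("anna.lee2@example.com"))       # ['anna.lee1@example.com']
print(window.register("qzkvhtrwplmxnb@example.com"))  # [] (nothing close)
```

The cost is bounded at 100 short string comparisons per signup. What to do with a match (flag for review, hold for approval, and so on) is a separate policy question that the sketch leaves open.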

cpradio · September 9, 2016, 10:25am

You are presented with an option to Delete and Block or simply Delete (just like using the Delete button from the User’s Public Profile).

jomaxro · September 9, 2016, 11:53am

Right, I see the option to delete and block in all 3 spots. I just wanted to verify that it didn’t matter where the blocking was done from, as the delete and block from the spam flag queue explicitly says “Delete Spammer”.

cpradio · September 9, 2016, 12:43pm

Correct, it doesn’t matter, so long as you choose the correct one and do not choose “Delete Only”.

codinghorror · September 30, 2016, 11:12pm

We figured out the bamwar folks were using edits in the grace period to get around some of the spam checks. @sam added a new site setting as a temporary fix to prevent TL0 users from editing their posts, but we need a deeper fix so edits in the grace period go through the normal validation paths…

jomaxro · October 1, 2016, 4:32am

Thanks for the update @codinghorror. We appreciate you continuing to investigate the barrage of spam we received. The deeper fix you mentioned, are you planning on adding that to the 1.7 task list or do you expect it to be in a later release?

As to the feature this topic is about, I’m still questioning your stance regarding proactively checking emails against recent registrations. You’ve mentioned multiple times in this topic that it is critical to delete spammers as soon as possible. I have no issue with that, and have informed the rest of our moderators and staff about this. You’ve stated that doing so kicks in even more spam protections (IP & email blocks and the Levenshtein check, for example).
My concern is what to do when the spam attack first happens and there is no staff online. I think it is unrealistic to assume that every spam post will be caught by a filter of some kind, and similarly unrealistic that there will be staff online 24/7. Not checking for similar emails until staff intervention makes it easier for a spammer to keep spamming if they’ve gotten around the filters.

sam · October 1, 2016, 6:19am

I just saw this today and I am a bit stumped at how to protect

codinghorror · October 1, 2016, 7:52am

It is a catch-22: until the email has been definitively marked as a spammer’s, it cannot be checked against anything. So step zero is to delete the spammers as spammers, then the emails can be vetted.

jomaxro · October 1, 2016, 9:58pm

This is what I am trying to get an answer to: why is this the case? The email is there in the database, so I don’t think it is a technical issue - but I could be wrong…

mpalmer · October 3, 2016, 9:53pm

False positive rate vs ease of getting around it, would be my guess.

jomaxro · November 7, 2016, 11:08pm

Any update on these “deeper fixes”? We’ve enabled “must approve first 2 posts” back when the attack happened, and while we haven’t seen any spam get caught, we’re hesitant to disable it until we know this fix is in.

sam · November 7, 2016, 11:17pm

Yeah, we did a pretty awesome fix that stops TL0 from editing; this closed a pretty big hole.

jomaxro · November 7, 2016, 11:18pm

Awesome, thanks! I’ll ask an Admin to remove the post approval for us.

codinghorror · November 7, 2016, 11:23pm

Yeah and enable the “TL0 users can’t do edits” setting as well in your site settings.

jomaxro · November 8, 2016, 12:29am

Just to check, are you referring to min trust to edit post set to 1?

codinghorror · November 8, 2016, 12:30am

Yes. Specifically it is the grace period edits that are the problem, not editing per se, but this was the easier fast fix. Setting the grace period for edits to zero would probably also work, but would significantly impact all users…
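A toy model of the two options just mentioned may help. It assumes the "min trust to edit post" setting acts as a simple trust-level threshold and that the grace period is a number of seconds after posting. The field names and the 300-second value are illustrative assumptions, and this is a Python sketch, not Discourse code.

```python
from dataclasses import dataclass


@dataclass
class SiteSettings:
    # Setting named in the thread; a value of 1 means TL0 users cannot edit.
    min_trust_to_edit_post: int = 1
    # Illustrative value only (an assumption); 0 would disable the grace period.
    editing_grace_period_secs: int = 300


def can_edit(trust_level: int, settings: SiteSettings) -> bool:
    """The fast fix: users below the minimum trust level cannot edit at all."""
    return trust_level >= settings.min_trust_to_edit_post


def in_grace_period(seconds_since_post: int, settings: SiteSettings) -> bool:
    """True while an edit would still fall inside the grace period."""
    return seconds_since_post < settings.editing_grace_period_secs


settings = SiteSettings()
print(can_edit(0, settings))  # False: a TL0 user is blocked from editing
print(can_edit(1, settings))  # True: TL1 and above are unaffected

# The alternative: a zero grace period removes grace-period edits for everyone.
no_grace = SiteSettings(editing_grace_period_secs=0)
print(in_grace_period(30, no_grace))  # False, regardless of trust level
```

The first option only touches TL0 accounts, while the second changes editing behaviour for every user, which is the trade-off described above.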

jomaxro · November 8, 2016, 12:32am

Do you expect to eventually support setting the grace period for TL0 to a different value than TL1+?

codinghorror · November 8, 2016, 12:32am

That is another good idea.
