It seems that the blocking is still working, just not entirely: I’m still seeing regular matches against the record in logs → screened emails, but not for all combinations. The user was able to make a few hundred accounts today using the same blocked Gmail address.
The Gmail dot variations they are using seem to have between 6 and 14 periods, and the email length is 19 (before @). They aren’t using + variations (or all of those variations are being blocked successfully).
This might be relevant: I have levenshtein distance spammer emails set to 3 (default is 2). Discourse was recently updated from 2.6.x to 2.7.1 stable.
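One possible reason the raised levenshtein setting doesn't catch these: each inserted period costs one edit, so a variation with 6 or more periods sits well outside a distance of 3. The sketch below is a plain textbook edit distance. It assumes the setting compares the incoming address to the stored one by edit distance (I haven't checked how Discourse applies it), and the addresses are made up.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


# Hypothetical addresses (19 characters before the @), not taken from the logs.
blocked = "examplespammer12345@gmail.com"
variant = "e.x.a.m.p.l.espammer12345@gmail.com"  # six periods inserted

distance = levenshtein(blocked, variant)
within_threshold = distance <= 3  # 3 is the value I have configured
print(distance, within_threshold)
```

If that assumption holds, only variations with 3 or fewer periods could be caught by the distance check alone, so the canonical match has to do the real work.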
Hmm, I forget where we landed on this one @sam, but that would possibly be a bug, since you said
This means that if firstname.lastname@example.org gets blocked we will go ahead and block email@example.com instead. Then when firstname.lastname@example.org tries to sneak in they will be blocked due to canonical matching.
So what happens when sara.hanson@ does something awful and sarah.anson@ gets caught in the crossfire? This is just like how I’m not sure joe98@ and joe99@ could be considered the same email address either. I suppose this depends upon the membership of the community and the level of manual discretion used in the matching process.
“Plus addressing” at least refers to a folder belonging to the mailbox of the same email address (given that everything before the “+” is the same).
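To make the canonical matching above concrete, here is a minimal sketch of the idea. It is not Discourse's actual implementation: the function name, the list of domains treated as Gmail, and the choice to strip periods only for those domains are all assumptions for illustration.

```python
# Assumption: only these domains ignore periods in the local part.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def canonical(email: str) -> str:
    """Reduce an address to a hypothetical canonical form for block matching."""
    local, _, domain = email.strip().lower().rpartition("@")
    # Plus addressing: everything from the first "+" onward is a tag.
    local = local.split("+", 1)[0]
    # Dot trick: periods are dropped only for the assumed Gmail domains.
    if domain in GMAIL_DOMAINS:
        local = local.replace(".", "")
    return f"{local}@{domain}"


# The two names from the post above collapse to the same canonical address...
print(canonical("sara.hanson@gmail.com") == canonical("sarah.anson@gmail.com"))
# ...while joe98@ and joe99@ stay distinct under this scheme.
print(canonical("joe98@gmail.com") == canonical("joe99@gmail.com"))
```

Under these assumptions sara.hanson@ and sarah.anson@ really are the same mailbox on Gmail, so blocking one blocks the other. On a domain that treats periods as significant they would remain separate addresses.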
Perhaps combat registration by IP range? All of this depends upon how sophisticated the spammers are. Coming here from the Let’s Encrypt community, we have a tracking thread over there detailing some pretty broad spamming tactics that have been attempted. We’ve even had people provide actual technical help before spamming weeks later.
Interesting. I never realized that gmail actually made that distinction. Learned more than a few new things today. I wonder why they would do that? Seems like it would eat up a fair amount of real estate. Are gmail addresses the only concern here?
I can confirm that the original fix worked perfectly and solved this issue with Gmail addresses. It would be a real life saver if this optional mode was returned.
Spammers are constantly learning new techniques and are still successfully gaming big players like Facebook, Instagram and Twitter. This makes most other places ‘ez mode’. It’s a full time job for many of them, so it essentially becomes:
If exploitable and (resources required < money earned), then it will be exploited.
They can get around practically any measure; the only hope is to raise the cost of doing so to the point where it is no longer financially rewarding.
Being able to bulk spam with close to unlimited emails/accounts (prior to detection and a mod/admin retroactively blocking their canonical gmail and manually removing their posts) is quite cost efficient. More so if there is not a team of 24/7 moderators.
The cost to get around anti-spam measures continues to decrease. One example is 4G/5G proxies: for something like $30-$50 per month, people can get access to virtually unlimited real mobile IPs from legitimate ISPs/ASNs. These IPs rotate automatically or manually and are shared simultaneously by entire cities/states of legitimate users of major ISPs.
Blocking these ISPs/ASNs or IPs is not suitable (you can’t just block everyone using Verizon, AT&T etc.). The spammers generally use an IP once and dump it, so blocking individual IPs will also block, at random, legitimate users who share that IP address. IP blocking is slowly becoming a legacy practice (excluding ASNs of known hosting companies). You can see the tip of the iceberg on these forums:
I believe the spammers are a mixture of fully or partially hand-rolled bots and manual spam. As Discourse takes more market share, which is clearly growing fantastically, I’d be surprised if it doesn’t become a target of commercially available bots.
Whenever Xrumer starts supporting the latest reCAPTCHA version, I’d say most webmasters on legacy forums notice a large uptick in spam due to the rock bottom cost of spamming (there is no longer any need to use a captcha solving API, and those are already very cheap per 1k solves):
People can already make their own plugins/scripts to support basically any platform using Xrumer. But if they support Discourse out of the box some day:
I can’t claim to be impartial on this, seeing as I’m in direct need of anti-spam measures. The original post about the Gmail dot trick was created by someone else in 2014, and it seems that another user solved this by requiring approval on the first x amount of posts, so maybe there are at least three user reports?
Sorry for the tangent, back on track.
Regarding the regex blocking for emails, yes you are correct. It is a partial solution, but not ideal for these reasons:
If blocking all gmails with 1 period or more before @:
It will unavoidably block real legitimate gmail users that have either 1 or more periods in their gmail, which is very common.
The spammers can still create quite a lot of variations with one period. Gmail has a maximum length of 30 characters, so e.g. email@example.com will have 30 usable combinations with a single period.
Blocking all gmails with 2 periods or more before @:
Fewer legitimate Gmail addresses blocked, but it will still block legit Gmail users that have more than 1 period in their email.
The spammers can create a lot more variations with a single 30 character gmail. I think ~842 combinations or so.
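For a rough sense of scale, the variation counts can be worked out directly. This sketch assumes a period can go in any gap between two adjacent characters, at most one per gap, and never at the start or end. A local part of n characters then has n - 1 gaps, and choosing k of them gives C(n - 1, k) variants. The function names are mine.

```python
from math import comb


def dot_variants(local_length: int, periods: int) -> int:
    """Variants with exactly `periods` periods, at most one per gap."""
    gaps = local_length - 1
    return comb(gaps, periods)


def dot_variants_between(local_length: int, min_periods: int, max_periods: int) -> int:
    """Variants whose period count lies in [min_periods, max_periods]."""
    return sum(
        dot_variants(local_length, k)
        for k in range(min_periods, max_periods + 1)
    )


one_period_30 = dot_variants(30, 1)                 # 30-character local part
two_periods_30 = dot_variants(30, 2)
all_variants_19 = dot_variants_between(19, 0, 18)   # 19 characters, any number of periods
observed_range_19 = dot_variants_between(19, 6, 14) # the 6 to 14 periods seen in the logs

print(one_period_30, two_periods_30, all_variants_19, observed_range_19)
```

Under these assumptions a 30-character address has 29 one-period and 406 two-period variants, and a 19-character address has 262144 variants in total. Exact totals depend on what is counted (for example whether the undotted address or the "up to k periods" cases are included), but either way a period-count regex leaves plenty of room unless the threshold is very strict.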
I can confirm that the new accounts came through after the block was active, as the block created date is Feb 1. I was watching new accounts being created yesterday while seeing both cases of the block rule having new recent matches as well as new registrations coming in using the combinations of the same email (periods only).
I disabled registrations overnight and have re-enabled them this morning. They have created 104 new accounts so far today with permutations of that Gmail address and are continuing to register more. I can confirm that once the periods are removed from the emails of these accounts, each is an exact match with the Screened Emails blocked record.
I tried testing the blocks in rails c as described, this is where it gets a bit weird.
So it seems that some records are returning ‘true’ as intended and some are returning ‘false’ even if the email tested is an exact match to the canonical blocked email. For the records that return ‘true’, it worked entirely as intended and returned true for all the variations that I tested. But for the emails that return false, all variations I tested returned false as well.
I was trying to find any correlations. I can confirm these are not correlated (or at least not consistently correlated):
Email length (before @)
Email containing characters and numbers
Matches (amount of times blocked)
It does seem like there is a correlation with the block creation date though, with older records being less likely to work (they return false). Records that were created 9d ago returned a mix of true/false, and all records I’ve tested so far that were created more recently than that (1h-8d ago) are returning true.
Could this be related to ‘max age unmatched emails’? I think this option is somewhat new; I have it set at the default value of 365 days.
Well, if you can come up with detailed repro steps for a bug, we’ll definitely fix it.
max age unmatched emails is not a new setting, though – along with max age unmatched ips this is a tool for cleaning up really old entries in the screened IP and Email lists respectively, entries that have not matched anything in a year.
Only have a little bit of time to post today, but wanted to share some additional information before responding more thoroughly.
I’ve found that deleting a record that is returning false from logs → screened email (allow), then blocking the email again (by delete user + block on the user’s admin page) has made a previously failing rule consistently return true now for the direct match and variations.
This seems to match with the observation of the issue being with older records. Will need to test more.