Protecting against gmail dot trick in Discourse

codinghorror · April 10, 2020, 5:59pm

I dunno @sam this feels more like we need a CAPTCHA user creation plugin to me? I don’t feel “let’s disallow dots and plusses” is treating the underlying disease, it’s only addressing the symptoms of the problem?

Historically speaking, the trend has been toward 100% human spammers over time. I mean they fill out user profiles, upload profile images, and everything. Automated spammers (minus bamwar) haven’t really been a huge problem in Discourse because we are so hard to automate against being a full bore JavaScript application. Note that most of your comments @neounix fall into the category described in my previous sentence – it’s very hard to script us because we are so complex, compared to a 1999 era HTML 1.0 website. Raising the difficulty bar that high eliminates most of the problem, based on what we’ve observed with our customers and here on meta.

Anyway TL;DR I am not necessarily opposed to a simple “disallow certain characters in emails” setting, I guess, but in my heart I don’t think anything except a CAPTCHA is going to help much in this case? We could do both?

itsbhanusharma · April 10, 2020, 7:14pm

But some users (including myself) use the + to actually sort emails in our mail client.

codinghorror · April 10, 2020, 7:27pm

No worries, this would not be on by default, more of an “attack lockdown” mode via site setting.

markersocial · April 10, 2020, 8:42pm

@neounix Legend. Thanks for the tips, much appreciated - you sent me on a spam whacking journey. I put Cloudflare on “I’m under attack” mode temporarily (which stopped their registrations - they were making a new account every 1-2 minutes) and checked the Cloudflare firewall logs for some IPs they were using, seeing it was challenging/logging every visitor. They were indeed using identical useragents.

I added a firewall rule to challenge users with that useragent and disabled “I’m under attack” mode on CF. I don’t believe many innocents were getting challenged by it and it completely stopped their spam registrations.

I then discovered the AS Number (ASN) blocking feature that Cloudflare has and have set up additional Firewall rules to block out a significant amount of them, referencing the useragent block logs. There are workarounds for this, I’m sure you know of them, but it’s additional resource cost and effort for them.

@codinghorror I agree with you that captchas would be helpful. I’d say a good primary spam prevention goal would be to increase the overall resource costs for spammers.

Captchas would contribute to this. $2 give or take per thousand recaptcha solves (using a captcha solving api e.g. https://anti-captcha.com). Plus extra complexities required for their bots.

Side note: Anti-captcha have a browser plugin for automatically solving your captchas, it works well and is a fun convenience.

Email addresses are usually another resource cost for bulk account creation. However it’s not the case when a single user can make virtually unlimited accounts per single gmail address. The cost of 1000 gmail accounts is quite significant, so they’ll often resort to other less strict providers or catchall domains. It will still cost them resources though and is easier to identify as spam.

I think it really is a case of more is more. No single defense will be strong enough; increasing the resources and effort spammers need is a step in the right direction. The best case scenario is that it’s more effort for spammers to spam Discourse forums than for admins to block it and bulk remove anything that gets through.

@itsbhanusharma I really like being able to use + also, but this is why we can’t have nice things haha. It’d be nice to have the option to enable blocking it though, if it’s needed to fight spammers.

codinghorror · April 10, 2020, 8:54pm

After thinking about it, I’m tending to agree with you on this… @sam can we prioritize this email lockdown setting for next week?

Mevo · April 11, 2020, 3:14pm

The matter had been discussed quite a bit above, in this very thread.
“Disallowing” dots and plusses would probably cause some problems (at least for some users). The idea was to store a “canonical” version of the email (“cleaned up” version), and disallow the registration of additional users with the same canonical version for gmail (=actually the SAME email, thanks to gmail tricks).

That may be what Sam is talking about when he says:

Maybe it’s what you also meant @codinghorror , and not really “disallowing” . & +
But I agree with you, that it would only “solve” the problem for gmail (not the use of a “catchall” with a domain for example)

Would a CAPTCHA really solve anything when you say yourself:

?

Stephen · April 11, 2020, 3:17pm

It does sound like we’ve skipped a step.

Forcing the use of the canonical email is problematic; blocking more than one account per canonical email by default is pretty reasonable though.

Most of us have more than one email address if we need a test account, so it won’t add a significant problem there. If it’s a default, then we don’t need to educate people to turn it on after abuse has occurred.

markersocial · April 11, 2020, 4:45pm

Plus (+) signs in emails could more or less be treated the same across all email domains I believe without much issue.

For emails like sp.a.mmer.king@gmail.com, s.pa.mmerking@gmail.com, in the case of gmail, these are the same email. But for some other providers, it might not be the case and both emails are unique users.

–

I think a good implementation for the long term, would be something like the email domain blacklist feature.

Add a custom domain that you wish to disallow duplicate registration tricks with. Then allow enabling/disabling the blocking of these two types of duplicate registrations individually. I.e. Disallow + trick email duplicates and disallow dot trick email duplicates as separate options.

Storing the registered email as-is (in terms of the user’s log in and address that is emailed), but blocking additional registrations that are determined to be the same email.

Something else, which would make it slightly more effective, would be putting a few domains in a single custom domain record, so that they are treated as the same domain. E.g. gmail.com and googlemail.com. So someone could potentially be blocked from registering twice using e.g. example@gmail.com and example@googlemail.com. There are some other providers that have multiple interchangeable domains, I sent some examples to Sam. This could add a little bit more protection, but by and large the main exploitable issue is the + and dot trick registrations.
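To make the per-domain idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `DomainRule`, `RULES`, `comparison_key` and the `example.org` entry are illustration names only, not Discourse settings or code. It assumes each record lists a primary domain, its interchangeable aliases, and two independent switches for the + trick and the dot trick.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRule:
    """Hypothetical admin-defined record for one mail provider."""
    primary: str                   # domain used when comparing addresses
    aliases: tuple = ()            # other domains treated as the same host
    block_plus_trick: bool = True  # ignore "+suffix" when comparing
    block_dot_trick: bool = False  # ignore "." in the local part when comparing


# Assumed example configuration, not a recommended default.
RULES = [
    DomainRule("gmail.com", aliases=("googlemail.com",),
               block_plus_trick=True, block_dot_trick=True),
    # Stand-in for a provider where dots are significant but + is not.
    DomainRule("example.org", block_plus_trick=True, block_dot_trick=False),
]


def find_rule(domain, rules):
    """Return the rule covering this domain, or None if it is unlisted."""
    for rule in rules:
        if domain == rule.primary or domain in rule.aliases:
            return rule
    return None


def comparison_key(email, rules=RULES):
    """Build the key used only for duplicate checks; the stored email is untouched."""
    local, _, domain = email.strip().lower().rpartition("@")
    rule = find_rule(domain, rules)
    if rule is None:
        return f"{local}@{domain}"  # unlisted domains are compared as entered
    if rule.block_plus_trick:
        local = local.split("+", 1)[0]
    if rule.block_dot_trick:
        local = local.replace(".", "")
    return f"{local}@{rule.primary}"
```

With this configuration, `comparison_key("e.x.ample+1@googlemail.com")` and `comparison_key("example@gmail.com")` give the same key, while dots in an `example.org` address stay significant.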

–

Alternatively, a potentially simpler implementation would be like above, but the two options for each custom email domain would be to block all registrations with + signs and/or periods. If a user registers with an email at that domain that contains a + or period, give them an error instructing them to remove the periods and/or + signs from their email (possibly doing it for them automatically) and to try again. It’s not perfect, but would still be very effective.

Stephen · April 11, 2020, 5:12pm

Correct, that’s why we would distil down to the canonical email to ensure they are unique. It’s covered above. We can’t store the canonical email as their email address though as it’s not the one they provided.

Domain blacklists already exist, but we can’t assume that just because a user can also be reached by a googlemail or gmail address that we should reject one or the other. Hence referring back to a canonical “master”.

There are sites today where users are quite legitimately using plus addressing and dots. The point isn’t to inconvenience legitimate practices, only to curtail the unreasonable side effects such as two users for one canonical address.

markersocial · April 11, 2020, 6:11pm

If providing the email with periods and the plus-sign suffix stripped is required during the registration process, on the client side with consent (akin to form validation), storing it as their account email would be ok.

Not ideal or perfect, but potentially simpler and a worthy trade-off in some cases where the choice is inconvenience a few users or inconvenience an entire forum with spam.

There are gmail accounts where the primary canonical email includes periods. They would be the users most affected and confused by force removing them during registration.

I don’t think that this would be the best implementation either and definitely would not be default option friendly.

Right, what I meant was having an option menu similar to the already existing email domain blacklist for inputting which email domains should be affected and the parameters of what should/shouldn’t be used to decide if an email address is unique/canonical as being discussed in this thread. Potentially also which domains should be considered the same host e.g. gmail/googlemail.

Regarding gmail and googlemail, I think we’re in agreement. Same regarding the dots and + signs.

Essentially, allow the first registration to go through, but disallow the user from being able to make multiple accounts using that same email. Or at least minimise it within reason.

john@googlemail.com registers first → accepted
john@gmail.com registers later → rejected

matthew+{randomstring}@gmail.com registers first → accepted
matthew@gmail.com registers later → rejected
matthew@googlemail.com registers later → rejected
m.att.he.w@gmail.com registers later → rejected
matthew+{randomstring}@gmail.com registers later → rejected
m.a.tt.hew+{randomstring}@googlemail.com registers later → rejected
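The accept/reject sequence above can be sketched roughly like this in Python. It is only an illustration, not Discourse code: `canonical`, `Registry` and `GMAIL_DOMAINS` are made-up names, and the sketch assumes the + suffix is ignored for every domain while dots and the gmail/googlemail swap are ignored for Google addresses only. The email is stored exactly as entered; the canonical form is only the lookup key.

```python
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}  # assumed interchangeable


def canonical(email):
    """Reduce an address to the key used for duplicate detection."""
    local, _, domain = email.lower().rpartition("@")
    local = local.split("+", 1)[0]          # drop any "+suffix"
    if domain in GMAIL_DOMAINS:
        local = local.replace(".", "")      # dots are ignored by gmail
        domain = "gmail.com"                # fold the alias domain
    return f"{local}@{domain}"


class Registry:
    """Keeps the address as typed, keyed by its canonical form."""

    def __init__(self):
        self._by_canonical = {}

    def register(self, email):
        key = canonical(email)
        if key in self._by_canonical:
            return False                    # same mailbox already registered
        self._by_canonical[key] = email     # stored as entered, not rewritten
        return True

    def stored_email(self, email):
        return self._by_canonical.get(canonical(email))


registry = Registry()
attempts = [
    "matthew+abc123@gmail.com",
    "matthew@gmail.com",
    "matthew@googlemail.com",
    "m.att.he.w@gmail.com",
    "m.a.tt.hew+xyz789@googlemail.com",
]
results = [registry.register(address) for address in attempts]
print(results)  # [True, False, False, False, False]
```

The first registration keeps its + suffix for login and outgoing mail, so `registry.stored_email("matthew@gmail.com")` still returns `matthew+abc123@gmail.com`.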

The googlemail vs gmail issue (and other providers that have several alt domains) is vastly less significant than the dot and + address issues. Handling those cases would be nice though.

Stephen · April 11, 2020, 6:31pm

That’s a really user-hostile change, and totally unnecessary. The reason these features exist to begin with is to identify the source of email. If I register using the email address stephen+meta@gmail.com I can configure a rule that allows any email sent to that address to be labelled. If meta is compromised and my email address ends up receiving spam at that alias I now know where the breach occurred. Crippling the way I use email isn’t the solution, distilling my email address down to a canonical version for comparison achieves the same end result without creating any user inconvenience.

Right, and that’s tied to the concept of a canonical address. If the feature went ahead as it was originally discussed we would really benefit from the ability to associate domains. Every dot and plus permutation and domain variation would be compared to ‘one true email’ for that mailbox without causing any friction.

Providing we don’t create any pain for users, there’s no reason this feature couldn’t ship on by default.

markersocial · April 11, 2020, 7:56pm

Agreed, imperfect solution = imperfect. I only said this as an alternative potentially simpler to implement solution. It’s the last portion of my post, presented as an alternative to the primary suggestions I was making which agree with a lot of the discussions in this thread as well as allowing +'s and dots, just not duplicate accounts.

That said, legitimate users using +'s in emails on non-tech forums/sites is generally an edge case from what I’ve seen.

Really sounds fantastic.

My post was primarily getting at, how the canonical addresses are calculated for different email domains. So it isn’t limited to use with gmail/googlemail only. I was essentially attempting to say that it could be a good long term implementation to have user options for how the canonical addresses are calculated on a per domain basis.

Some other providers allow + but not period permutations for example. Meaning that the period permutations are unique emails.

A gmail/googlemail only implementation would be great though, and I don’t see any reason that it couldn’t be shipped on by default either.

Stephen · April 11, 2020, 8:14pm

Could you provide an example of one? I ask because the majority of gmail users are oblivious to the dot trick. They signed up for an address with the dot, they give everyone that version of their email and would become very confused if they were told that “their email” was invalid.

I rarely encounter people who even realise their alias minus the dots will still reach them.

markersocial · April 11, 2020, 8:26pm

Sure, I’ll PM you an example now which I’ve sent to Sam. Just because I’m not sure if it’s a good idea to publicly post this in a thread with this title, as it seems that quite a lot of spammers still don’t know about it luckily.

Yeah agreed, that would be the main confusion for regular users with that imperfect solution.

codinghorror · April 12, 2020, 3:58am

There’s no way we’d go for such a complicated approach. We aren’t going to “normalize” emails.

Either you are in email lockdown mode, which completely disallows certain problematic characters in an email address (per hardcoded email domain, maybe) or you aren’t.

That’s it. Boolean toggle. Email lockdown mode, Y/N?
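As a rough sketch of how small that toggle could be (Python, purely illustrative): `email_allowed` and `LOCKDOWN_DOT_DOMAINS` are hypothetical names, and which characters get rejected for which domains is an assumption here, namely + everywhere and dots in the local part only for a hardcoded Google list.

```python
# Hypothetical hardcoded list of domains where dots are rejected in lockdown.
LOCKDOWN_DOT_DOMAINS = frozenset({"gmail.com", "googlemail.com"})


def email_allowed(email, lockdown):
    """Return True if registration with this address may proceed."""
    if not lockdown:
        return True                 # setting off: nothing is checked
    local, _, domain = email.lower().rpartition("@")
    if "+" in local:
        return False                # plus addressing rejected for every domain
    if domain in LOCKDOWN_DOT_DOMAINS and "." in local:
        return False                # dot variants rejected for listed domains
    return True
```

Nothing is normalized or stored differently; an address is either accepted as typed or refused, so `email_allowed("stephen+meta@gmail.com", lockdown=True)` is simply `False`.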

sam · April 14, 2020, 4:23am

Per:

https://github.com/discourse/discourse/commit/6f9177e2ed273ebebb8306299425cbfabbf57101

This is now complete.

Use the site setting enforce_canonical_emails (default false) to enable this protection.

Once on, we disallow duplicate registrations for people using the . hack in googlemail.com and gmail.com and the + hack globally.

Fix is very safe and has zero impact out-of-the-box when it is disabled.

A side-effect of the implementation is that 1 more duplicate account will slip through once you enable the setting, as we do not store canonical form emails in the user email table unless you turn on the setting. This is perfectly acceptable imo, cause in general I am unable to find cases of this exact abuse across quite a few sites we host.
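For anyone wondering why exactly one duplicate slips through, here is a toy model in Python. It is not the code from the commit: `UserEmails` and `canonical` are stand-in names, the in-memory list stands in for the user email table, and it assumes gmail.com and googlemail.com are not folded into one domain since only the dot and + handling is described above.

```python
DOT_INSENSITIVE = {"gmail.com", "googlemail.com"}


def canonical(email):
    """+ suffix stripped for every domain, dots stripped for Google domains."""
    local, _, domain = email.lower().rpartition("@")
    local = local.split("+", 1)[0]
    if domain in DOT_INSENSITIVE:
        local = local.replace(".", "")
    return f"{local}@{domain}"


class UserEmails:
    """Toy table: each row is (email as entered, canonical form or None)."""

    def __init__(self):
        self.enforce_canonical_emails = False   # mirrors the default of false
        self.rows = []

    def register(self, email):
        lowered = email.lower()
        if any(stored.lower() == lowered for stored, _ in self.rows):
            return False                        # exact duplicates always fail
        if not self.enforce_canonical_emails:
            self.rows.append((email, None))     # no canonical form is stored
            return True
        key = canonical(email)
        if any(existing == key for _, existing in self.rows):
            return False
        self.rows.append((email, key))
        return True


table = UserEmails()
first = table.register("jane@gmail.com")        # setting off, canonical not stored
table.enforce_canonical_emails = True
second = table.register("j.ane@gmail.com")      # nothing to compare against yet
third = table.register("ja.ne+1@gmail.com")     # now matches the stored canonical
print(first, second, third)  # True True False
```

The account created before the setting was enabled has no canonical form to match against, so the first variant after enabling is accepted and only later variants are caught.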

Stephen · April 14, 2020, 4:57am

Storing the canonical form at all is problematic. What format do they take?

sam · April 14, 2020, 4:58am

The spec is here:

https://github.com/discourse/discourse/blob/6f9177e2ed273ebebb8306299425cbfabbf57101/spec/jobs/user_email_spec.rb#L676-L702

If the site setting is not enabled nothing happens… zero, zilch.

neounix · April 14, 2020, 5:08am

Thanks for the kind words @markersocial

Sorry not to reply earlier, have been busy on other tasks… just getting caught up on meta:

Detecting spam, bogus registrations, DDOS attacks, intrusions, and cyberspace situational awareness in general and all the other similar classes of detection-oriented and multi-sensor data fusion cybersecurity problems is one of my favorite topics, as you seem to know.

Having been on the front lines and fought many a “hands on” cyber battle in real time, let me give you two more hints when under attack like this:

(1) Detection is often more of an art than a pure science. The reason is that the more the attackers know about your detection and mitigation algorithms and techniques, the more they will mutate and adapt to your defenses.

(2) Also, never forget the “OODA Loop”: Observe-Orient-Decide-Act. The one(s) in the cyber battle who can get inside the OODA loop of the opponent(s) will generally be the winner.

I am pleased to read you are enjoying cyberdefense and looking at the larger picture. It sounds like you have got everything under control (from what I quickly read in summary in this discussion) and that the fine meta team has also committed a helpful change for you.

If you fall under attack and need any help, don’t hesitate to reach out to me. I’m long retired from the world of chasing profits and filling up my coffers (thank goodness!), so there is never a fee to consult with me. Helping others who have interesting tech problems, especially in the area of cybersecurity and cyberwar is a higher priority for me than accumulating more wealth.

I am here for you if you need someone to bounce ideas off of and from what I have read of your replies of recent, it sounds like you have things under control.

Great job!

sam · April 16, 2020, 8:46pm

@codinghorror my thinking here is that this change is pointless and I should just revert my change

None of our hosted sites are asking for it or for extreme email blocking modes. None of this is a problem in practice cause we purge our inactive accounts anyway and spam scan profiles.

A spammer can just run an SMTP server, which is easier than automating gmail, and they have access to infinite emails that way

Plus addressing is very widely used in legitimate ways

The most common issue around problems with dots in gmail is not spam, but email typos

I guess the only change I support in core is expanding blocked emails to block canonical emails, at least that is an improvement to the block email feature and solves the OP

Eg if you block Jane@gmail.com it also blocks j.ane+1@gmail.com
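A minimal sketch of that blocked-email expansion, in Python and with made-up names (`canonical`, `BlockedEmails`), not the actual core code: block entries are stored in canonical form, and each registration attempt is canonicalized before the lookup. It assumes the same rule as above, + stripped everywhere and dots stripped for gmail.com and googlemail.com.

```python
DOT_INSENSITIVE = {"gmail.com", "googlemail.com"}


def canonical(email):
    """Lowercase, drop any +suffix, drop dots for Google domains."""
    local, _, domain = email.strip().lower().rpartition("@")
    local = local.split("+", 1)[0]
    if domain in DOT_INSENSITIVE:
        local = local.replace(".", "")
    return f"{local}@{domain}"


class BlockedEmails:
    """Hypothetical block list that matches on canonical form."""

    def __init__(self):
        self._canonicals = set()

    def block(self, email):
        self._canonicals.add(canonical(email))

    def is_blocked(self, email):
        return canonical(email) in self._canonicals


blocked = BlockedEmails()
blocked.block("Jane@gmail.com")
print(blocked.is_blocked("j.ane+1@gmail.com"))  # True
print(blocked.is_blocked("janet@gmail.com"))    # False
```

Legitimate users are untouched by this; only variants of an address an admin has already blocked are refused.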

Any other changes can go in plugins

Does this sound ok?
