Protecting against gmail dot trick in Discourse

codinghorror · October 13, 2019, 11:55pm

Why take a complex dependency here when a trivial email address lockdown mode is so much simpler to implement and reason about? Plus now it opens us up to new exploits? No thanks!

Given the rarity of this complaint and the exceptional circumstances around it, I think going the simple extra strict route is preferred.

sam · October 14, 2019, 12:03am

The simple route of banning anything that is not /[a-zA-Z0-9]/ works but has major usability issues: many people will not know how to sign up, and we would need new error messages. Many people who use Gmail do not know that janedoe@gmail.com works as their email when they always thought jane.doe@gmail.com was their email. The ban would also affect OAuth: for it to work correctly, login with Google would have to fail for these addresses.

Email address: sam.s@gmail.com
ERROR: . is not allowed in email addresses (new message)
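A minimal Python sketch of that strict route, for illustration only: the function name `lockdown_error` and the exact error wording are hypothetical, and only the local part (everything before the last `@`) is checked against [a-zA-Z0-9].

```python
import re
from typing import Optional

# Hypothetical "lockdown mode": the local part (everything before the last "@")
# may contain only ASCII letters and digits, so dots and "+tags" are rejected.
ALLOWED_LOCAL_PART = re.compile(r"[a-zA-Z0-9]+")


def lockdown_error(email: str) -> Optional[str]:
    """Return an error message for a rejected address, or None if it is accepted."""
    local, at, domain = email.rpartition("@")
    if not at or not local or not domain:
        return "ERROR: not a valid email address"
    if ALLOWED_LOCAL_PART.fullmatch(local) is None:
        # Report the first offending character, like the example message above.
        bad = next(ch for ch in local if not (ch.isascii() and ch.isalnum()))
        return f"ERROR: {bad} is not allowed in email addresses"
    return None


print(lockdown_error("sam.s@gmail.com"))      # ERROR: . is not allowed in email addresses
print(lockdown_error("samsgmail@gmail.com"))  # None
```

This is exactly the kind of new error message the post warns about: every Gmail user who habitually types their dots would hit it.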

Normalizing is less user hostile and requires no new UX.

We could start with a dumber, optional normalizer (strip the tag, strip dots for Gmail).
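A sketch of such a dumber normalizer, assuming the tag is stripped for every domain and dots only for gmail.com. The name `normalize` is hypothetical, and lowercasing the whole address is also an assumption (local parts are technically case-sensitive).

```python
def normalize(email: str) -> str:
    """Hypothetical minimal normalizer for duplicate detection only.

    Assumptions: the "+tag" is stripped for every domain, dots are stripped
    only for gmail.com, and the whole address is lowercased.
    """
    local, _, domain = email.strip().lower().rpartition("@")
    local = local.split("+", 1)[0]  # strip tag: "bob+forums" -> "bob"
    if domain == "gmail.com":
        local = local.replace(".", "")  # strip dots: "jane.doe" -> "janedoe"
    return f"{local}@{domain}"


print(normalize("bob.test+100@gmail.com"))  # bobtest@gmail.com
print(normalize("jane.doe@example.com"))    # jane.doe@example.com
```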

That said, to be 100% clear, I am not suggesting a dependency here; email_address is broken and not suitable for what we want here.

A rushed half measure here will just create a “break email on my site” site setting which I am not particularly keen to add.

codinghorror · October 14, 2019, 12:07am

Right, but his site is under siege. He has thousands of these duplicated accounts signing up per day. So it is sensible that there should be a simple lockdown mode for email addresses, to offer to sites that are at war with Mossad and losing badly.

War requires sacrifice. There will be civilian casualties. His site is already broken as hell.

Mevo · October 14, 2019, 12:07am

Ideally, you would need a table of the email providers and how to “clean up” addresses for each provider (just what you actually quoted). As Bart explained well, it’s not about preventing the use of any email address because of some characters, but about being able to detect which addresses are actually the same ones.
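A table-driven variant of that idea, sketched in Python. Only the Gmail entries are filled in (googlemail.com being an alias of gmail.com); the `ProviderRule` fields and the conservative "compare as-is" default for unknown providers are assumptions, and rules for any other provider would need to be verified before being added.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderRule:
    ignore_dots: bool = False               # dots in the local part are not significant
    plus_tags: bool = False                 # "+anything" in the local part is ignored
    canonical_domain: Optional[str] = None  # domain this one is an alias of


# Illustrative table: only Gmail is filled in. Other providers are deliberately
# left out rather than guessed.
PROVIDER_RULES = {
    "gmail.com": ProviderRule(ignore_dots=True, plus_tags=True),
    "googlemail.com": ProviderRule(ignore_dots=True, plus_tags=True,
                                   canonical_domain="gmail.com"),
}
DEFAULT_RULE = ProviderRule()  # unknown providers: no dot or tag stripping


def canonical_form(email: str) -> str:
    local, _, domain = email.strip().rpartition("@")
    domain = domain.lower()
    rule = PROVIDER_RULES.get(domain, DEFAULT_RULE)
    if rule.plus_tags:
        local = local.split("+", 1)[0]
    if rule.ignore_dots:
        local = local.replace(".", "")
    if rule.canonical_domain:
        domain = rule.canonical_domain
    # Assumption: local parts are compared case-insensitively.
    return f"{local.lower()}@{domain}"


print(canonical_form("J.Doe+x@googlemail.com"))  # jdoe@gmail.com
print(canonical_form("bob+meta@example.com"))    # bob+meta@example.com
```

Keeping unknown providers untouched trades some missed duplicates for never merging two genuinely different mailboxes.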

Sure, spammers who really want to will always find a way. It’s like with alarms/locks and robbers: the idea is to make it more difficult.
Creating x Gmail addresses is spamming Gmail; that’s their problem to address (even if it can be used to spam you afterwards).

sam · October 14, 2019, 12:13am

I am not following.

If we treat bob.test+100@gmail.com as bobtest@gmail.com internally and store it internally in that way (when the switch is on), what sacrifice exactly is being made here?

The bug is specifically with Gmail, so to me it seems an overreaction to ban all dots everywhere because Google decided to invent a spec and normalize. The logic for cleaning up is actually quite straightforward, and this would be optional and off by default.

@Mevo just to be 100% clear here, the proposal by Jeff is that we add a “disaster mode”; in disaster mode, bob.test@gmail.com is an invalid email that cannot be used.

Stephen · October 14, 2019, 12:28am

I would suggest comparing to the simplified form, but you need to be careful to still store and relay email to the originally specified address.

You don’t have consent from the user to message any other variation, and using anything other than the address they specify could result in them not receiving the message.

As an example, I have a Gmail address which was created back in the first months of the service. Email to the base alias is effectively discarded. Only emails which hit specific plus addresses will ever be seen.

Be careful with assumptions too: many Gmail users have no idea that the dots are optional, and the vast majority have never heard of plus addressing either. Triage to prevent abuse of the latter risks casualties among the former.

Mevo · October 14, 2019, 12:31am

@sam I well understood what Jeff is meaning, and like you, I am against what he proposes (no offence to him, disagreements happen).

This is probably being picky, but storing only the “cleaned up” email address will remove what some legitimate users are doing on purpose. Example: a user registering (totally legitimately and only once) with bob+meta@example.com or bob+forums@example.com will lose what he was trying to accomplish. The problem is that he will only receive emails at bob@example.com, and that’s not what he wanted (he can, for example, use the “tag” to file the emails he receives into a specific folder).

I totally understand that taking this into consideration would make it a little more complex. You would need to store BOTH the version entered by the user (to send emails) AND the cleaned up version. You can use the cleaned up version the way you use email addresses right now (for everything internally related to the user, and to check whether that user is already registered). On top of that, to avoid the problem above, you would store what the user entered, used solely for sending emails. It would be the equivalent of the “reply to” address in emails.
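A minimal sketch of storing both forms, assuming a hypothetical `UserEmail` record: `entered` is what the user typed and is the only address mail is sent to, while `canonical` is derived with a Gmail-only `canonicalize` helper (also an assumption) and is used purely for lookups.

```python
from dataclasses import dataclass


def canonicalize(email: str) -> str:
    # Assumed rules: drop the "+tag"; drop dots only for gmail.com.
    local, _, domain = email.lower().rpartition("@")
    local = local.split("+", 1)[0]
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


@dataclass(frozen=True)
class UserEmail:
    entered: str    # exactly what the user typed; the only address mail goes to
    canonical: str  # derived form; used only to check for existing accounts

    @classmethod
    def from_input(cls, email: str) -> "UserEmail":
        entered = email.strip()
        return cls(entered=entered, canonical=canonicalize(entered))

    def delivery_address(self) -> str:
        return self.entered


user = UserEmail.from_input("bob+forums@gmail.com")
print(user.canonical)           # bob@gmail.com
print(user.delivery_address())  # bob+forums@gmail.com
```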

I hope that’s understandable.

EDIT: Written at the same time as @Stephen (globally the same idea)

sam · October 14, 2019, 12:33am

This is a very good point, it does make this slightly more tricky to build.

I guess you would only do the check on “create new account”:

Does any email address already exist in the system with this canonical form? If yes, sorry, no new account for you.
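The signup-only check could look like the following; `Accounts`, `SignupRejected` and the in-memory dict are hypothetical stand-ins for the real user table and its validation, and the canonicalization rules are the same assumed Gmail-only ones.

```python
class SignupRejected(Exception):
    """Raised when an account with the same canonical address already exists."""


def canonicalize(email: str) -> str:
    # Assumed rules: drop the "+tag"; drop dots only for gmail.com.
    local, _, domain = email.lower().rpartition("@")
    local = local.split("+", 1)[0]
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


class Accounts:
    """In-memory stand-in for the user table, keyed by canonical address."""

    def __init__(self) -> None:
        self.entered_by_canonical = {}

    def create(self, email: str) -> str:
        # The canonical check happens only here, on "create new account".
        key = canonicalize(email)
        if key in self.entered_by_canonical:
            raise SignupRejected("sorry, no new account for you")
        self.entered_by_canonical[key] = email  # the address as entered is kept
        return email
```

With this, `create("jane.doe@gmail.com")` succeeds, after which `create("janedoe+2@gmail.com")` on the same instance raises `SignupRejected`, while logins and email delivery keep using the address as entered.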

There is a side problem with Google OAuth (would it also check for the canonical email?), and with the transition from non-canonical to canonical.

Stephen · October 14, 2019, 12:34am

OAuth doesn’t work with plus addressing afaik, so isn’t it out of scope?

By that I mean I can’t create a new account using google, specify a different alias, rinse and repeat.

sam · October 14, 2019, 12:36am

Same problem space.

I sign up with sam+hi@gmail.com … then I click the “login with Google” button. What happens?

Currently: new account created
Proposed:
- Option A: error screen, you cannot create this account
- Option B: user logs in with sam+hi@gmail.com

With the originally proposed lockdown mode, sam.test@gmail.com cannot log in with the “login with Google” button.
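The two proposed OAuth behaviours can be sketched side by side. This assumes Google hands back the base address (e.g. sam@gmail.com) and that existing accounts are indexed by canonical form; `handle_google_login` and its `option` argument are hypothetical.

```python
from typing import Dict, Optional, Tuple


def canonicalize(email: str) -> str:
    # Assumed rules: drop the "+tag"; drop dots only for gmail.com.
    local, _, domain = email.lower().rpartition("@")
    local = local.split("+", 1)[0]
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


def handle_google_login(
    google_email: str, accounts: Dict[str, str], option: str
) -> Tuple[str, Optional[str]]:
    """accounts maps canonical address -> address the user originally entered.

    Returns ("create", email), ("login", existing_email) or ("error", None).
    """
    existing = accounts.get(canonicalize(google_email))
    if existing is None:
        return ("create", google_email)  # no account shares this canonical form
    if existing == google_email:
        return ("login", existing)       # exact match: today's behaviour
    if option == "A":
        return ("error", None)           # Option A: refuse to create the account
    return ("login", existing)           # Option B: log in to the existing account


accounts = {"sam@gmail.com": "sam+hi@gmail.com"}
print(handle_google_login("sam@gmail.com", accounts, "A"))  # ('error', None)
print(handle_google_login("sam@gmail.com", accounts, "B"))  # ('login', 'sam+hi@gmail.com')
```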

Stephen · October 14, 2019, 12:37am

Assuming you can come up with a robust translation to remove plus addresses and errant dots, you could just keep a hash of the de-duped email and compare against that on account creation?
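A sketch of that idea in Python. Instead of a bare hash, it assumes a keyed HMAC-SHA256 with a per-site secret (`SITE_SECRET` is a placeholder), since unkeyed hashes of email addresses can be reversed by hashing guessed addresses. Only the key is kept in `seen_keys`; the canonical address itself is never stored.

```python
import hashlib
import hmac

SITE_SECRET = b"replace-with-a-real-secret"  # hypothetical per-site key


def canonicalize(email: str) -> str:
    # Assumed rules: drop the "+tag"; drop dots only for gmail.com.
    local, _, domain = email.lower().rpartition("@")
    local = local.split("+", 1)[0]
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


def dedupe_key(email: str) -> str:
    """Keyed hash of the canonical form; this, not the canonical address, is stored."""
    message = canonicalize(email).encode("utf-8")
    return hmac.new(SITE_SECRET, message, hashlib.sha256).hexdigest()


seen_keys = set()  # stand-in for an indexed column on the users table


def may_create_account(email: str) -> bool:
    key = dedupe_key(email)
    if key in seen_keys:
        return False
    seen_keys.add(key)
    return True
```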

sam · October 14, 2019, 12:40am

That is option B, hence “there is a side problem with Google OAuth”. The migration issue is also hairy, but could probably be skipped.
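For the migration, a first step could simply be finding existing accounts that would collide once canonicalized. A hypothetical sketch, using the same assumed Gmail-only rules:

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def canonicalize(email: str) -> str:
    # Assumed rules: drop the "+tag"; drop dots only for gmail.com.
    local, _, domain = email.lower().rpartition("@")
    local = local.split("+", 1)[0]
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


def find_collisions(emails: Iterable[str]) -> Dict[str, List[str]]:
    """Group existing addresses by canonical form; keep only groups with duplicates."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for email in emails:
        groups[canonicalize(email)].append(email)
    return {key: members for key, members in groups.items() if len(members) > 1}


print(find_collisions(["bob.test@gmail.com", "bobtest+1@gmail.com", "alice@example.com"]))
# {'bobtest@gmail.com': ['bob.test@gmail.com', 'bobtest+1@gmail.com']}
```

Each group is a set of existing accounts that the new check would have rejected; deciding what to do with them is the hairy part and is not addressed here.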

That said given the scope here of the problem in the wild I don’t really anticipate us working on any changes here in the next few months.

Mevo · October 14, 2019, 12:42am

As said above, wouldn’t using solely the “canonical” version internally, and additionally storing what the user entered (just to send emails), be a solution?

sam · October 14, 2019, 12:46am

We can solve this just fine; I estimate 2-6 days of testing and debugging work for such a new switch, because there are lots of little things to worry about.

The problem here is that @codinghorror cannot justify budgeting this amount of time for this feature.

We can implement “break a big pile of email logins” in 1 day of work, but I don’t want to have such a setting in Discourse.

So you are in a bit of a pit here @Mevo … more people need to experience and report this problem so we can justify spending the time on this.

Mevo · October 14, 2019, 12:54am

@sam I do understand.

(btw, I am seeing this for the first time. Your post was automatically edited by [system]: “Automatically removed quote of whole previous post”. Wow! That’s a very nice feature!)

Stephen · October 14, 2019, 12:57am

You need to be very careful to never store the canonical version. The user didn’t consent to provide it, and if your user tables are compromised they can’t readily identify which service has compromised their data.

Facebook has repeatedly gotten into lots of hot water storing PII related to users which they neither provided nor consented to have associated with their account.

codinghorror · October 14, 2019, 1:27am

I see no problem at all with this setting personally; I am just loath to do it because ‘that one guy had a problem that one time’.

Yeah this is a terrible thing to suggest that we add to Discourse. I would be violently opposed to adding it. Plus addressing is a feature, has always been a feature, and it’s user-friendly.

If you are getting attacked by Mossad … enable Mossad Attack Mode. We just need Mossad to attack more folks I guess?

sam · October 14, 2019, 1:30am

I am violently against adding this setting to Discourse. I am totally fine with someone building a plugin for it; it is just a few lines of code in a plugin. If you must have it, I will take a break and build the plugin today, just let me know.

It’s kind of pointless building it, because the one person who has the problem is already saying they will not use it.

A setting of “break my Discourse” is fundamentally bad and does not belong in the product imo.

codinghorror · October 14, 2019, 1:31am

I think if more people were having the problem an email lockdown mode would be more defensible. But right now it’s just that one guy on that one site.

So we wait and see…

sam · October 14, 2019, 1:32am

One guy, on one site, who would not use the feature

Is more accurate…
