Protecting against gmail dot trick in Discourse

Why take a complex dependency here when a trivial email address lockdown mode is so much simpler to implement and reason about? Plus now it opens us up to new exploits? No thanks!

Given the rarity of this complaint and the exceptional circumstances around it, I think going the simple extra strict route is preferred.

The simple route of banning anything that is not /[a-zA-Z0-9]/ works but has major usability issues: a large number of people will not know how to sign up, and we would need new error messages. Many people who use Gmail do not know that janedoe@gmail.com works as their email when they have always thought jane.doe@gmail.com was their email. The ban would also affect OAuth: for it to work correctly, login with Gmail would have to fail for those addresses.

Email address: sam.s@gmail.com
ERROR: . is not allowed in email addresses (new message)

Normalizing is less user hostile and requires no new UX.

We could start with a dumber optional normalizer. (strip tag, strip dot for gmail)
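For illustration only, here is a minimal Python sketch of what such a "dumber" normalizer might look like. It is not Discourse code. The function name `normalize_email` is made up, lowercasing the whole address is an assumption, and it assumes "strip tag" applies to every domain while "strip dot" applies only to gmail.com.

```python
def normalize_email(address: str) -> str:
    """Return a comparison-only form of an email address (hypothetical helper)."""
    # Assumption: addresses are compared case-insensitively.
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")

    # Strip tag: drop everything from the first "+" in the local part.
    local = local.split("+", 1)[0]

    # Strip dot: only for Gmail, where dots in the local part are ignored.
    if domain == "gmail.com":
        local = local.replace(".", "")

    return f"{local}@{domain}"


if __name__ == "__main__":
    for raw in ("sam.s@gmail.com", "bob.test+100@gmail.com", "jane.doe@example.com"):
        print(raw, "->", normalize_email(raw))
```

Dots are left alone for every other domain, so jane.doe@example.com stays as it is.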

That said, to be 100% clear, I am not suggesting a dependency here; email_address is broken and not suitable for what we want.


A rushed half measure here will just create a “break email on my site” site setting, which I am not particularly keen to add.

1 Like

Right but his site is under siege. He has thousands of these duplicated accounts signing up per day. So it is sensible that there should be a simple lockdown mode for email addresses, to offer to sites who are at war with Mossad and losing badly.

War requires sacrifice. There will be civilian casualties. His site is already broken as hell.

1 Like

Ideally, you would need a table of the email providers and how to “clean up” according to each provider (just what you actually quoted). As Bart explained well, it’s not about preventing the use of any email address because of some characters, but about being able to detect which addresses are actually the same.

Sure, spammers who really want to will always be able to. It’s like with alarms/locks and robbers: the idea is to make it more difficult.
Creating x Gmail addresses is spamming Gmail; that’s their problem to address (even if it can be used to spam you afterwards).

1 Like

I am not following.

If we treat bob.test+100@gmail.com as bobtest@gmail.com internally and store it internally in that way (when the switch is on), what sacrifice exactly is being made here?

The bug is specifically with Gmail, so to me it seems an overreaction to ban all dots everywhere because Google decided to invent a spec and normalize. The logic for cleaning up is actually quite straightforward, and this would be optional and off by default.

@Mevo just to be 100% clear here, the proposal by Jeff is that we add a “disaster mode”; in disaster mode bob.test@gmail.com is an invalid email that cannot be used.

3 Likes

I would suggest comparing to the simplified form, but you need to be careful to still store and relay email to the originally specified address.

You don’t have consent from the user to message any other variation, and using anything other than the address they specify could result in them not receiving the message.

As an example, I have a Gmail address which was created back in the first months of the service. Email to the base alias is effectively discarded. Only emails which hit specific plus addresses will ever be seen.

Be careful with assumptions too: many Gmail users have no idea that the dots are optional. The vast majority have never heard of plus addressing either. Triage to prevent abuse of the latter risks casualties to the former.

4 Likes

@sam I understood well what Jeff means, and like you, I am against what he proposes (no offence to him, disagreements happen).

This is probably being picky, but storing only the “cleaned up” email address will remove what some legitimate users are doing on purpose. Example: a user registering (totally legitimately and only once) with bob+meta@example.com or bob+forums@example.com will lose what he was trying to accomplish. The problem is that he will only receive emails at bob@example.com, and that’s not what he wanted (he can, for example, use the “tag” to put the emails received in a specific folder).

I totally understand that taking this into consideration would make it a little more complex. You would need to store BOTH the version entered by the user (to send emails) AND the cleaned-up version. You can use the cleaned-up version the way you use email addresses right now (for everything internally related to the user, and to check if that user is already registered). The version the user entered would be stored on top of that, solely for sending emails. It would be the equivalent of the “reply to” address in emails.
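To make the two-field idea concrete, here is a rough Python sketch. It is an illustration under assumptions, not how Discourse stores emails: `UserEmail`, `canonical_form`, and `delivery_address` are invented names, and the clean-up rule (strip the tag everywhere, strip dots only for gmail.com, lowercase) is the simple one discussed in this topic.

```python
from dataclasses import dataclass


def canonical_form(address: str) -> str:
    """Hypothetical clean-up rule: lowercase, strip +tag, strip dots for gmail.com."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    local = local.split("+", 1)[0]
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


@dataclass(frozen=True)
class UserEmail:
    entered: str    # exactly what the user typed; the only address mail goes to
    canonical: str  # cleaned-up form; used only for internal lookups

    @classmethod
    def from_input(cls, address: str) -> "UserEmail":
        return cls(entered=address.strip(), canonical=canonical_form(address))


def delivery_address(record: UserEmail) -> str:
    """Mail is always relayed to the address the user specified."""
    return record.entered


if __name__ == "__main__":
    record = UserEmail.from_input("bob+meta@example.com")
    print("send to:", delivery_address(record))
    print("look up by:", record.canonical)
```

With this split, the user who signed up as bob+meta@example.com keeps receiving mail at the tagged address, while duplicate detection looks only at the canonical field.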

I hope that’s understandable.

EDIT: Written at the same time as @Stephen (globally the same idea)

2 Likes

This is a very good point, it does make this slightly more tricky to build.

I guess you would only do the check on “create new account”:

Does any email address already exist in the system with this canonical form? If yes, sorry, no new account for you.
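A minimal sketch of that create-time check, in Python for illustration. Everything here is hypothetical (`AccountRegistry`, `DuplicateEmailError`, `canonical_form`), the in-memory dict stands in for a real users table, and the clean-up rule is the simple gmail-only one discussed earlier in the topic.

```python
class DuplicateEmailError(Exception):
    """Raised when an account with the same canonical email already exists."""


def canonical_form(address: str) -> str:
    """Hypothetical clean-up rule: lowercase, strip +tag, strip dots for gmail.com."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    local = local.split("+", 1)[0]
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


class AccountRegistry:
    """Toy stand-in for the users table; the check runs only on account creation."""

    def __init__(self):
        # canonical form -> address exactly as the user entered it
        self._by_canonical = {}

    def create_account(self, address: str) -> str:
        key = canonical_form(address)
        if key in self._by_canonical:
            raise DuplicateEmailError("an account already exists for this email")
        self._by_canonical[key] = address
        return address


if __name__ == "__main__":
    registry = AccountRegistry()
    registry.create_account("sam.s@gmail.com")
    try:
        registry.create_account("sams+1@gmail.com")
    except DuplicateEmailError as error:
        print("rejected:", error)
```

Because the check runs only when an account is created, existing accounts and their stored addresses are not touched by this sketch.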

There is a side problem of Google OAuth (would it also check for canonical email), and transition from non-canonical to canonical.

OAuth doesn’t work with plus addressing afaik, so isn’t it out of scope?

By that I mean I can’t create a new account using Google, specify a different alias, rinse and repeat.

Same problem space.

I sign up with sam+hi@gmail.com … then I click the login with Google button. What happens?

  • Currently: new account created

  • Proposed:

    • Option A: error screen, you can not create this account

    • Option B: user logs in with sam+hi@gmail.com


Under the originally proposed lockdown mode, sam.test@gmail.com cannot log in with the login with Google button.

Assuming you can come up with a robust translation to remove plus addresses and errant dots, you could just keep a hash of the de-duped email and compare against that at account creation?
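A rough Python sketch of the hash idea, for illustration only. The names `canonical_form`, `canonical_digest`, and `SITE_SECRET` are made up, and using a keyed HMAC instead of a plain hash is my own assumption (an unkeyed hash of an email address is easy to guess by hashing candidate addresses).

```python
import hashlib
import hmac

# Hypothetical per-site secret; a real deployment would load it from config.
SITE_SECRET = b"replace-with-a-real-secret"


def canonical_form(address: str) -> str:
    """Hypothetical clean-up rule: lowercase, strip +tag, strip dots for gmail.com."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an email address: {address!r}")
    local = local.split("+", 1)[0]
    if domain == "gmail.com":
        local = local.replace(".", "")
    return f"{local}@{domain}"


def canonical_digest(address: str, secret: bytes = SITE_SECRET) -> str:
    """Keyed SHA-256 digest of the de-duped address, as a hex string."""
    message = canonical_form(address).encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    first = canonical_digest("bob.test+100@gmail.com")
    second = canonical_digest("bobtest@gmail.com")
    print("same account?", hmac.compare_digest(first, second))
```

Only the digest would be kept next to the address the user entered, so the cleaned-up address itself never has to be written to the table.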

That is option B, hence “There is a side problem of Google OAuth” :slight_smile: Also, the migration issue is hairy, but could probably be skipped.

That said, given the scope of the problem in the wild, I don’t really anticipate us working on any changes here in the next few months.

As said above, wouldn’t using solely the “canonical” version internally, and additionally storing what the user entered (just to send emails), be a solution?

We can solve this just fine. I estimate 2-6 days of work in testing and debugging such a new switch, because there are lots of little things to worry about.

The problem here is that @codinghorror can not justify budgeting this amount of time for this feature.

We can implement “break a big pile of email logins” in 1 day of work, but I don’t want to have such a setting in Discourse.

So you are in a bit of a pit here @Mevo … more people need to experience and report this problem so we can justify spending the time on this.

3 Likes

@sam I do understand.

(btw, I am seeing this for the first time. Your post was automatically edited: "[system] - Automatically removed quote of whole previous post". Wow! That’s a very nice functionality!)

You need to be very careful to never store the canonical version. The user didn’t consent to provide it, and if your user tables are compromised they can’t readily identify which service has compromised their data.

Facebook has repeatedly gotten into lots of hot water storing PII related to users which they neither provided, nor consented to have associated with their account.

4 Likes

I see no problem at all with this setting personally, I am just loath to do it because ‘that one guy had a problem that one time’.

Yeah, this is a terrible thing to suggest that we add to Discourse. I would be violently opposed to adding it. Plus addressing is a feature, has always been a feature, and it’s user-friendly.

If you are getting attacked by Mossad … enable Mossad Attack Mode. We just need Mossad to attack more folks I guess? :man_shrugging:

I am violently against adding this setting to Discourse. I am totally fine with someone building a plugin for it, it is just a few lines of code in a plugin. If you must must have it I will take a break and build the plugin today, just let me know.

Kind of pointless building it, because the one person that has the problem is already saying they will not use it.

A setting of “break my Discourse” is fundamentally bad and does not belong in the product imo.

I think if more people were having the problem an email lockdown mode would be more defensible. But right now it’s just that one guy on that one site.

So we wait and see…

1 Like

One guy, on one site, that would not use the feature

Is more accurate…