Safety’s not really the goal here. The site in question needs a more extreme solution due to the scope of the problem they are facing. As long as it is optional (add your own “email protection regex”), it seems perfectly safe to me; sites that need it can opt into Full Lockdown Mode.
We currently have
blocked email domains
I guess we could add:
blocked email patterns
Getting the regex right though is somewhat annoying given all the escaping needed. I worry about giving options like this, cause the odds of people getting the regex right, and working as intended, are quite low. They need to remember to escape both dots and pluses.
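To make the escaping worry concrete, here is a minimal Python sketch. The address, the patterns and the blocks helper are all made up for illustration and are not taken from any Discourse setting. An unescaped . matches any character and an unescaped + is a quantifier, so the naive pattern misses the address it was written for and matches an unrelated one instead.

```python
import re

# Hypothetical address an admin wants to block, typed in as-is.
target = "spam+tag@gmail.com"

# Naive pattern: "+" means "one or more m", "." means "any character".
naive = r"spam+tag@gmail.com"
# Same address with the regex metacharacters escaped.
escaped = re.escape(target)

# An unrelated string that the naive pattern happens to match.
lookalike = "spammtag@gmailxcom"


def blocks(pattern: str, email: str) -> bool:
    """Return True if the whole email matches the pattern."""
    return re.fullmatch(pattern, email) is not None


print(blocks(naive, target))       # False: the literal "+" is never matched
print(blocks(naive, lookalike))    # True: "m+" eats "mm", "." eats "x"
print(blocks(escaped, target))     # True
print(blocks(escaped, lookalike))  # False
```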
We could I guess do a non regex based simplified pattern that just expands
Sorry for the late response!
If the previous implementation was re-added as an option, I believe this would entirely solve the gmail issue. At least in my case. It’s quite perfect in my opinion and adds enough resource costs to the spammers to make fighting it manageable. It’d really be the difference between requiring 24hr full time high intensity moderation and not.
I’ve blocked several domains that allow similar tricks and make use of the allowed email domains list. The problem is that people can create many accounts prior to getting one of their accounts banned/blocked (which activates the blocking of permutations of that gmail address for new accounts, but existing accounts are left untouched). That makes it quite a burden for moderation, and it is tedious to clean up each individual account afterwards.
For example, I’ve had a thread with ~200 or so replies, one post per account, all made with the same gmail address, and there are a lot of similar cases. That is an example where the accounts are easy to find; the alternative, searching for them via permutations of the original gmail address, is really difficult. Some will farm a large number of accounts using a small handful of gmail addresses and not post from them until months later.
For regex blocking as a solution, blocking + signs would be fairly harmless, but blocking periods (.) would likely block a significant number of legitimate emails, e.g. email@example.com. Blocking only addresses with more than one period would probably have minimal collateral damage; it would still allow several permutations of a gmail address, but far fewer than when 2+ periods are allowed.
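A rough sketch of that rule in Python, for illustration only: block any address whose local part (the bit before the @) contains a plus sign or two or more periods. The pattern names and the is_blocked helper are hypothetical, and the addresses are placeholders.

```python
import re

# Local part (everything before the first "@") contains a plus sign.
HAS_PLUS = re.compile(r"^[^@]*\+[^@]*@")
# Local part contains two or more periods.
TWO_OR_MORE_DOTS = re.compile(r"^[^@]*\.[^@]*\.[^@]*@")


def is_blocked(email: str) -> bool:
    """Return True if the address would be rejected at signup."""
    return bool(HAS_PLUS.search(email) or TWO_OR_MORE_DOTS.search(email))


print(is_blocked("jane.doe@example.com"))      # False: one period is allowed
print(is_blocked("j.ane.doe@example.com"))     # True: two periods
print(is_blocked("jane+news@example.com"))     # True: plus sign
print(is_blocked("jane@mail.example.com"))     # False: periods in the domain don't count
```

Under this rule a local part of n characters still has n - 1 single-period variants that pass, which is the remaining gap described above.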
IMO the previous implementation is ideal and not unreasonable to implement as an optional protection. Most popular social sites won’t allow signing up using several gmail permutations, because it is heavily exploited by spammers.
@sam I feel quite strongly that sites should be allowed to implement this optional level of email regex lockdown if they need it. Otherwise we’re going against one of the core principles of Discourse, which is to be “safe by default”.
We can get this done for the next release. I still stand behind my original implementation though: canonicalization is the most friendly solution for site operators. You check a box and tada, the issue is fixed. With regex, you learn regex (so there go 5 hours) and end up with a fix that lets spam accounts slip through, or is user hostile (no dots, no pluses), or is a compromise.
That said, sure we can slot regex support for next release
Nahh, it’s real easy, just “no emails allowed with plus or period in them” which is admittedly quite restrictive and obviously we would not want it on by default… But it’s like the bamwar thing: there will always be enough bad actors that you have to make the nuclear launch button, even if you don’t want to use it…
It’s like nuclear war. Once you have nukes on the table, the “user friendly” options aren’t possible any more, you just have to hope most of the time you never need to go there.
The thing is forcing canonical emails is effectively the same except that it is far less user hostile.
Sam.firstname.lastname@example.org is allowed
Sa.email@example.com is already registered
Anyway we can do regex next release
Yeah, but it’s useless in actual practice, which is why we’re having this discussion… once they have nukes, you need nukes too.
“User hostile” is a meaningless concept once your audience has nukes and is willing to use them.
I disagree with this, the solution worked perfectly for @markersocial, and then I reverted the change cause my hand was forced
There are no known gaps with the canonicalization approach I implemented and reverted, it solves the above gmail problem 100%
Well, I disagree, because that put the code and effort burden on our side, rather than their side. “Normalizing” emails is quite complicated, varies per email provider, and I don’t want to be in that business.
In other words, let them build and handle the nuclear devices on their own; we don’t need to ship proto-nukes to every country and in every installation of Discourse “just in case”.
(Plus being able to blocklist emails via regex is quite powerful, especially since email = identity in Discourse.)
We could let them normalise with their own regex rules as some sort of middle ground, then we are not in the business of normalisation
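As a sketch of what admin-supplied normalisation rules could look like, here is a small Python example. Everything in it is an assumption for illustration: the RULES list, the (pattern, replacement) format, the normalize helper, and the choice to lowercase first. The three rules target gmail only and treat googlemail.com as the same mailbox domain.

```python
import re

# Hypothetical admin-supplied rules, applied in order as (pattern, replacement).
RULES = [
    (r"@googlemail\.com$", "@gmail.com"),   # fold the alias domain into gmail.com
    (r"\+[^@]*(?=@gmail\.com$)", ""),       # drop the "+tag" suffix of the local part
    (r"\.(?=[^@]*@gmail\.com$)", ""),       # drop periods in the local part only
]


def normalize(email: str, rules=RULES) -> str:
    """Apply each rule in turn; addresses no rule matches are only lowercased."""
    result = email.lower()  # assumption: addresses are compared case-insensitively
    for pattern, replacement in rules:
        result = re.sub(pattern, replacement, result)
    return result


print(normalize("J.Doe+news@googlemail.com"))  # jdoe@gmail.com
print(normalize("j.doe+news@example.org"))     # j.doe+news@example.org
```

The lookaheads keep each rule scoped to one domain, so other providers pass through untouched. It also shows the cost mentioned earlier in the thread: the admin still has to get the escaping right.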
That said yes regex blocking or at least wildcard blocking will happen for the next release
I can confirm that the previous implementation worked perfectly and entirely solved the gmail issue. The email domain allow list and disallow list both are quite effective nukes. But it’s just not viable to block gmail.
@codinghorror I can see the point of view against normalising for different email providers. But I think it would make sense to be able to cover at least gmail (~43% of all email addresses apparently in 2020, 53% for the US) in a non-destructive way. It might be comparable to supporting oauth from large providers out of the box.
@sam ^ This is a great idea for an alternative. Maybe this, with an example for the gmail/googlemail match could be quite user friendly and powerful.
I have a user right now who has made several thousand accounts with a single gmail address (using periods) and is spamming to promote their competing site and siphon off users. I will be upgrading to 2.8 and blocking all emails that contain a period or plus symbol as soon as it’s released. I do wish the previous implementation was available, but I appreciate that this is being addressed and a solution will be available. It’s going to make a massive difference, thank you.
So have thought about this a bit and thought of a solution that could maybe make sense.
There could be an admin option to process and store a normalised version of the email (only processing the username part i.e. username@…)
But only apply this for domains that are specified by the admin.
So a list somewhat like the email domain allow/block lists, with two checkboxes per domain:
- Strip + string
- Strip periods
Then use these records as a reference for disallowing additional registrations using alternative versions of that email (without affecting the primary email record, which can still have + and periods).
This way, the burden of selecting which domains to store a normalise record for and how to normalise them can be on the admin only, allowing them to respond to problematic email domains as they emerge.
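The proposal above could be sketched roughly like this in Python. This is not Discourse code; DomainRule, DOMAIN_RULES, normalized_key and Registry are hypothetical names, and the two configured domains are only examples of what an admin might choose.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainRule:
    """The two per-domain checkboxes from the proposal."""
    strip_plus: bool = False
    strip_periods: bool = False


# Hypothetical admin configuration; domains not listed are left alone.
DOMAIN_RULES = {
    "gmail.com": DomainRule(strip_plus=True, strip_periods=True),
    "outlook.com": DomainRule(strip_plus=True),
}


def normalized_key(email: str, rules=DOMAIN_RULES) -> str:
    """Build the reference record used only for duplicate checks."""
    local, _, domain = email.lower().rpartition("@")
    rule = rules.get(domain)
    if rule is not None:
        if rule.strip_plus:
            local = local.split("+", 1)[0]
        if rule.strip_periods:
            local = local.replace(".", "")
    return f"{local}@{domain}"


class Registry:
    """Toy signup registry: keeps the original address, indexes by the key."""

    def __init__(self) -> None:
        self._by_key = {}

    def register(self, email: str) -> bool:
        key = normalized_key(email)
        if key in self._by_key:
            return False
        self._by_key[key] = email  # primary email is stored unchanged
        return True


registry = Registry()
print(registry.register("j.doe@gmail.com"))      # True
print(registry.register("jd.oe+x@gmail.com"))    # False: same gmail mailbox
print(registry.register("j.doe@outlook.com"))    # True
print(registry.register("jdoe@outlook.com"))     # True: periods kept for this domain
```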
Anyhow, just dropping this here so it can perhaps be considered at some point.
I merged the PR:
It adds a new site setting, normalize emails, which will remove dots and the +… part of an email and then check for its uniqueness. For example, if there is a firstname.lastname@example.org user and email@example.com tries to sign up, they will not be allowed if the site setting is enabled.
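For readers who want to see the idea in code, here is a minimal Python sketch of the behaviour as described. It is not the code from the merged PR; normalize_email and can_sign_up are hypothetical names, and lowercasing is an assumption.

```python
def normalize_email(email: str) -> str:
    """Drop the "+tag" suffix and all periods from the local part."""
    local, _, domain = email.strip().lower().rpartition("@")
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def can_sign_up(candidate: str, existing_emails, normalize_emails: bool = True) -> bool:
    """Uniqueness check; normalize_emails stands in for the site setting."""
    if not normalize_emails:
        return candidate.lower() not in {e.lower() for e in existing_emails}
    taken = {normalize_email(e) for e in existing_emails}
    return normalize_email(candidate) not in taken


existing = ["jane.doe@example.com"]
print(can_sign_up("janedoe+1@example.com", existing))                          # False
print(can_sign_up("janedoe+1@example.com", existing, normalize_emails=False))  # True
print(can_sign_up("john.doe@example.com", existing))                           # True
```

Note that only the comparison uses the normalized form; the address the user typed is what would be kept on the account.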
Fantastic, I think this 100% solves @markersocial’s issue and is a great setting to enable if you end up being a target for this specific attack.
Let us know how you go @markersocial
Thank you so much for implementing this, this is a massive win - so happy this has been added. I have set it live yesterday and monitoring.
So far, it seems to be working 100% as intended and solves this issue entirely. People can still register with periods in their emails (and presumably +, though I have not seen these registrations recently), but they cannot keep making accounts with variations of the same gmail address. From reading the discussion on GitHub, it was definitely the best choice to keep the original email as-is.
So having said that, I will leave suggestions here that I think would improve this feature without becoming overly complicated:
Instead of having a checkbox to enable/disable normalize emails, have two lists, similar to the email domain block list style:
- Domain list for applying period normalisation
- Domain list for applying + normalisation
This would allow admins to selectively apply these rules individually to problem email domains as they emerge instead of applying normalisation (of both types) to all email domains.
Have no expectations for the above, just leaving the suggestion in case it is useful.
Anyhow, thanks again, really really appreciative this has been implemented. It is a game changer.
I wonder though if this is a theoretical vs real world problem. I get the desire for fidelity, but would prefer to hear about a few specific cases where this is causing an issue.
The trouble with introducing a setting like this would be re-applying normalization rules when you fiddle with the allowlist of sites; it would make this a very complex change.
We now normalize unconditionally (regardless of the site setting) so turning it on is instant and applies to all history.
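One way to picture why this makes the toggle instant, as a toy Python sketch rather than the actual implementation: the normalized form is always stored next to the original, and the setting only decides which one the signup check compares against. EmailStore and its attributes are hypothetical.

```python
class EmailStore:
    """Toy in-memory stand-in for a user email table (hypothetical)."""

    def __init__(self) -> None:
        self.normalize_emails = False  # stand-in for the site setting
        self._exact = set()
        self._normalized = set()

    @staticmethod
    def _normalize(email: str) -> str:
        local, _, domain = email.lower().rpartition("@")
        return local.split("+", 1)[0].replace(".", "") + "@" + domain

    def add(self, email: str) -> None:
        # The normalized form is always stored, whatever the setting says.
        self._exact.add(email.lower())
        self._normalized.add(self._normalize(email))

    def is_taken(self, email: str) -> bool:
        # The setting is consulted only at check time.
        if self.normalize_emails:
            return self._normalize(email) in self._normalized
        return email.lower() in self._exact


store = EmailStore()
store.add("j.doe@example.com")                # registered while the setting was off
print(store.is_taken("jdoe+1@example.com"))   # False
store.normalize_emails = True                 # flip the setting, no backfill needed
print(store.is_taken("jdoe+1@example.com"))   # True
```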
All thanks to @nbianca
Awesome! I didn’t realise that it would apply retroactively. I was thinking a normalised address was being saved only for new registrations.
Yeah the main chance of an issue would be for cases of email addresses which allow + aliases but don’t consider periods in different placements to be the same.
All instances of +’s in emails can be handled the same without any issue, as it’s handled the same for all providers that allow it as far as I know. Periods are the only ones where there is some difference between providers.
If I recall correctly, I think Google work emails (using custom domains), Yandex and Outlook will consider different period placements to be different addresses, but the + aliases can still be used.
So the only cases would be like
firstname.lastname@example.org existing would block
email@example.com from registering (when these are actually two unique accounts/addresses according to that email domain/provider). Which should probably be very rare to occur in the real world.