I think the previous implementation you created could still be quite useful as an additional anti-spam feature. It worked incredibly well for the short time it was available and enabled (default off).
Otherwise spammers can still create bulk accounts with one gmail address before a moderator or admin notices, e.g. by creating the accounts but not posting anything immediately.
Admins/mods then need to manually find and open each individual account to ban or delete it, which can be quite tedious, especially when one spammer can create hundreds or thousands of accounts with one gmail address before being banned. Searching for the emails is also difficult, e.g. j.ohan.2.1@gmail and jo.ha.n21@gmail.
If they aren’t manually hunted down, then the spammers keep a large pool of accounts to play whack-a-mole with, while only needing to expend one gmail account to obtain them.
@sam Just to follow up after more field testing, I believe that the previous implementation that was reverted is definitely much more effective against motivated spammers. I’m still getting a significant amount of registrations using these permuted gmail tricks.
I’m very grateful that the current protection was implemented, and it is very effective. However, I think it’s a bit of a hole to allow unlimited accounts to be created using the same email until they are specifically noticed and manually banned. It puts more burden on moderators (who I believe can’t see account emails unless that is enabled), especially in the absence of bulk account removal tools (e.g. select several accounts from the accounts/search list with checkboxes and ban/remove them all). This means a moderator needs to manually navigate to each individual account to remove or ban it, which is especially difficult when searching for accounts with permuted emails.
Seeing as the previous implementation was optional (off by default), had already been developed, and worked as intended before being removed, it just seems a shame that it’s not available anymore for communities that would want to use it for additional anti-spam protection against motivated spammers.
This is why I said certain characters have to be completely disallowed from emails (optionally). Specifically the characters that allow sub-addressing (see “Email address” on Wikipedia), such as plus, period, hyphen, etc. With a regex you could block it per service as well, e.g. “no email with a plus ending in @gmail.com is allowed”. cc @sam
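To make the per-service idea concrete, here is a rough Python sketch of the “no plus in @gmail.com” rule. It is not Discourse code; the pattern and the `is_blocked` helper are made up purely for illustration.

```python
import re

# Hypothetical per-service rule: reject gmail.com addresses whose local part
# (the bit before the @) contains a plus sign. Note the escaped + and dot.
GMAIL_PLUS = re.compile(r"[^@]*\+[^@]*@gmail\.com", re.IGNORECASE)

def is_blocked(email):
    """Return True when the whole address matches the blocking rule."""
    return GMAIL_PLUS.fullmatch(email.strip()) is not None

print(is_blocked("johan+spam@gmail.com"))    # True
print(is_blocked("johan@gmail.com"))         # False
print(is_blocked("johan+news@example.org"))  # False, the rule only targets gmail.com
```

Other services would need their own patterns, which is the trade-off of doing this per service.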
Safety’s not really the goal here. The site in question needs a more extreme solution due to the scope of the problem they are facing. As long as it is optional (add your own “email protection regex”) then it seems perfectly safe to me, for sites that need it, they can opt into Full Lockdown Mode.
Getting the regex right though is somewhat annoying given all the escaping needed. I worry about giving options like this, because the odds of people getting the regex right and as intended are quite low. They need to remember to escape both dots and pluses.
We could I guess do a non regex based simplified pattern that just expands * and ?.
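As a rough idea of what such a simplified pattern could look like, here is a small Python sketch. It assumes `*` means “any run of characters” and `?` means “exactly one character”, with everything else taken literally so admins never have to escape dots or pluses. The function names are hypothetical.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a simplified pattern into a compiled, case-insensitive regex."""
    parts = []
    for char in pattern:
        if char == "*":
            parts.append(".*")             # any run of characters, including none
        elif char == "?":
            parts.append(".")              # exactly one character
        else:
            parts.append(re.escape(char))  # literal, escaped on the admin's behalf
    return re.compile("".join(parts), re.IGNORECASE)

def matches(pattern, email):
    """Return True when the whole email matches the simplified pattern."""
    return wildcard_to_regex(pattern).fullmatch(email) is not None

print(matches("*+*@gmail.com", "johan+1@gmail.com"))  # True
print(matches("*+*@gmail.com", "johan@gmail.com"))    # False
```

An admin would then type `*+*@gmail.com` rather than the equivalent escaped regex.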
If the previous implementation was re-added as an option, I believe this would entirely solve the gmail issue. At least in my case. It’s quite perfect in my opinion and adds enough resource costs to the spammers to make fighting it manageable. It’d really be the difference between requiring 24hr full time high intensity moderation and not.
I’ve blocked several domains that allow similar tricks, and I make use of the allowed email domains list. The problem is that people can create many accounts before one of their accounts gets banned/blocked (which activates the blocking of permutations of that gmail address for new accounts, but leaves existing accounts untouched). That makes it quite a burden for moderation, and it is tedious to clean up each individual account afterwards.
For example, I’ve had a thread with ~200 or so replies, using 1 post per account, all made with the same gmail address. There are a lot of similar cases. That is an example where the accounts are easy to find; the alternative, searching for them via permutations of the original gmail address, is really difficult. Some will farm a large number of accounts using a small handful of gmail addresses and not post with them until months later.
For regex blocking as a solution, blocking + signs would be fairly harmless, but blocking periods (.) would likely block a significant number of legit emails, e.g. firstname.lastname@example.org. Blocking addresses with more than one period would probably have minimal collateral damage, though it would still allow several permutations of a gmail address, just far fewer than when 2+ periods are allowed.
IMO the previous implementation is ideal and not unreasonable to implement as an optional protection. Most popular social sites won’t allow signing up using several gmail permutations due to it being heavily exploited by spammers.
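To put rough numbers on that, here is a small Python sketch that enumerates the dotted variants of one local part. It assumes single periods can go in any gap between characters and that gmail treats them all as the same inbox. The helper name and the `johan21` example are only for illustration.

```python
from itertools import combinations

def dotted_variants(local, max_dots=None):
    """Return every variant of `local` with single periods placed between characters.

    max_dots caps how many periods a variant may contain (None means no cap).
    """
    gaps = range(1, len(local))  # a period may go before characters 1..n-1
    limit = len(gaps) if max_dots is None else min(max_dots, len(gaps))
    variants = []
    for count in range(limit + 1):
        for chosen in combinations(gaps, count):
            pieces = []
            for index, char in enumerate(local):
                if index in chosen:
                    pieces.append(".")
                pieces.append(char)
            variants.append("".join(pieces))
    return variants

print(len(dotted_variants("johan21")))              # 64 with any number of periods
print(len(dotted_variants("johan21", max_dots=1)))  # 7 once 2+ periods are blocked
```

So for a 7-character local part, a “max one period” rule cuts the pool from 64 addresses to 7, and longer local parts grow the unrestricted pool exponentially.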
@sam I feel quite strongly that sites should be allowed to implement this optional level of email regex lockdown if they need it. Otherwise we’re going against one of the core principles of Discourse, which is to be “safe by default”.
We can get this done for the next release. I still stand behind my original implementation though: canonicalization is the most friendly solution for site operators. You check a box and tada, issue is fixed. With regex, you learn regex (so there go 5 hours) and end up with a fix that lets spam accounts slip through, or is user hostile (no dots, no pluses), or is a compromise.
That said, sure, we can slot regex support for the next release.
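For anyone following along, canonicalization here means mapping every permutation of an address to one form and checking new signups against that. A minimal Python sketch follows. It is not the reverted Discourse implementation, and it assumes gmail ignores periods and anything after a plus, and that googlemail.com is an alias of gmail.com. The names are hypothetical.

```python
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}  # assumed aliases of one service

def canonicalize(email):
    """Return a canonical form of the address, used only for duplicate checks."""
    cleaned = email.strip().lower()
    if "@" not in cleaned:
        return cleaned  # not a usable address, leave it alone
    local, _, domain = cleaned.rpartition("@")
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0]   # drop the +tag suffix
        local = local.replace(".", "")   # drop periods in the local part
        domain = "gmail.com"
    return f"{local}@{domain}"

print(canonicalize("j.ohan.2.1@gmail.com"))         # johan21@gmail.com
print(canonicalize("jo.ha.n21+x@googlemail.com"))   # johan21@gmail.com
print(canonicalize("first.last@example.org"))       # first.last@example.org
```

Addresses on other domains pass through unchanged, so legit dotted addresses elsewhere are not affected.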
Nahh, it’s real easy, just “no emails allowed with plus or period in them” which is admittedly quite restrictive and obviously we would not want it on by default… But it’s like the bamwar thing: there will always be enough bad actors that you have to make the nuclear launch button, even if you don’t want to use it…
It’s like nuclear war. Once you have nukes on the table, the “user friendly” options aren’t possible any more, you just have to hope most of the time you never need to go there.
Well, I disagree, because that puts the code and effort burden on our side, rather than their side. “Normalizing” emails is quite complicated, varies per email provider, and I don’t want to be in that business.
In other words, let them build and handle the nuclear devices on their own; we don’t need to ship proto-nukes to every country and in every installation of Discourse “just in case”.
(Plus being able to blocklist emails via regex is quite powerful, especially since email = identity in Discourse.)
I can confirm that the previous implementation worked perfectly and entirely solved the gmail issue. The email domain allow list and disallow list both are quite effective nukes. But it’s just not viable to block gmail.
@codinghorror I can see the point of view against normalising for different email providers. But I think it would make sense to be able to cover at least gmail (~43% of all email addresses apparently in 2020, 53% for the US) in a non-destructive way. It might be comparable to supporting oauth from large providers out of the box.
@sam ^ This is a great idea for an alternative. Maybe this, with an example for the gmail/googlemail match could be quite user friendly and powerful.
I have a user right now who has made several thousand accounts with a single gmail address (using periods) and is spamming to promote their competing site and siphon off users. I will be upgrading to 2.8 and blocking all emails that contain a period or plus symbol as soon as it’s released. I do wish the previous implementation was available, but I appreciate that this is being addressed and a solution will be available. It’s going to make a massive difference, thank you.
So I have thought about this a bit and thought of a solution that could maybe make sense.
There could be an admin option to process and store a normalised version of the email (only processing the username part, i.e. username@…).
But only apply this for domains that are specified by the admin.
So a list somewhat like the email domain allow/block lists, with two checkboxes per domain:
Strip + string
Then use these records as a reference for disallowing additional registrations using alternative versions of that email (without affecting the primary email record, which can still have + and periods).
This way, the burden of selecting which domains to store a normalised record for, and how to normalise them, can be on the admin only, allowing them to respond to problematic email domains as they emerge.
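Here is a rough Python sketch of how that could fit together, just to illustrate the idea. Everything in it is hypothetical: `DomainRule`, the in-memory `Registrations` class standing in for the accounts table, and in particular the `strip_periods` option, which is my guess at what a second checkbox could be.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DomainRule:
    strip_plus: bool = False     # "Strip + string": drop everything from the first +
    strip_periods: bool = False  # assumed second option: drop periods

class Registrations:
    """In-memory stand-in for the accounts table (hypothetical)."""

    def __init__(self, rules):
        self.rules = rules       # admin-maintained mapping: domain -> DomainRule
        self.primary = []        # emails stored exactly as the user entered them
        self.normalised = set()  # reference records used only for duplicate checks

    def normalise(self, email):
        cleaned = email.strip().lower()
        local, _, domain = cleaned.rpartition("@")
        rule = self.rules.get(domain)
        if rule is None:
            return cleaned       # domain not listed by the admin: no processing
        if rule.strip_plus:
            local = local.split("+", 1)[0]
        if rule.strip_periods:
            local = local.replace(".", "")
        return f"{local}@{domain}"

    def register(self, email):
        """Accept the signup unless its normalised form is already taken."""
        key = self.normalise(email)
        if key in self.normalised:
            return False
        self.normalised.add(key)
        self.primary.append(email)  # the primary record keeps its + and periods
        return True

accounts = Registrations({"gmail.com": DomainRule(strip_plus=True, strip_periods=True)})
print(accounts.register("jo.han+a@gmail.com"))  # True
print(accounts.register("johan@gmail.com"))     # False, same normalised record
print(accounts.register("jo.han@example.org"))  # True, domain not listed
```

The primary email is never rewritten, so mail still goes to the address the user actually typed.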
Anyhow, just dropping this here so it can perhaps be considered at some point.