Hmm, I’m using Cloudflare as a CDN, and Discourse only sees Cloudflare’s IP, not the user’s. (For WordPress, Cloudflare has a plugin that passes the user’s IP to the website.)
At vB we used to get literally thousands of bot “seed” accounts like
aliasg.maila.ccount
alia.sgmai.laccount
ali.asg.mailacco.unt
alias.g.m.ailaccount
al.iasgmai.lacc.ount
… etc. ad nauseam
We eventually had a plugin written to deal with them
You really need to get the IP passed through correctly, otherwise you are really screwed. That’s about the only effective way to stop spammers if they are clever.
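For what “passed through correctly” can look like: Cloudflare sends the visitor’s address in the CF-Connecting-IP request header. Below is a minimal sketch, not how Discourse itself handles it. The function name, the plain-dict headers, and the trusted range are all assumptions for illustration; 203.0.113.0/24 is a documentation-only network standing in for the proxy’s real published ranges.

```python
import ipaddress

# Hypothetical stand-in for the proxy's published ranges. 203.0.113.0/24 is a
# documentation-only network, NOT a real Cloudflare range.
TRUSTED_PROXIES = [ipaddress.ip_network("203.0.113.0/24")]


def client_ip(remote_addr, headers):
    """Return the visitor IP, trusting CF-Connecting-IP only from known proxies.

    `headers` is assumed to be a plain dict keyed by the exact header name.
    """
    peer = ipaddress.ip_address(remote_addr)
    if any(peer in net for net in TRUSTED_PROXIES):
        forwarded = headers.get("CF-Connecting-IP", "").strip()
        try:
            return str(ipaddress.ip_address(forwarded))
        except ValueError:
            pass  # header missing or malformed: fall back to the peer address
    return str(peer)
```

The header is only believed when the connection really comes from the proxy, otherwise anyone could send a fake CF-Connecting-IP and dodge an IP ban.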
While in this case banning the IP is the right thing to do, I think there is merit in being able to stop user.name@gmail.com and username@gmail.com from both being registered as two different users at any given Discourse forum.
No sane administrator should allow this behaviour (from gmail) and maybe we could have an option to extend this prohibition to other email providers as well.
It would need a simple list like ‘@gmail.com’, ‘@anotherprovider.com’, and it would then check new addresses against registered users after removing the dot or any other relevant character (which could be a list as well), to stop users who want to have two or more accounts.
Maybe a plugin with this functionality would be the best solution.
We are defenseless against these registrations unfortunately. The only defense is not being targeted by spammers who have any skill.
There is objectively no way to block a spammer with sufficient IPs from creating 100,000 accounts using a single gmail address on any standard Discourse forum using these tricks.
All spam/posting throttling settings are futile when the spammers have access to unlimited accounts with a single gmail address.
It’s strange that your site has such a severe problem with this when I can’t recall a single time this has happened across our ~1,000+ hosted customers over the last 4 years?
@codinghorror Do you credit this to sufficient defenses for this common spammer technique or those sites not being targeted by spammers? Do those sites get regular large volumes of attempted spam registrations of any sort that are blocked by defenses?
It heavily depends on the niche, traffic volumes and if it’s suitable for their direct response spam campaigns. A spammer who can keep posts near the top of the posts list is essentially getting fantastic ad space for pennies on the dollar. Homepage above the fold ad space can very commonly be worth $xxx - $x,xxx per day depending on niche and traffic volume.
Spammers making significant money per month spamming their direct response campaigns on a specific forum, who may live in developing countries with extremely low average local salaries, might be motivated.
I have several other Discourse forums running since 2015-2016 and they have virtually no issues at all with spam registrations or posts, due to not being targeted. Not being targeted by spammers is a nice defense, until you get targeted. Discourse isn’t supported by default in most commercially available forum spamming software as far as I know, like Xrumer.
True, a dedicated spammer will always get through. We’ve seen them open their own mail servers and generate email addresses by the thousands - good luck with that.
That said, not allowing duplicate registrations using these gmail tricks seems a reasonable precaution?
@codinghorror - Hahah, love it. Would rather not just roll over and die though. I don’t think this is an edge case, lots of social sites don’t allow registering using the same gmail address due to spammer abuse. Most of the big ones as far as I know do not allow it.
@bartv - Yeah, with the ones that have their own mail servers, at least we can blacklist their domains as a decently effective defense (although the accounts that made it through prior to blacklisting are still usable). They can get more domains, but at least that costs them resources - unlike with the gmail tricks.
With these gmail tricks, there really isn’t any defense, and additional address variations cost the spammer nothing. The ‘levenshtein distance spammer emails’ setting can help somewhat with the dot trick, once the same gmail address has been banned in many different dot combinations. We can’t currently defend against the + trick though, which allows essentially unlimited combinations.
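To illustrate why edit distance helps with dots but not with +, here is a small sketch of plain Levenshtein distance. It is the textbook algorithm only; how Discourse applies the setting internally is not shown here, and the sample addresses are made up.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # delete ca
                            curr[j - 1] + 1,       # insert cb
                            prev[j - 1] + cost))   # substitute or match
        prev = curr
    return prev[-1]


print(levenshtein("username", "user.name"))            # 1
print(levenshtein("username", "username+forum12345"))  # 11
```

Each extra dot adds only 1 to the distance, so dot variants stay close to an already banned address. A + tag can be as long as the spammer likes, so the distance can be pushed past any threshold.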
Unless you have powerful enough friends to help you. And here, that’s the Discourse development team (and maybe the community, if we think about plugins).
I’m sorry, but wouldn’t it be a good thing for Discourse to treat all gmail addresses as if they had no dots and nothing after a + sign? It doesn’t seem technically very complicated, just a few pretty simple lines of code. Registration => detect gmail.com after the at sign => remove every dot, and whatever comes after a + sign up to the at sign, and use that address => already used? => return an error message “Email address already used”.
Done. Or am I missing something ?
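The steps above could be sketched roughly like this. It is only an illustration of the idea, not Discourse code: `normalize_gmail`, `register`, and the in-memory `taken` set are hypothetical names, and a real forum would check against its user table instead.

```python
def normalize_gmail(email):
    """Strip dots and any +tag from the local part of a gmail.com address."""
    cleaned = email.strip().lower()
    local, sep, domain = cleaned.rpartition("@")
    if not sep or domain != "gmail.com":
        return cleaned  # other providers are left untouched
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def register(email, taken):
    """Hypothetical signup check; `taken` is a set of normalized addresses."""
    key = normalize_gmail(email)
    if key in taken:
        return "Email address already used"
    taken.add(key)
    return "ok"


taken = set()
print(register("user.name@gmail.com", taken))       # ok
print(register("username+forum@gmail.com", taken))  # Email address already used
```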
If spammers start to know this works with Discourse, they are going to target more and more Discourse forums with that technique. I mean, why wouldn’t they ?
We set it so new users must have all topics/posts manually approved for the first X times, which killed the issue almost overnight. Some of them worked out that editing their post after the fact still worked, until we adjusted that for trust level 0/1 too.
Whilst this has nothing specifically to do with any one domain trick or another, it disrupts the actual problem, which is a motivated human trying to get around your countermeasures. If they can’t use a gmail “trick” they’ll find some other trick. Something like “trust level 1 can post, and trust level 1 needs 5 minutes of read time” is the kind of thing I’d veer towards personally.
Well, let’s see. What all characters can be used here?
Some mail services support a tag included in the local-part, such that the address is an alias to a prefix of the local part. For example, the address joeuser+tag@example.com denotes the same delivery address as joeuser@example.com. RFC 5233 refers to this convention as sub-addressing, but it is also known as plus addressing, tagged addressing or mail extensions.
Addresses of this form, using various separators between the base name and the tag, are supported by several email services, including Runbox (plus), Gmail (plus), Rackspace Email (plus), Yahoo! Mail Plus (hyphen), Apple’s iCloud (plus), Outlook (plus), ProtonMail (plus), FastMail (plus and Subdomain Addressing), MMDF (equals), Qmail and Courier Mail Server (hyphen). Postfix and Exim allow configuring an arbitrary separator from the legal character set.
So we have: plus, hyphen, equals, period, and pound/hashtag.
The only thing I can think that will work here is a super strict setting to prevent all characters outside A-Z a-z 0-9 in email addresses.
It will definitely prevent some users from signing up, but that might be a viable tradeoff if you are being… uh… targeted by elite Mossad agents, I guess
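A rough sketch of what such a super strict check might look like. The assumption here is that the restriction applies to the local part only (before the @), since the domain still needs its dots; `passes_strict_check` is a made-up name, not an existing setting.

```python
import re

# Assumption: "strict" covers the local part only; the domain keeps its dots.
STRICT_LOCAL = re.compile(r"[A-Za-z0-9]+")


def passes_strict_check(email):
    """True only if the part before the last @ is purely A-Z, a-z, 0-9."""
    local, sep, domain = email.rpartition("@")
    return bool(sep) and bool(domain) and STRICT_LOCAL.fullmatch(local) is not None


print(passes_strict_check("joeuser@example.com"))      # True
print(passes_strict_check("joeuser+tag@example.com"))  # False
print(passes_strict_check("joe.user@example.com"))     # False
```

The last line shows the tradeoff: a perfectly ordinary first.last address gets rejected too.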
That might be too strict; the point here is to prevent multiple use of such email address variations, not to ban them entirely. You could instead add a ‘canonical’ email address to each account that contains the cleaned up version of the actual email address of the user. You compare the cleaned up version of email addresses for new registrations against this value. It’s probably easier said than done though…
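As a sketch of the canonical address idea, assuming a per-domain rule table: the `Rule` type, the `RULES` entries and the `accounts` mapping are all hypothetical, with the separators taken from the provider list quoted earlier in the thread. The user keeps the address they typed; only the comparison uses the cleaned up form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    tag_separator: str = ""   # character that starts a sub-address tag, "" for none
    strip_dots: bool = False  # whether dots in the local part are ignored


# Hypothetical rule table; real entries would come from site configuration.
RULES = {
    "gmail.com": Rule(tag_separator="+", strip_dots=True),
    "outlook.com": Rule(tag_separator="+"),
}


def canonical_email(email, rules=RULES):
    """Return the cleaned up form used only for duplicate detection."""
    cleaned = email.strip().lower()
    local, sep, domain = cleaned.rpartition("@")
    rule = rules.get(domain)
    if not sep or rule is None:
        return cleaned  # unknown domain: compare the address as typed
    if rule.tag_separator:
        local = local.split(rule.tag_separator, 1)[0]
    if rule.strip_dots:
        local = local.replace(".", "")
    return f"{local}@{domain}"


def find_duplicate(email, accounts, rules=RULES):
    """`accounts` maps canonical address -> address as originally registered."""
    return accounts.get(canonical_email(email, rules))
```

Domains without a rule are compared as typed, so nobody is locked out just for having a dot or a plus in their address.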
Then add a hidden site setting with a whitelist of domains for dot removal.
That said, there are plenty of ways to abuse this; heck, someone can be sitting on a cache of 10000 spam gmail addresses and just sign up with all of them using some kind of bot. If you are being targeted, you might as well approve every new signup for a while; maybe this is one of the rare cases where you want a reCAPTCHA on signup in a plugin.