Only allow user profiles for TL1 and above?

zogstrip · November 20, 2014, 7:00pm

It’s now available

codinghorror · November 27, 2014, 12:56am

A few more improvements in this area:

@eviltrout just added a “Suspect” tab on admin, users.
@sam added the ability to click on the avatar in these lists to quickly view the user card for that user (and the profile page).
@zogstrip made it so that TL0 profile customizations don’t show up for anon users. They only show up when you reach TL1.

We’ll add some more checks later, but for right now, “Suspect” means users who have

viewed 1 or fewer topics
read 1 or fewer posts
filled out the “about me” field on their user profile

This is by far the most predictive set of data I’ve seen on users that marks them as profile spammers. And FYI if you see a number at the end of their username, or their username is a random string of chars… that’s also highly predictive.
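
As a rough illustration, the three “Suspect” criteria above could be written as a simple filter. This is a minimal Python sketch, not Discourse’s actual implementation: the `UserStats` record and its field names are hypothetical, and the username check is only the trailing-digit hint mentioned in the post, kept separate from the main test.

```python
import re
from dataclasses import dataclass


@dataclass
class UserStats:
    """Hypothetical per-user record; the field names are assumptions."""
    username: str
    topics_viewed: int
    posts_read: int
    about_me: str


def is_suspect(user: UserStats) -> bool:
    """Apply the three criteria from the post: barely read anything,
    yet filled out the "about me" field."""
    return (
        user.topics_viewed <= 1
        and user.posts_read <= 1
        and bool(user.about_me.strip())
    )


def username_ends_in_digit(user: UserStats) -> bool:
    """Secondary hint from the post: a number at the end of the username."""
    return re.search(r"\d$", user.username) is not None


# Example records (made up for illustration).
spammer = UserStats("cheapwatches88", 0, 1, "Buy watches at example.com")
reader = UserStats("alice", 40, 300, "I like gardening")
```

A user who has read the site, or who left “about me” blank, does not match, so ordinary lurkers are not flagged by this check alone.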

codinghorror · November 27, 2014, 1:36am

Xorlof:

I am unconvinced by your argument that allowing people to reveal themselves as spammers outweighs the reality that you’ve now got spammy accounts hanging around. The reason I’m unconvinced is that the same spam house doesn’t typically just create one account on a forum never to return. They come back and create more for new clients. Even with the above mitigating factors you mention, Jeff, they’re still able to provide a link to their profile page as proof of work to their clients. If you stop that, they can’t provide that, and you’ve got 1 blank spam account instead of eventually hundreds. Remember, lots of these are human registrations these days–that’s why they’re coming from countries with cheap labor–and humans will know that they can’t fill out a profile and so shouldn’t return. With TL0 not having a profile by default, Discourse sites become dead to them for profile spamming.

The solution we came up with is that TL0 profile modifications are no longer visible to anonymous users. They were already invisible to Google of course due to our default robots.txt.
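
The visibility rule described here is small enough to sketch. The Python below is only an illustration under stated assumptions: the function name and parameters are hypothetical and it is not taken from the Discourse code base.

```python
from typing import Optional


def profile_customizations_visible(owner_trust_level: int,
                                   viewer_id: Optional[int]) -> bool:
    """Return True if the owner's profile customizations should be shown.

    viewer_id is None for an anonymous (logged-out) visitor.
    Both parameter names are assumptions made for this sketch.
    """
    if owner_trust_level >= 1:
        # TL1 and above: customizations are shown to everyone.
        return True
    # TL0 owner: hide customizations from anonymous visitors only.
    return viewer_id is not None
```

The account and its profile page still exist at TL0; only the customizations are withheld from logged-out visitors, which removes the “proof of work” link without stopping spammers from revealing themselves.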

I appreciate your feedback on this, and I must say, profile spamming was a huge problem on at least one of our partner sites and I personally cleaned most of it up over the last week, so I feel your pain.

We’ve made a ton of improvements in this area as a result. We hate spam, and we want all Discourse instances to be safe out of the box with default settings.

codinghorror · November 28, 2014, 9:54am

I personally deleted thousands of these profile spammer accounts (to understand better what we’re dealing with) and what you describe happened maybe 5 times out of those thousands. If that.

This is exceedingly rare. For what it’s worth, this is what they look like:

For all that “sophistication” they ran these accounts from the same IP so they were easy to find as 4+ dupe accounts. But yeah, if they ran these accounts from unique IPs, they might have gotten away with it.
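
Finding “4+ dupe accounts” on one IP amounts to grouping accounts by registration IP and keeping the large groups. A minimal Python sketch follows; the input shape (pairs of username and IP) and the function name are assumptions for illustration, not how the admin report is implemented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def duplicate_ip_groups(accounts: Iterable[Tuple[str, str]],
                        min_accounts: int = 4) -> Dict[str, List[str]]:
    """Group (username, ip) pairs by IP and keep IPs with many accounts.

    min_accounts=4 mirrors the "4+ dupe accounts" mentioned in the post.
    """
    by_ip: Dict[str, List[str]] = defaultdict(list)
    for username, ip in accounts:
        by_ip[ip].append(username)
    return {ip: names for ip, names in by_ip.items()
            if len(names) >= min_accounts}
```

As the post notes, this only works while the accounts share an IP; accounts registered from unique IPs would pass through this check untouched.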

But this is so, so very rare based on the samples I have. Your average profile spammer is dumb as a box of rocks.

fantasticfears · February 13, 2015, 12:55pm

The crawler can’t even find the profile because there is no link pointing to user profile page in any topics.

Mittineague · February 13, 2015, 10:54pm

Not within the forum perhaps, but elsewhere?

riking · February 13, 2015, 11:18pm

It’s also disallowed in robots.txt.

penne12 · February 14, 2015, 1:40am

I could see that happening nicely, maybe with an email reminding them to come back, and making users who haven’t come back in a while fill out a captcha.

fantasticfears · February 14, 2015, 3:17am

Also no content in the profile, except the username.

Mittineague · February 14, 2015, 3:58am

Are you sure?

JavaScript disabled
UA Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

fantasticfears · February 14, 2015, 4:03am

Well, you are right. I checked the code. The bio shows if the user’s trust level is above 0.

jeffwidman · February 15, 2015, 12:11am

I recently spent about a week cleaning up a ton of spam on a forum I purchased. This was from SMF software, not Discourse, but I think the patterns might be useful for building spam heuristics.

Telltale signs of spammer:

They use a lot of '.'s in a gmail address. Basically they can have one spam gmail address and register 50 accounts by randomly sprinkling periods throughout. This is a little tricky because legit users sometimes do the same thing with susie.forumname@gmail.com
So I ran a regex to remove all periods and anything after the “+” sign, and then GROUP BY email address and anyone with more than three accounts tied to the same gmail address got nuked. This is a gmail specific tactic, but it wiped out 10K spammers for me. This particular forum allows users to have multiple accounts, so I allowed users who had two accounts tied to the same email to live.
Users linking to the same url. Required manual inspection again, but there were 4 urls out of 100 that were legit. The other 96 urls were clearly spammers who’d created multiple spam accounts linking to these sites.
Anyone with a url in their description/about me/homepage/any text box was worth doublechecking.
Spammers often copy/paste “location”… found a lot of accounts sharing the same typos even. Example “location field = ‘United STates’” A simple SQL group by on location flushed a lot of these out, but I still had to manually review just in case.

For all these tests, I doublechecked that none of the accounts had any PMs or posts; if they did, I manually reviewed the account to verify my filter wasn’t catching legitimate users.
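
The Gmail check described above (strip the periods, drop everything after the “+”, then group by the result) can be sketched in a few lines. This is a Python illustration rather than the regex and SQL actually used on that forum; the function names are hypothetical, and the threshold follows the “more than three accounts” rule from the post.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def normalize_gmail(email: str) -> str:
    """Collapse Gmail dot and plus variants to one canonical address.

    Non-Gmail addresses are only lowercased, since the dot trick
    described in the post is Gmail-specific.
    """
    email = email.strip().lower()
    local, sep, domain = email.rpartition("@")
    if not sep or domain != "gmail.com":
        return email
    local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def flag_shared_gmail(emails: Iterable[str],
                      threshold: int = 3) -> Dict[str, List[str]]:
    """Group addresses by normalized form; keep groups larger than threshold.

    With threshold=3, only groups of more than three accounts are flagged,
    so two accounts on one address are left alone.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for email in emails:
        groups[normalize_gmail(email)].append(email)
    return {key: members for key, members in groups.items()
            if len(members) > threshold}
```

Flagged groups are still only candidates: as noted above, any account with PMs or posts deserves a manual look before it is deleted.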

Other random comments:
I do agree with @codinghorror above that I’d rather give spammers a chance to expose themselves, which means letting them enter profile info.

Google does crawl plain text urls, even non-hyperlinked ones. Excluding through robots.txt makes a lot of sense.

I would never delete inactive users unless they were verifiably spammers. I can personally attest to being one of those users who’s signed up for a forum account, never posted, and then returned years later and started using the account.

Similarly, most forum owners I know would rather have more accounts than fewer and so would never delete inactive users. A potential forum buyer will generally pay more for more accounts even if they’re inactive. A sophisticated buyer will ask you to run a few SQL queries to see how many users are actually active, but most won’t.

codinghorror · February 15, 2015, 1:05am

Very helpful, thanks for sharing this.

I would rather educate people about this than mindlessly propagate bad ideas.

Why? Why not create the account when you actually need it and will use it? What value is there in an empty, never used account?

jeffwidman · February 15, 2015, 9:38pm

If the account is deleted, there is no chance of ever reactivating that user. If the account remains around, there is a small, but still non-zero chance of reactivating them.

I, and most other forum owners I know who run forums semi-professionally, do not want inactive accounts deleted unless they’re spammers. This is not pulling an opinion out of thin air: this very topic has been discussed several times on a private forum I belong to for folks who own forums with >2M posts. Each time, folks are like “Well, I’d like to delete these users since 99.9% of them won’t ever return, but I’m not going to touch it because of the 0.1% who might return. And if there’s any chance a buyer will pay more for more users, then also not something I want to touch.”

I do challenge the statement that this is “mindlessly propagating bad ideas”. Most forums these days are sold on a multiple of revenue, but a buyer will still look at the user count and think “I’m willing to pay a little more because I can reactivate some of these users with better email campaigns etc. and increase activity.” I’ve been successful in doing this myself, so while I ask a seller for a breakdown of active vs inactive users, I am willing to pay a small amount more if there are a bunch of inactive users that I might be able to activate. The more important thing to me is how many of the inactive users have a bounced email address; those are the accounts that are worthless.

sam · February 16, 2015, 1:23am

This is configurable, so really this is just a question of defaults.

I would argue that, in general, the cost of having a few hundred “prime” usernames parked by dormant never used accounts, with a pretty significant risk of being “spam” bombs activating 1 year in, is worse than the impact of deleting these no-op accounts.

So it really is just a question of which default is saner.

codinghorror · February 16, 2015, 1:37am

At the very least you would need to determine how many of those 3, 4, 5 year old accounts have valid emails if you are “buying” a bunch of totally inactive signups. We only validate email at signup and after a year of not seeing that user we stop mailing them digests.

Note that at no point does Discourse delete users, nor is there any code to do so in the current code base, provided they validated their email address. Users are given 7 days (default value) to validate email after signup.

(Also if I was selling a forum, and those are the ground rules, I would hire some spammer to create tens of thousands of new accounts with unique email addresses, thus boosting the sale price.)

jeffwidman · February 16, 2015, 2:13am

I misunderstood then; I’m sorry.

I’d thought from what was written previously we were discussing users who had validated their email but never actually posted.

My apologies.

As long as there’s a setting to turn off this auto-deletion (should it ever make it into the codebase) I’m happy.

For users who haven’t validated their email after signup, I’d probably send them a followup email or two as a “reminder to activate your account”, but that’s just me and I certainly understand a sane default is to delete them instead.

Re: Spam bombs–the ones I’ve observed had fully activated accounts. I don’t have a site currently on Discourse, so this may be different for some reason, but on other forum software they make sure to fully activate the account and then go dormant until they think they’ll be off the radar.

Re: hiring a spammer… that’s just one of many risks associated with buying a forum. Bigger risk is whether mods/users will stick around through the change of ownership or head elsewhere.

codinghorror · June 13, 2015, 8:48pm

Since I created this image for @cpradio it also belongs here:

Basically, a TL0 profile is only visible to that user and other registered users, because

ralphm · June 13, 2015, 11:46pm

I still think it would be better if they just couldn’t post a profile at TL0, as being able to do so will just encourage them to do so, and presumably those who get to TL1 will have their profile visible.

codinghorror · June 14, 2015, 12:04am

No. As remarked several times earlier in this topic, you want them to expose themselves, not lie in wait for a year before unleashing a payload. In Discourse, they have to actually do the work and read the site to get out of the sandbox, simply existing does not get anyone out of the new user sandbox, ever.
