Only allow user profiles for TL1 and above?

I realise that, but it doesn’t take a great deal of time and effort to reach TL1. I don’t think the majority of genuine members would be put off by having to wait. Indeed, I think most genuine members sign up because they want to post, and by the time they’ve done what they wanted to do and read a couple of topics, they’ve reached TL1 without even noticing, before even thinking about updating their profiles.

2 Likes

As you can see here

Very clearly, our users were experiencing a significant “Aha!” moment as they personalised Ghost and turned it into their own, custom space on the internet.

There is a lot of data to support the idea that customizing your profile to make it “yours” contributes significantly to whether or not someone bothers to stick around at all.

So I don’t support what you are proposing; the data (and my experience) tells me that would be a bad choice.

I may be missing the point here, but that article didn’t say those who customized their profile stuck around. It said those who created a post (we agree with this one – this will lead them to TL 1 quickly, as they are engaging the community), uploaded a custom theme (okay, maybe this comes close… but we are comparing apples to oranges a bit – I wouldn’t call a user’s profile a theme), or added a custom domain (no comparison for this in Discourse).

The biggest difference here is that Ghost is a blogging platform, and the three things mentioned help the user identify their blog. Discourse is a discussion forum, so the three things we should want are posting/creating topics, liking posts, and viewing topics/posts. Those show engaged users, not a custom profile page that you have to click to find… (just my opinion here)

We aren’t saying they can’t have a profile, just that at TL 0, the need for it is far smaller than TL 1. I’d be interested to see stats within Discourse about this. How many users customize their profile that are TL 0 versus TL 1. How many of those TL 0 profiles are legitimate?

I’m fairly certain that most TL 0 profiles are spam or exist only to self-promote/advertise. They serve little use to the community, and their owners are not necessarily members the community would care to keep. That is what our data has been showing us at Sitepoint (for what it’s worth).

3 Likes

Well I owe you an apology here because I checked the new user admin page on a few popular sites we host and… this is a much bigger problem than I realized.

It’s quite bad and looks 100% human entered, so captchas won’t change a thing. :unamused:

So hopefully by the end of this week, I’d like @zogstrip to work on the following:

  • a new site setting – maximum new user accounts per registration IP. Defaults to 3. If there are already (n) trust level 0 accounts from this IP, stop accepting new signups from that IP. If there are TL1 or above accounts, they will not be counted toward this limit. The user dialog should reject with the existing vague screened email/IP login dialog message copy. This part is urgent, do this first!

  • improved admin IP lookup dialog – the “other accounts with this address” should show some basic stats in a scrollable div: total count of other accounts from this IP, and the username / read time / topics entered and trust level for each account. Right now it just shows the tiny avatar which isn’t enough to judge anything.

  • one-click staff user delete button on the user profile page for TL0 users with 1 post or less. Something I gave @neil but @zogstrip, you should take it as part of this work.

  • some way of batch deleting a number of accounts from the same IP address. Not sure exactly where this should go in the UI… but there’s some cleanup needed here.

  • (for later consideration) a cleanup task that deletes old accounts that have no posts, no actions, no read time, and no subsequent visits after (n) days. We’ll have to watch out for SSO and other oddball “inactive” user scenarios here.
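The cleanup criteria in the last bullet could be expressed as a simple predicate. This is only a rough Python sketch, not Discourse code: the `Account` fields, the `exempt` flag (standing in for SSO and other oddball “inactive” cases) and the 30-day default are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Account:
    # Hypothetical fields for illustration; not Discourse's actual schema.
    created_at: datetime
    post_count: int = 0
    action_count: int = 0    # likes, flags, bookmarks and so on
    read_time_secs: int = 0
    later_visits: int = 0    # visits after the initial signup session
    exempt: bool = False     # SSO or other "inactive by design" accounts


def is_stale(account: Account, now: datetime, max_age_days: int = 30) -> bool:
    """Return True if the account matches every cleanup criterion."""
    if account.exempt:
        return False
    # Too young to judge: give the account (n) days first.
    if now - account.created_at < timedelta(days=max_age_days):
        return False
    # No posts, no actions, no read time, no subsequent visits.
    return (
        account.post_count == 0
        and account.action_count == 0
        and account.read_time_secs == 0
        and account.later_visits == 0
    )
```

Keeping the exemption as an explicit flag means the oddball cases have to be decided up front instead of being swept up by the activity checks.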

None of this is really hurting anything, as the whole /users/ path has long been disallowed in robots.txt, and TL0 users don’t have any user clickable links on their profile by design. But it’s still a lot of bogus accounts being created, which is annoying and messy.

11 Likes

I welcome any changes that may help prevent these from being created in mass hordes :smile: They are a bit tricky to find (that’s a plus to Discourse – if we can’t find them easily, I can’t imagine they are doing a lot of good for the person who created them), and when we stumble on them, we take care of them.

I’m also relieved we are not the only ones with this problem, that’s good to know.

1 Like

I think a safe initial default is “bypass this check altogether if any users on that IP are trust level 1 or above”

That’s probably true 99% of the time, so yep, I think that would be safe. It’s unlikely that someone who is there for “spam/advertising” actually makes it to TL 1. I really don’t know if we have any TL 1 accounts sharing an underlying IP that is primarily TL 0, but I’m also not overly concerned about that “edge case”.

I think it’d be safer if we only bypass if there are any TL2 or higher accounts at that IP.

7 Likes

Do you happen to have anything handy other than the Ghost article you linked to? That data clearly shows a correlation, but the causation arrow was left unexplored. There would have to be a followup showing that because they induced more people to fill out their profile (via the video they talk about or whatever) they ended up with an overall increase in subscriptions. They didn’t go that far.

In other words, is it more a matter of “I feel invested in the community because I filled out the profile” and thus you stick around, or “I feel like investing in this community so I will fill out the profile”? If it is mostly the latter, then the Ghost data should be viewed as reflecting selection bias.


I run a reasonably large vBulletin (yeah, I know…) site and the spam tools some have written targeting vBulletin are more complex than you might guess. We’ve seen accounts registered that didn’t actually do anything for more than a year (other than regular monthly logins to show activity) before the spamming started, either by posting or profile spam. Another common variant is a human registration, followed by 1 or 2 relatively content-free posts (“this is great!” and the like), then that long dormant period, and later the account is mobilized to do spamming. And no, these are not hacked accounts. We’ve looked, and the foreign IP ranges used from registration through spamming a year later are consistent. I suspect some spam companies “register-now-sell-spam-services-later” to avoid tripping some of the new registration alarms admins might have set.

Anyway, just throwing it out I’d still like the option to be able to disable profiles for TL0.

2 Likes

I added a Delete button to the user profile page today:

6 Likes

That part is now in :crocodile:

4 Likes

@zogstrip does this include the tweak suggested by @sam, where if there is at least one TL2+ user from that IP, the number of registrations is unlimited?

e.g. only limit new registrations from that IP if all the users from that IP are TL1 or TL0.
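To make the rule being discussed concrete, here is a minimal Python sketch of the check, not the actual implementation. It assumes, as in the original spec, that only TL0 accounts count toward the limit, and that a single TL2+ account on the IP lifts the limit entirely. The function and constant names are hypothetical.

```python
from typing import Iterable

MAX_NEW_ACCOUNTS_PER_IP = 3  # default proposed earlier in the thread
BYPASS_TRUST_LEVEL = 2       # one TL2+ account on the IP lifts the limit


def signup_allowed(existing_trust_levels: Iterable[int],
                   max_per_ip: int = MAX_NEW_ACCOUNTS_PER_IP) -> bool:
    """Decide whether a new signup from an IP should be accepted.

    `existing_trust_levels` holds the trust level of every account
    already registered from that IP (hypothetical input shape).
    """
    levels = list(existing_trust_levels)
    # Bypass: an established user shares this IP, so don't limit it.
    if any(tl >= BYPASS_TRUST_LEVEL for tl in levels):
        return True
    # Assumption: only TL0 accounts count toward the limit.
    tl0_count = sum(1 for tl in levels if tl == 0)
    return tl0_count < max_per_ip
```

For example, `signup_allowed([0, 0, 0])` rejects the fourth signup, while `signup_allowed([0, 0, 0, 2])` accepts it because of the TL2 account on the same IP.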

2 Likes

That part is now in too :monkey:


That wasn’t in the initial implementation but it’s now in :snail:

6 Likes

Mailing list lurkers may fall into this “inactive” category

3 Likes

OK, after cleaning up a ton of this profile spamming nonsense by hand to get a feel for it, I am not sure we should even offer the flag to disable profiles for TL0 users. Here’s why.

FIRST! Remember that

  1. robots.txt excludes /users/ out of box
  2. no hyperlinks are ever generated on a user profile page for TL0 users
  3. we don’t even have any user index page, so good luck finding a user who has never posted

So it’s not that dangerous to have profile spammers, it’s just noisy and kind of annoying.
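The first point can be checked from the outside with Python’s standard `urllib.robotparser`. The snippet below is a sketch only: the robots.txt fragment is a minimal illustration of a `/users/` exclusion, not a copy of the file Discourse ships, and the forum host and paths are made up.

```python
from urllib.robotparser import RobotFileParser

# Minimal illustrative fragment; a real robots.txt will contain more rules.
ROBOTS_TXT = """\
User-agent: *
Disallow: /users/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_crawlable(path: str, host: str = "https://forum.example.com") -> bool:
    """Return True if a well-behaved crawler may fetch this path."""
    return parser.can_fetch("*", host + path)


print(is_crawlable("/users/frexawe56"))    # profile page
print(is_crawlable("/t/some-topic/123"))   # ordinary topic page
```

This only tells you what a crawler that honors robots.txt will do; it says nothing about whether the page can still be shown to a client as proof of work.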

My worry is that this class of spammer only exposes themselves as spammers when they enter spammy profile info. So ask yourself this: which of these outcomes is worse?

  • a bunch of new mysterious users sign up from India but don’t ever post or fill out their user profile info, because they can’t.

  • a bunch of new users sign up from India and don’t ever post, but they enter spammy profile info (that is safe for the reasons listed above) that definitively exposes them as spammers.

Knowing what I know about people wanting to “grow” their sites, they’d be perfectly happy to keep these mysterious non-posting, non-profile-updating spammy accounts around as evidence that “hey my site is working look at all these user account signups!” when they’ve let some evil people in their house, and simply don’t realize it yet.

So I posit that there is value in letting these spammers reveal themselves as profile page spammers. And there is significant danger in suppressing the profile edits for TL0 because it will prevent eager (or busy, inattentive, etc) Discourse operators from discovering that their influx of new users is actually just a bunch of super low value spammers who weren’t able to post anything… yet.

5 Likes

Especially once we have an Akismet plugin that can automatically flag these profiles. I’m willing to bet there will be very few false positives and negatives on the Akismet side for these kinds of blobs of text.

If you’re saying that Profile content will soon be searchable for Moderators, I’m 110%+++ in favor of that and can’t wait :thumbsup:

As much as I’d rather not have the Profile acting as a honeypot for finding the ~20-25% of new accounts that cause this clutter, the more criteria we can search and compare the easier making moderator decisions will be.

1 Like

Some more thoughts on this after cleaning up a lot of it. What are the strongest signals in a spam profile?

  • the IP originates from an unusual, non-English country: Bangladesh, India, etc

  • the user has read 0 topics and 0 posts, yet has a profile filled out

  • the username ends in a number, e.g. “frexawe56”

We’re thinking that eventually:

  • certain user signups will have enough risk associated with them (username ending in a number, from a strange country, profile filled out despite reading zero topics, TOR node entry) that the signup will have to be manually approved. This will be a fair bit of work since right now user signup approvals are global on or off.

  • user profile “about me” text can be fed to some third party Bayesian filter like Akismet for flagging as needed

  • we can have a “suspect users” tab on the users admin section which can show these kinds of users
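As a rough illustration of how the signals above might be combined for a “suspect users” view, here is a Python sketch. Everything in it is an assumption for the sake of the example: the `Signup` fields, the per-site set of usual countries, and the threshold of two signals before manual approval.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Signup:
    # Hypothetical fields; not Discourse's actual user model.
    username: str
    country_code: str
    topics_read: int
    posts_read: int
    profile_text: str = ""
    via_tor: bool = False


USUAL_COUNTRIES = {"US", "GB", "CA", "AU"}  # hypothetical, site-specific
REVIEW_THRESHOLD = 2                        # hypothetical cutoff


def risk_signals(s: Signup) -> List[str]:
    """List which of the thread's spam signals a signup trips."""
    signals = []
    if re.search(r"\d$", s.username):
        signals.append("username_ends_in_digit")
    if s.country_code not in USUAL_COUNTRIES:
        signals.append("unusual_country")
    if s.profile_text.strip() and s.topics_read == 0 and s.posts_read == 0:
        signals.append("profile_without_reading")
    if s.via_tor:
        signals.append("tor")
    return signals


def needs_manual_approval(s: Signup) -> bool:
    """Flag the signup once enough independent signals line up."""
    return len(risk_signals(s)) >= REVIEW_THRESHOLD
```

Requiring more than one signal keeps a legitimate user with a numbered username, or one from an unexpected country, from being held back on that basis alone.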

Also note that @zogstrip is adding a “delete all accounts from this IP” option to the IP lookup panel, that’s necessary for cleanup if nothing else.

4 Likes

I like the direction you are thinking. I believe everything you’ve stated applies to our setup as well. I’d be really happy to have a suspicious users tab that we could work from. Then see if that tab fills up quickly enough to warrant signup changes.

1 Like

@codinghorror I’ve been in the trenches on this for a while so I have a strong opinion. (In my case, we haven’t had new spam for a long time now, but I still have the scars from the previous battles. :-))

While not meaningless, the robots.txt exclusion isn’t a deterrent to many spam operations. They may not guarantee any listings increase. Some just cite authorities as to why incoming links are “good” and then provide proof (a list of URLs) to you, the client, that you’ve now got a bunch of incoming links from reputable sites. Remember, we’re talking about shady operations here and there is a wide range in the “quality” of the spamming operation. That they may not deliver benefit to their client probably doesn’t eat at their conscience.

This is useful. More so than the previous because the proof-of-work looks shoddier on pages without a real hyperlink.

The page exists, so the proof-of-work can still be provided. Also, as mentioned earlier, some (not many admittedly) of the spammers will have the human spammer do a low-quality reply or two somewhere in the forum (“I agree” / “Are you going to do follow up on this?” / “I never thought of that”) to ensure there is a discoverable link to the profile (which yes, is robots.txt excluded by default in Discourse’s case). If the system allows signatures (not with Discourse), they might also later add signature spam when the thread has fewer eyes on it.

So the “no hyperlinks” thing is great, IMO. If it didn’t exist, I’d post the following without reservation. It does exist, so it makes what I’m about to say less strong, but I get the sense that the absence of a real hyperlink may not always be a deterrent; otherwise they wouldn’t even spam the page with the URL, and boom, problem solved. Anyway, on to the opinion:

I am unconvinced by your argument that allowing people to reveal themselves as spammers outweighs the reality that you’ve now got spammy accounts hanging around. The reason I’m unconvinced is that the same spam house doesn’t typically just create one account on a forum never to return. They come back and create more for new clients. Even with the above mitigating factors you mention, Jeff, they’re still able to provide a link to their profile page as proof of work to their clients. If you stop that, they can’t provide that, and you’ve got 1 blank spam account instead of eventually hundreds. Remember, lots of these are human registrations these days–that’s why they’re coming from countries with cheap labor–and humans will know that they can’t fill out a profile and so shouldn’t return. With TL0 not having a profile by default, Discourse sites become dead to them for profile spamming.

I’m sensitive to your argument that letting people fill out a profile right away lets them invest in a community and thus may result in future activity. However, I’m unable to find evidence that a slight delay (to get to TL1) would have a significant negative impact, since users are still learning the system and aren’t going to be using all features (often including profiles) right away anyway. Ghost might be able to provide data demonstrating harm, but right now it is just a post hoc ergo propter hoc argument; they’ve noted the correlation, but not demonstrated the causation. I’ve looked around for other evidence that might exist out there to demonstrate the importance of allowing an early profile, but haven’t found it…yet. If good evidence is out there, I withdraw all objections to your TL0 profile default.


As a side note–Google is getting very good at identifying profile spam links as being low quality links and penalizing sites that are linked to from them, at least as part of a larger pattern. If you didn’t robots.txt exclude those profiles or nofollow the links, Discourse operators would start receiving emails asking them to remove some of those profile links because of the penalty. I always get a chuckle when I see those. Here’s an example of one I received recently. I’m not redacting the company name involved here because I think the company probably did hire a spammer to generate the links in this case. (It isn’t unheard of for someone to hire a spammer to spam with links to a competitor’s site to force a Google penalty for that site):

My name is Cathy and I'm contacting you on behalf of http://www.futondeco.com/. First we want to thank you for linking to our site, as referrals are the lifeblood of our business. But unfortunately Google has identified your site as a possible reason why our rankings have recently dropped.

I believe the following links may be the ones causing the harm: [link to an old profile page containing a spam link to futondeco.com]

If you could spend just a little bit of time removing the links, I would be extremely grateful. Google sure is getting goofier by the month if you ask me. This kind of stuff makes absolutely no sense to me. 

Thank you very much for your time and I look forward to hearing back from you.

Cathy LaClear
(http://www.futondeco.com/)

If you want an eye-opening experience, I suggest you go through the motions (but stop short) of hiring one of these low-quality-spam groups. Corresponding with them will give you a better sense of how they operate and think.

3 Likes