Some Ideas for Spam Control

faq-material

(David Kobia) #1

I’m already getting plenty of spam on a Discourse install. Based on my experience running a vB forum with over 100K members, I thought I’d share a few thoughts…

  • Current spam control is slightly naive. It doesn’t scale very well when you get into thousands of members. Spammers will often have a whole bunch of accounts, and usually have the time and effort to keep on creating new accounts to spam, even if it’s only once per account.
  • Hooks to stopforumspam.com and Akismet need to be built in by default. They don’t catch everything, but at least they reduce the spam. It’s easier to deal with 5 spam posts than 500.
  • Spammers originate in a few countries - Vietnam, Pakistan, China, India, Nigeria, Morocco, etc. That said, it’d be nice to be able to block/moderate/approve posts from specific IPs. Hell, it’d be nice to just be able to ban an IP address. A more advanced IP blocker would allow you to block an IP address range.
  • I get that the forum is javascript, which prevents canned/bot type spam… but like I said before, some spammers have nothing but time to do this over and over every single day.
  • I’m thinking of building GeoIP into my nginx to just block specific countries. To avoid being too heavy-handed, I might just block them from posting – but they’ll still be able to view posts. This is a bit much, but I’d rather not deal with spam (sorry China). I’ll post my nginx (or apache) config for this when I get it done. The same approach reduced the spam on my vB forum quite significantly.
  • A really good method I’ve found to work is just ‘muting’ the offenders. This way they can post all they want but no one ever sees their posts except the administrator. Because they don’t realize they have been muted, they have no reason to create new accounts. No one wants to play the cat-and-mouse game.
  • One needs to really engage the community to help with flagging – much of the time though, community members would rather email to complain than click the flag button.
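
The IP and IP-range banning wished for above could look roughly like the following. This is only a minimal sketch using Python's standard `ipaddress` module; the `BANNED` list and the `is_banned` helper are hypothetical names, the addresses are documentation-reserved examples, and nothing here reflects how Discourse itself stores or checks bans.

```python
import ipaddress

# Hypothetical ban list: a single address (/32) and a whole range (/24).
# These are documentation-reserved example networks, not real offenders.
BANNED = [
    ipaddress.ip_network("203.0.113.7/32"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def is_banned(ip: str) -> bool:
    """Return True if ip matches any banned address or falls in a banned range."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in BANNED)
```

With a check like this, the forum could reject only posting (not reading) for matching addresses, which is the softer option described in the list.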

Some of this stuff might already be built in – but I just couldn’t find it.


(Bill Ayakatubby) #2

Actually, I disagree. Once Discourse becomes ubiquitous (and I believe it will), bots will come along that spam directly through the Discourse API.


(Jeff Atwood) #4

So far here is what we have:

  • new users are sandboxed in a few ways, notably they cannot post images, and can only have 2 URLs in any given post.

  • posting the same root URL over and over as a new user will lead to auto-hiding of all their posts with that URL, block of future posts with the same root URL, and a PM generated to them

  • if (x) new user posts are flagged by (y) unique users, all their posts are hidden, a PM generated to them, and they are prevented from posting

  • if an individual post reaches the community flagging threshold, it is hidden and a PM generated to the user. An edit will un-hide the post. Read more about flagging.

  • if the moderator deletes the spam user via the “delete spammer” button available from clicking “flag, spam” on one of their posts, both the email address and IP address are blacklisted and will not be accepted for new accounts again.

  • if a topic is started by a new user, and a different new user with the same IP address replies to that topic, both posts are automatically flagged as spam

  • accounts created in the last 24 hours can only create a maximum of 5 topics and 10 replies.

  • accounts created in the last 24 hours can only create new topics every 60 seconds and new replies every 30 seconds.

  • deleted spammers automatically blacklist the email and IP used. Emails are fuzzy matched.

  • you can temporarily disable all new account registration as needed via allow_user_registrations.

  • Trust level 3 users can hide spam with a single flag, versus the three (default setting) flags that are usually required. Read more about user trust levels.
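
To make the new-user limits in this list concrete, here is a rough sketch of how the URL cap, the first-day caps, and the posting intervals might be checked together. It is not Discourse's code: `NewUser`, `check_post`, and the regular expression are hypothetical, the numbers are simply the ones quoted above, and the sketch assumes the account is still inside its first 24 hours.

```python
import re
from dataclasses import dataclass, field

# Limits quoted in the list above; the constant names are made up for this sketch.
MAX_URLS_PER_POST = 2
MAX_TOPICS_FIRST_DAY = 5
MAX_REPLIES_FIRST_DAY = 10
MIN_SECONDS_BETWEEN_TOPICS = 60
MIN_SECONDS_BETWEEN_REPLIES = 30

URL_RE = re.compile(r"https?://\S+")  # deliberately naive URL matcher


@dataclass
class NewUser:
    """Timestamps (in seconds) of posts accepted during the first 24 hours."""
    topic_times: list = field(default_factory=list)
    reply_times: list = field(default_factory=list)


def check_post(user: NewUser, body: str, now: float, is_topic: bool):
    """Return a rejection reason, or None (and record the post) if it is allowed."""
    if len(URL_RE.findall(body)) > MAX_URLS_PER_POST:
        return "too many links"
    times = user.topic_times if is_topic else user.reply_times
    limit = MAX_TOPICS_FIRST_DAY if is_topic else MAX_REPLIES_FIRST_DAY
    gap = MIN_SECONDS_BETWEEN_TOPICS if is_topic else MIN_SECONDS_BETWEEN_REPLIES
    if len(times) >= limit:
        return "daily limit reached"
    if times and now - times[-1] < gap:
        return "posting too fast"
    times.append(now)
    return None
```

Setting `MAX_URLS_PER_POST` to 0 corresponds to the stricter configuration suggested further down in this post.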

With these rules in place we have not seen much spam on our partner sites at How-To Geek and Boing Boing, which are quite active. We also don’t see a ton of spam on other Discourse forums that I know of.

I am curious what is different about your Discourse forum community that you are seeing so much spam. It is certainly true that if you have a userbase that is unwilling or unable to flag spam, your life will be much harder. Can you provide a link to it?

Some possible suggestions:

  • change the number of URLs a new user can have in a post from 2 to zero via your admin settings
  • require manual approval for all new user accounts; if you are getting hundreds of human spammer account signups a day, this might be the only way.

(Erlend Sogge Heggen) #5

How about until this user has leveled up, all their links will prompt a “this user has not yet been verified, are you sure you want to open this link?” warning, similar to the “This points to an external site, are you sure…” that you see on a lot of social networks.

Does this work against shortened links (e.g. bit.ly/f7shfd) as well?


(Jeff Atwood) #6

That sounds incredibly annoying. Why would anyone want that?

Why wouldn’t it?


(Bill Ayakatubby) #7

I think what @erlend_sh meant was this: If a new user posts bit.ly/abc and then posts goo.gl/xyz, but both of them resolve to example.com, will that still count as posting the same root URL?

Or maybe you already inferred that and I’m just being incredibly specific for no reason. :smiley:


(Jeff Atwood) #8

Oh, I see… that makes more sense. No, we don’t expand shortened URLs to match them, though that is an interesting (albeit expensive in time) thing to consider.
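
For illustration, matching two links by host after expansion might be sketched as below. Everything here is an assumption: `root_host` and `same_root` are hypothetical helpers, the host comparison is much cruder than a real "root URL" check, and the `expand` callable stands in for the expensive part (following each shortener's redirect over the network), which is not implemented.

```python
from urllib.parse import urlparse


def root_host(url: str) -> str:
    """Return the lowercased host of a URL, tolerating a missing scheme."""
    if "://" not in url:
        url = "http://" + url
    host = urlparse(url).hostname or ""
    # Treat "www.example.com" and "example.com" as the same host.
    return host[4:] if host.startswith("www.") else host


def same_root(url_a: str, url_b: str, expand=lambda u: u) -> bool:
    """Compare hosts after passing both URLs through expand.

    expand defaults to the identity function, i.e. no shortener expansion,
    which matches the current behaviour described above.
    """
    return root_host(expand(url_a)) == root_host(expand(url_b))
```

Without an expander, `bit.ly/abc` and `goo.gl/xyz` compare as different hosts; with one that resolves both to example.com, they would match.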


(Brentley Jones) #9

How does that work when multiple users have the same external IP? (Roommates for example)


(Jeff Atwood) #10

That was covered in the original topic, feel free to read through it.


(David Kobia) #11

A lot of the controls @codinghorror mentions actually seem to work. I’ll post an update in a few weeks. Unlike many of the forums I’ve worked with in the past (phpBB, vB, Vanilla, etc.), there’s plenty of incentive here for people to use ‘likes’ and ‘flags’, and those actions actually have some kind of effect (a feedback loop). There’s not really a reward for such actions on many of those well-established forums. I’d like to see the ‘trust/scoring’ mechanism advance.

Moderating super-active forums is a pain in the rear. Some of the self-moderation levers built into Discourse will go a long way.

@codinghorror I can feel bits of the Stack Overflow point system growing in here? I’m a fan.


(Jeff Atwood) #12

We are definitely adding more IP screening for signups in particular.

It is an oversight that we offer “screened Emails” and “screened Urls” (though we do nothing with the URLs yet) but not “screened IPs”.

@Neil will be working on that, so burned spammers get added to the screened IP lists and no new users from that IP (or range) will be allowed to sign up. And you can add/delete from that screened IP list too. edit: this is done.


(Jeff Atwood) #13

@dkobia how did things go with the spammers?

We’ve seen one forum where human spammers are fairly persistent – and we’re looking at an automatic rollup-type function that decides:

Hey it looks like you deleted spammers from

  • 172.16.1.12
  • 172.16.1.99
  • 172.16.1.34
  • 172.16.1.203
  • 172.16.1.185

So we’re going to go ahead and ban the 172.16.1.* range from signing up now.

Where the threshold is, say, 5 spammers in a given IP range.
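
A rollup like the one described could be sketched as follows. This is an illustration only, not the planned Discourse implementation: `ranges_to_ban` is a hypothetical function, the /24 grouping is an assumption based on the `172.16.1.*` example, and the threshold of 5 is the example figure from this post.

```python
import ipaddress
from collections import Counter

THRESHOLD = 5  # example figure from the post: 5 spammers in one range


def ranges_to_ban(spammer_ips, prefix=24, threshold=THRESHOLD):
    """Group deleted spammers' IPs by network and return ranges at or over threshold."""
    counts = Counter(
        # strict=False lets a host address stand in for its containing network.
        ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        for ip in set(spammer_ips)  # count each distinct address once
    )
    return sorted(str(net) for net, n in counts.items() if n >= threshold)


deleted = ["172.16.1.12", "172.16.1.99", "172.16.1.34", "172.16.1.203", "172.16.1.185"]
print(ranges_to_ban(deleted))  # ['172.16.1.0/24']
```

Counting distinct addresses rather than deleted accounts is a design choice in this sketch; counting accounts instead would trip the threshold sooner when one address registers many spammers.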


(Kane York) #14

I must say, it felt pretty cool on the How-To Geek forums a day ago. I saw 8 new topics in a row and thought - “Hey, wait a minute - that looks like spam!” I went into one of the topics, confirmed that assessment, flagged it, then went back to the homepage - “Wait, what happened? There’s only 4 in a row - I guess that means I hit the autoblock! Cool!” and did it for the other user.

Taking part and watching the spam disappear is a pretty powerful emotional incentive to want to do it again.
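
The per-post flag threshold mentioned earlier in the thread (three flags by default, or a single flag from a trust level 3 user) could be modelled roughly like this. It is a sketch under assumptions: `FlagTracker` is a hypothetical class, and the separate rule that blocks a new user after (x) posts are flagged by (y) unique users is not modelled.

```python
from collections import defaultdict

FLAGS_TO_HIDE = 3        # default number of flags cited in the thread
SINGLE_FLAG_TRUST = 3    # trust level whose single flag hides a post


class FlagTracker:
    def __init__(self):
        # post_id -> set of user ids that flagged it (repeat flags count once)
        self._flaggers = defaultdict(set)

    def flag(self, post_id, user_id, trust_level=1) -> bool:
        """Record a flag and return True if the post should now be hidden."""
        self._flaggers[post_id].add(user_id)
        if trust_level >= SINGLE_FLAG_TRUST:
            return True
        return len(self._flaggers[post_id]) >= FLAGS_TO_HIDE
```

Tracking unique flaggers in a set means one user flagging repeatedly cannot hide a post alone.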


(F. Randall Farmer) #15

From … on the power of auto-hiding spam based on flags:

There was little doubt that driving spammers and trolls from the site had a significantly positive effect on the community at large. Again, abuse reporters became very protective of their reputations so that they could instantly take down abusive content. - Case Study: Yahoo! Answers Community Content Moderation [Building Web Reputation Systems]


(Liam Austin) #16

Is there a way to identify posts with an email address? If so, how do I do this on my forum?


(Mittineague) #17

I don’t know about “posts”, but identifying by account is possible.
This current discussion should be helpful