What about the spam problem?

faq-material

(Lucas Nicodemus) #3

I’ll keep my eye out then - I’m glad you’re aware of the glaring problems with blanket spam protection.

Does normal registration (i.e. without OAuth) require captcha solving or something similar? If so, why isn’t that enabled for OAuth based sign-up?


(Jeff Atwood) #4

It’s also very unlikely that spammers will be able to:

  • adapt to a brand new type of forum very quickly; XenForo has been around since March 2011

  • adapt to a JavaScript app based forum approach versus traditional HTML

At least, not quickly. But yeah, remember Sam and I worked at Stack Exchange, where completely anonymous users could post at any time. So we’re pretty good at dealing with spammers and bots.


(F. Randall Farmer) #5

From your blog: Throttles, IP bans and so on should be your last line of defense

Amen, brother.


(Alex) #6

CasperJS makes it dead easy to spam JavaScript apps.

http://pastebin.com/uNVyEhE1

Is there a list of the anti-spam measures Discourse already has in place or plans to include?


(Jeff Atwood) #7

If it’s so dead easy, can you demonstrate it for us? :smile:


(Jeff Atwood) #8

So far here is what we have:

  • new users are sandboxed in a few ways, notably they cannot post images, and can only have 2 URLs in any given post.

  • posting the same root URL over and over as a new user will lead to auto-hiding of all their posts with that URL, block of future posts with the same root URL, and a PM generated to them

  • if (x) new user posts are flagged by (y) unique users, all their posts are hidden, a PM generated to them, and they are prevented from posting

  • if an individual post reaches the community flagging threshold, it is hidden and a PM generated to the user. An edit will un-hide the post. Read more about flagging.

  • if a moderator deletes the spam user via the “delete spammer” button (available by clicking flag, then spam, on one of their posts), the email address is blacklisted and will not be accepted for new accounts again. (new-ish)

  • if a topic is started by a new user, and a different new user with the same IP address replies to that topic, both posts are automatically flagged as spam (new)

  • accounts created in the last 24 hours at trust level 0 can only create a maximum of 5 topics and 10 replies (new)

  • new accounts are strongly rate limited in creating new replies and new topics, e.g. only one new topic every few minutes.

  • new users have their posts fed to Akismet very rapidly in a priority queue to vet them as spam or not. (Requires Akismet plugin and key, but all our hosting has this on by default.)

  • new users who “type” extremely fast are considered highly suspect. So new posts by new users that have virtually zero think / type time on their posts are going to get scrutinized very, very closely.
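
As a rough illustration of how a few of the rules above could be checked at post time, here is a minimal Python sketch. It is not the Discourse implementation. The 2-URL limit and the no-images rule come from the list; the function name, the regular expressions, and the seconds-per-character threshold are assumptions made up for the example.

```python
import re

# Hypothetical patterns: plain http(s) URLs, plus HTML or Markdown images.
URL_RE = re.compile(r"https?://[^\s<>()]+")
IMAGE_RE = re.compile(r"<img\b|!\[[^\]]*\]\(", re.IGNORECASE)

MAX_NEW_USER_URLS = 2        # from the list above
MIN_SECONDS_PER_CHAR = 0.02  # assumed threshold, not a documented value


def check_new_user_post(raw, compose_seconds):
    """Return the sandbox rules a new user's post trips (empty list if none)."""
    problems = []
    if len(URL_RE.findall(raw)) > MAX_NEW_USER_URLS:
        problems.append("too_many_urls")
    if IMAGE_RE.search(raw):
        problems.append("images_not_allowed")
    # Near-zero think/type time marks the post for closer scrutiny.
    if raw and compose_seconds / len(raw) < MIN_SECONDS_PER_CHAR:
        problems.append("typed_suspiciously_fast")
    return problems
```

In this sketch a pasted 500-character post submitted after one second would come back as `typed_suspiciously_fast`, while a short reply typed over 20 seconds returns an empty list.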

Some more info on user trust levels.

With these rules in place we have not seen much spam on our partner sites at How-To Geek and Boing Boing, which are quite active. We also don’t see a ton of spam on other Discourse forums that I know of.


(Bcguy) #9

This is a problem we also had on our last forums. Even though these registrations didn’t actually post anything in the forums, they created a ton of user registrations and resulted in a ton of new entries in the user database, which really distorted user information.

While I don’t think there is anything in the Discourse code right now to deal with this, I want to flag it as an important issue to plan to address at some point in the future.


(Sam Saffron) #10

Long term we should probably flag a large chunk of registrations from a small IP block. We really need to run into some of these problems ourselves before we can actually solve them.
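
A minimal Python sketch of that idea, assuming IPv4 addresses, a /24 block size, and a threshold of 5 registrations. All three are arbitrary choices for illustration, and `crowded_blocks` is a hypothetical helper, not something in Discourse:

```python
from collections import Counter
import ipaddress


def crowded_blocks(registration_ips, prefix_len=24, threshold=5):
    """Map each network block with >= threshold registrations to its count."""
    counts = Counter(
        # strict=False masks off the host bits, so 203.0.113.7 -> 203.0.113.0/24
        ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        for ip in registration_ips
    )
    return {str(net): n for net, n in counts.items() if n >= threshold}
```

Registrations from a block that shows up here could then be queued for review rather than rejected outright, since shared networks such as universities or offices will also cluster.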


(Cameron Martin) #11

The proof of work idea seems interesting. You wouldn’t even have to wait for WebCL support in browsers, since bcrypt could be used as the hashing algorithm, which doesn’t see the ridiculous increase of speed on GPU hardware that, say, the SHA-* hashing functions do.

I can’t imagine that you’d need a very large proof of work to make spamming unprofitable (this would have to be looked into). And most people probably spend 10+ seconds to write a reply, so the user wouldn’t even notice it being calculated on a web worker in the background.

Edit: This would be similar to Hashcash, which uses a proof of work to try and solve email spam. See the presentation Proof of Work - Proves not to Work, particularly the Economic Analysis sections. Also here is the original paper.
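
To make the idea concrete, here is a Hashcash-style sketch in Python. Note the substitution: it uses SHA-256 from the standard library as a stand-in, purely so the example is self-contained, even though that is exactly the GPU-friendly kind of hash argued against above; a real version would swap in bcrypt or another slow hash. The challenge format and the `solve` / `verify` names are assumptions for illustration.

```python
import hashlib
from itertools import count


def _leading_zero_bits(digest):
    """Count the zero bits at the front of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge, nonce, bits):
    """Server side: one hash to check the submitted nonce."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).digest()
    return _leading_zero_bits(digest) >= bits


def solve(challenge, bits):
    """Client side: brute-force a nonce; about 2**bits hashes on average."""
    for nonce in count():
        if verify(challenge, nonce, bits):
            return nonce
```

The asymmetry is the point: the poster pays roughly `2**bits` hash evaluations per post, while the server pays one. The `challenge` would need to be issued by the server and tied to the post so that solutions cannot be reused.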


(Sam Saffron) #12

Keep in mind, we have not seen any extensive untreated spam problem on any Discourse instance yet. Existing tooling seems to work fine for now.


(Rahul Bansal) #13

I just want to say that our new Discourse forum started receiving spam today. :frowning:

Below are screenshots:

Both accounts were registered today. I caught the first spam right away, deleted the post and the user, and banned the IP and email.

A few hours later the second one came.

I think I will be seeing these spammers more often. :expressionless:

Each post contains a single link, and the entire message is hyperlinked.


(Jeff Atwood) #14

Looks like manual human spam.

We have plans to add signup time checks and content (Akismet style) post time checks sometime in the future. That is in addition to what is listed above.


(cpradio) #15

Can we stop TL0 from posting only a link? It looks like both examples are posts/topics that contain ONLY a link.


(Jeff Atwood) #16

You could, but that is a really severe limitation. Some people never make it out of TL0.


(InsaneMosquito) #18

But, should someone with less than 10 minutes of time on the board only be posting a link?


(cpradio) #19

Right. My goal isn’t to stop them from posting a link, but to make sure their entire post content can’t be one long link.

We already limit them on the number of links allowed, so adding a rule stating “You can’t only post links, you must have actual content” doesn’t seem like a bad idea to me.
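
Something along these lines is what I have in mind, as a rough Python sketch rather than a patch. The 20-character minimum, the regexes, and the `is_link_only` name are all made up for the example, and it only understands bare URLs and Markdown links:

```python
import re

URL_RE = re.compile(r"https?://\S+")
# Markdown link: [anchor text](http://target)
MD_LINK_RE = re.compile(r"\[([^\]]*)\]\((https?://[^)\s]+)\)")
MIN_NON_LINK_CHARS = 20  # assumed minimum amount of "actual content"


def is_link_only(raw, min_chars=MIN_NON_LINK_CHARS):
    """True if little or no text remains once links are removed."""
    # Drop Markdown links including their anchor text, so a message that is
    # hyperlinked in its entirety counts as link-only.
    text = MD_LINK_RE.sub(" ", raw)
    text = URL_RE.sub(" ", text)
    remaining = re.sub(r"\s+", " ", text).strip()
    return len(remaining) < min_chars
```

With this, a post that is a bare URL or one big hyperlinked sentence is rejected, while a sentence or two of explanation around a link passes.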

I also realize that most of the issues around spam are already being dealt with, but this is one place where a change would help continue the battle against spam in Discourse forums.


(Jens Maier) #20

Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed diam nonumy eirmod tempor invidunt is so easy to copy & paste below a spam link. And you can always hide that workaround-content in an HTML comment…


(Zane Beckman) #21

I think it’s sensible in a lot of forums to deny new users (trust level 0) the ability to post hyperlinks.

Another thing that could be done would be to allow them to post the hyperlink, but make it unclickable in the front end, or unclickable without first encountering a warning. (Are you sure you want to click this link? Here’s where the link points to: http://spammyspam.com/spam)

It would be imperative that the link appear valid to the alleged spammer themselves, so they believe it’s worked and they move on.
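
A minimal Python sketch of that rendering rule, assuming the renderer knows the author, the author's trust level, and the viewer. The function name, parameters, and CSS class are hypothetical, not anything that exists in Discourse:

```python
import html


def render_link(url, author_id, author_trust_level, viewer_id):
    """Render a posted URL; TL0 links are clickable only for their author."""
    safe = html.escape(url, quote=True)
    if author_trust_level > 0 or viewer_id == author_id:
        # Trusted authors get normal links, and the TL0 author sees a working
        # link in their own view so the post appears to have succeeded.
        return f'<a href="{safe}">{safe}</a>'
    # Everyone else sees inert text showing where the link points.
    return f'<span class="unverified-link" title="Link from a new user">{safe}</span>'
```

The warning-before-click variant would replace the inert `<span>` with a link to an interstitial page instead.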


(cpradio) #22

Although that is true, I’ve found most spammers are lazy, so the harder you make it for them to link-drop, the better. That is what they are there for: to copy and paste an already written paragraph into another outlet. If they can’t do it quickly, they either post broken junk or move on to the next one.

Most won’t go out of their way to find Lorem ipsum, and using that text won’t help their “SEO” efforts.


(cpradio) #23

I agree. I can’t see many reasons for a new user (even running a tech forum) to need to post a link. Most of the time, we remove links from posts/topics because they are unnecessary.