Dealing with persistent Korean human spammers

(Soumith Chintala) #1

Right now, the only way admins can specify manual spam classifiers is through case-insensitive regular expressions.

However, this has proved insufficient on the PyTorch forums that I run. I've also enabled Akismet, but it isn't doing a great job of filtering the kind of spam I'm seeing.

For example, here is a typical spam entry:

슈퍼카지노- (₮【79SCV.CoM】₮) -SUPER카지노슈퍼카지노- (₮【79SCV.CoM】₮) -SUPER카지노

(슈퍼카지노 is Korean for "Super Casino".)

The website and the posts keep changing, but some clear patterns repeat, such as the odd capitalization in "CoM".
If I could write an inline code snippet in admin/site_settings/category/spam, I think I could get rid of all such spam. Advanced filters would make fighting spam much more convenient.
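As a rough illustration of the kind of snippet being asked for, here is a minimal Python sketch. It assumes a hypothetical hook that receives a post's raw text and returns True for spam; Discourse does not necessarily expose such a hook, and the name `has_mixed_case_tld` is made up for this example. It targets the "CoM" pattern from the sample above, which a case-insensitive regex cannot tell apart from an ordinary ".com".

```python
import re

# Hypothetical check: a word, a dot, then a TLD-like run of letters.
# Matching is case-sensitive on purpose so the TLD's casing can be inspected.
DOMAIN = re.compile(r"[\w-]+\.([A-Za-z]{2,})\b")


def has_mixed_case_tld(text: str) -> bool:
    """Return True if any domain-like token has a mixed-case TLD, e.g. "79SCV.CoM"."""
    for match in DOMAIN.finditer(text):
        tld = match.group(1)
        # "com" and "COM" look normal; "CoM" does not.
        if not (tld.islower() or tld.isupper()):
            return True
    return False
```

This is only a heuristic: a legitimate post mentioning something like "Python.Org" would also trip it, so it is better suited to sending posts for review than to deleting them outright.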

Any thoughts on this?

(Jeff Atwood) #2

Have you changed the default trust level settings or any of the new user rate limits? This looks like bamwar to me, and our built-in fast typist check generally stops that sort of thing. Plus you have Akismet. So I'd like to hear about any non-default security or trust level site settings you might have changed.

(Soumith Chintala) #3

I haven't changed any of the defaults for trust level settings or new user rate limits. The fast typist check does catch some of the spam, but only occasionally. I think the spammers have figured out a way around it.

Yesterday I changed the spam filter settings to be much more aggressive. By default it waits for 3 flags before blocking a post, and I reduced that to 1.

Every day, about 12 of these bamwar spam posts come in, all within a one-hour window. Each user posts 3 times.

The bamwar spam stopped for about 2 months when I required all posts by new users to be approved by a moderator (I'd reject all the bamwar, but I also had to approve every legitimate post). The spammers noticed this after a couple of days and stopped spamming. They started again last week.

This manual moderation wasn't sustainable: I was getting about 100 legitimate posts per day, and I was holding discussions back, so I reset everything to the defaults.

(Jeff Atwood) #4

You are confident Akismet is configured and working?

(Soumith Chintala) #5

I'm confident it's working because Akismet occasionally asks me to confirm that something is spam. So posts are going through it and it's classifying them. But it doesn't flag the bamwar spam; it catches a different kind of spam (lots of nonsensical characters in the body, with the title being the URL of some website).

(Felix Freiberger) #6

What do you do when you find spam posts on your site? Do you delete them, or flag them as spam and choose Take Action? I think only the latter sends the post to Akismet as an example of spam. Doing this a few times should teach it to recognize this kind of spam. :slight_smile:

(Sam Saffron) #7

What I recommend here is simply setting:

auto block first post regex: \p{Hangul}{3}

If a user's first post contains 3 or more consecutive Korean (Hangul) characters, it gets chucked into the approval queue.
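To make the semantics of `\p{Hangul}{3}` concrete, here is a small Python sketch of the same check. Python's built-in `re` module has no Unicode script properties, so this is an assumption-laden stand-in: it approximates `\p{Hangul}` with the three main Hangul blocks (Jamo, Compatibility Jamo, Syllables), whereas the real script property also covers a few rarer blocks such as Jamo Extended and halfwidth forms. The function name `needs_approval` is hypothetical.

```python
import re

# Approximation of \p{Hangul}: Hangul Jamo, Hangul Compatibility Jamo,
# and precomposed Hangul Syllables.
HANGUL = "\u1100-\u11FF\u3130-\u318F\uAC00-\uD7A3"

# Equivalent of \p{Hangul}{3}: three Hangul characters in a row, anywhere in the text.
HANGUL_RUN = re.compile(f"[{HANGUL}]{{3}}")


def needs_approval(first_post: str) -> bool:
    """Return True if a (hypothetical) first post should go to the approval queue."""
    return HANGUL_RUN.search(first_post) is not None
```

Note that the run must be consecutive: "한국 사람" (two two-character words) does not match, while a three-syllable word such as "한국어" does. Legitimate Korean-speaking newcomers would therefore also land in the queue, which is the trade-off of this approach.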

(Soumith Chintala) #8

@fefrei I flag the post, but an option immediately shows up to Delete Spammer and ban IP (which also deletes all posts by that user). This is convenient because each user usually posts three times, so one button kills three birds. I suspect this does not report the posts to Akismet, so I'll take the longer route of sending each post to the spam queue via Flag Post. Thanks for the heads-up.

@sam that’s a super-cool trick, doing that right away! thank you.

@fefrei here are some screenshots of exactly what I do:

(Felix Freiberger) #9

Too bad; it looks like that is already the correct way to do it. I find it a bit weird that Akismet didn't pick up on it, but the regex from @sam should help a lot :slight_smile:

(Soumith Chintala) #10

Just wanted to follow up here: @sam's trick works really, really well. Thanks @sam.