Spam-blocking URL Blacklist


(Lowell Heddings) #1

This feature might already exist in the code somewhere, but it’s probably worth mentioning as an admin feature.

Over the years on the old HTG forum, we’ve noticed that the same exact spammers come back and manually spam the same links, over and over and over. We’ve also been hit by automatic bots a few times, though I’m less worried about that with Discourse. It’s really the manual spam that’s just tedious and consistent.

It would be useful to have a URL blacklist for spammers. Possible ideas for discussion:

  • Automatically hide and flag posts that have a blacklisted URL, so moderators can check for actual users posting the link for some unknown reason.

  • Automatic and temporary ban if the link appears in the user’s first post or they post the same blacklisted link more than once. Flag alerts the moderators, or not.
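To make the two ideas above concrete, here is a rough Python sketch of the decision logic. It is only an illustration, not how Discourse does it: the host `spam-example.test`, the action names, and the `prior_hosts` argument are all made up for the example.

```python
import re
from urllib.parse import urlparse

# Hypothetical blacklist; real entries would come from an admin setting.
BLACKLISTED_HOSTS = {"spam-example.test"}

# Deliberately simple URL matcher: scheme, then everything up to whitespace.
URL_RE = re.compile(r"https?://[^\s<>]+")


def blacklisted_links(text):
    """Return the set of blacklisted hosts linked from a post body."""
    hosts = set()
    for url in URL_RE.findall(text):
        host = (urlparse(url).hostname or "").lower().rstrip(".")
        if host.startswith("www."):
            host = host[4:]
        if host in BLACKLISTED_HOSTS:
            hosts.add(host)
    return hosts


def decide(text, is_first_post, prior_hosts):
    """Pick an action for a new post.

    prior_hosts: blacklisted hosts this user has already posted (assumed
    to be tracked elsewhere). Returns 'allow', 'hide_and_flag' or 'temp_ban'.
    """
    hits = blacklisted_links(text)
    if not hits:
        return "allow"
    # First post, or the same blacklisted link again: temporary ban.
    if is_first_post or hits & set(prior_hosts):
        return "temp_ban"
    # Otherwise hide the post and let a moderator take a look.
    return "hide_and_flag"
```

An established user posting a blacklisted link for the first time lands in `hide_and_flag`, so a moderator still gets to check whether there was a legitimate reason.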

The real goal, of course, is to prevent spammers from wasting people’s time, especially the moderators.


(Jeff Atwood) #2

Maybe if a post reaches the spam flag threshold – and is not overturned by a moderator – we could automatically add any URLs in the post to the URL blacklist?


(Lowell Heddings) #3

That could work, assuming they aren’t linking to other things as well. Sometimes the manual spammers will link to a good link as well as a spammy one.

Or… maybe once the URL has been posted in more than one post that reached the spam flag threshold and wasn’t overturned. That way there wouldn’t be false positives from a single spam post, and it would solve the repeat-spam problem rather than the single-incident problem, which moderators are going to handle either way.


(F. Randall Farmer) #4

If there is more than one URL in a post, each one should be treated as a distinct object in the flag-handling interface. We can’t infer badness in that case.


(Jeff Atwood) #5

Probably we can just increment the counter in the table, and not enforce the blacklist until the counter goes > 1.

And the table should also have a date, so we can clear out old entries that didn’t reach the threshold or that have long since stopped being used by spammers, so the table does not grow forever.
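A minimal in-memory sketch of that table, in Python, might look like the following. The class name, the 90-day retention window, and the threshold value are assumptions for illustration. The threshold is set to 1 so that a URL is only enforced after a second confirmed spam post, in line with the "more than one" suggestion in #3.

```python
from datetime import datetime, timedelta

THRESHOLD = 1                  # enforce once the count goes above this (assumed)
MAX_AGE = timedelta(days=90)   # assumed retention window for stale entries


class UrlCounter:
    """Counts confirmed-spam sightings per URL and remembers the last date."""

    def __init__(self):
        self.entries = {}  # url -> (count, last_seen)

    def record_spam(self, url, now):
        """Call when a post containing url hit the spam threshold and stood."""
        count, _ = self.entries.get(url, (0, now))
        self.entries[url] = (count + 1, now)

    def is_blocked(self, url):
        count, _ = self.entries.get(url, (0, None))
        return count > THRESHOLD

    def prune(self, now):
        """Drop entries not seen within MAX_AGE so the table stays bounded."""
        cutoff = now - MAX_AGE
        self.entries = {
            url: entry for url, entry in self.entries.items()
            if entry[1] >= cutoff
        }
```

Pruning on last-seen date covers both cases: entries that never reached the threshold and entries spammers have long since abandoned.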


(Patrick Klug) #6

Second the need for a URL blacklist: it would be great if we could configure Discourse so that posts with links to certain sites are automatically flagged and hidden.


(Kane York) #7

So you basically want the ability to manually add entries to /admin/logs/screened_urls ?


(Patrick Klug) #8

I didn’t realize that screened_urls are a feature already but yes, I would like to manually add some entries.


(Chet McDoniel) #9

I realize I’m resurrecting an old thread here, but I’d like to add my vote for this. This isn’t necessarily just a spam problem. We use Discourse for an internal company discussion space, and we don’t want our people posting links to some sites because of incorrect or inappropriate info on those sites, etc. It would be easier to control this at the URL level than person by person.


(Lowell Heddings) #10

This is what I’d love to see. There are certain spammy sites (largely the easily cloned nonsense like video conversion and Outlook PST repair) that have spent years trying to get their link posted somewhere on my site. Whenever I haven’t seen a link from them in a while, I start to wonder, because that usually means they managed to sneak one through somewhere.


(ljpp) #11

Grave digging an old thread. There are a lot of “news sites” which are nothing but clickbait. They publish screaming headlines on current topics, add no value to the news itself and only quote a phrase or two from the actual source.

I would like to block links to these sites - my users are linking them every now and then.


(Mittineague) #12

There are three "Screened"s - IP, Email and URL

The ScreenedIP and ScreenedEmail can, depending on actions taken against registered accounts and Settings, have an effect with new registrations.

ScreenedURL could still use some love.

I think for now the best approach would be to use the Post Settings:

censored words
Words that will be automatically replaced with ■■■■

censored pattern
Regex pattern that will be automatically replaced with ■■■■

AFAIK the “pattern” feature has some safeguards in place to prevent poor regex from wreaking havoc, but just the same, I think unless you know regex well it would be best to avoid using it and stick with using “words”.
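
For anyone who does go the pattern route, something along these lines could match links to a single domain. I have only checked it against Python's `re` module, so treat it as a sketch: Discourse's own regex handling may behave differently, and `spam-example.test` is a made-up domain.

```python
import re

# Hypothetical domain. The dots are escaped so they match literally, and the
# leading \b keeps the pattern from firing inside a longer word such as
# "notspam-example.test".
PATTERN = re.compile(
    r"\b(?:https?://)?(?:www\.)?spam-example\.test\S*",
    re.IGNORECASE,
)


def censor(text):
    """Replace any link to the blocked domain with the censor marker."""
    return PATTERN.sub("■■■■", text)
```

The most common mistake with a pattern like this is leaving the dots unescaped, which makes it match far more than intended.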