Blocking recent wave of spam

We have been hit by a massive spam wave for days now. Others, like https://ask.learncbse.in/, seem to have given up for the moment :thinking:

I’m searching here just for one variant:

The content changes often, and so do the email addresses and IPs, so blocking reduces the amount but we haven’t found a real fix yet. For privacy reasons we do not want to send everything to Akismet.

If we blocked

AS55836: Reliance Jio Infocomm Limited
AS9498: Bharti Airtel Ltd.
AS45609: Bharti Airtel Ltd.
AS24560: Bharti Airtel Ltd.

we would be fine, but that could lock out a good (or small) part of the Indian population.

3 Likes

Have you tried adding certain words to Admin → Customize → Watched Words → Require Approval?

From your screenshot, I’d try adding these words:

  • cash
  • credit
  • money
  • loan
  • toll-free
  • customer care
  • care number
  • 0779*
  • helpline
  • :point_left:

It can be slightly inconvenient for users, but I have Discourse send a webhook to a Firebase cloud function (free) that pings my phone in a Slack chat room, so I can often approve posts in moderation within 60 seconds from my phone, if I’m awake.
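The relay step described above (webhook in, chat notification out) could be sketched roughly like this. It is only an illustration: the payload shape (a `post` object with `username` and `raw` fields) and the helper name `slack_message` are assumptions, not taken from the setup described in this post, and the actual HTTP call to Slack is left out.

```python
def slack_message(payload: dict, forum_url: str) -> dict:
    """Build a Slack incoming-webhook body from a (assumed) webhook payload."""
    # Assumed payload shape: {"post": {"username": ..., "raw": ...}}
    post = payload.get("post", {})
    author = post.get("username", "unknown user")
    # Collapse whitespace and truncate so the notification stays short.
    excerpt = " ".join(str(post.get("raw", "")).split())[:200]
    return {
        "text": f"Post by {author} is waiting for approval: {excerpt}\n"
                f"{forum_url}/review"
    }
```

The returned dict would then be sent as JSON to the Slack webhook URL by whatever function host is used.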

3 Likes

Thanks for the hint, but please check out https://ask.learncbse.in/ (it’s not my instance, but the posts are more or less the same ones I’m fighting against) and scroll through the last few days: they use a ton of combinations and variations of each keyword. I’m in the process of creating a lot of regexes for each keyword, because they insert a “.”, a “,” or a “|” everywhere, replace a “0” with an “O” or an “e” with a “3”, add a (so far) random character in the middle of a word, and so on. It is really difficult to fight this type of spam.
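As a rough illustration of that regex work, here is a minimal Python sketch that generates one tolerant pattern per keyword instead of writing each variant by hand. The substitution table and the filler characters are assumptions based on the variations listed above, and Python is used only for prototyping against collected samples; the resulting pattern would still need to be checked against whatever regex syntax the watched words feature accepts.

```python
import re

# Hypothetical lookalike table (assumed, not exhaustive): each character maps
# to the characters spammers may use in its place.
LOOKALIKES = {
    "o": "o0",
    "0": "0o",
    "e": "e3",
    "a": "a4@",
    "i": "i1!",
    "s": "s5$",
}

# Filler that may be inserted between letters: whitespace, ".", ",", "|", "_", "-".
FILLER = r"[\s.,|_\-]*"


def obfuscation_pattern(keyword: str) -> re.Pattern:
    """Compile a pattern matching the keyword with fillers and lookalikes."""
    parts = []
    for ch in keyword.lower():
        if not ch.isalnum():
            continue  # separators in the keyword itself are covered by FILLER
        alternatives = LOOKALIKES.get(ch, ch)
        parts.append("[" + re.escape(alternatives) + "]")
    return re.compile(FILLER.join(parts), re.IGNORECASE)


if __name__ == "__main__":
    pattern = obfuscation_pattern("loan")
    print(bool(pattern.search("Get a L.0.a|n today")))
```

A pattern this loose will also hit innocent text (for example “hello and” contains “lo an”), so it fits “Require Approval” better than blocking. It does not handle the case of a random extra character inserted in the middle of a word.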

Or if you do not want to click on a random link, here is a screenshot of the last few hours, but these are just the last few hours, they vary a lot over time:

1 Like

Just to check, but do you use the min first post typing time admin setting? I find that quite useful for catching a lot of ours.

4 Likes

Yes, thanks for the hint, this is set, but it is not that hard for the bot to just wait a few minutes :slight_smile:

1 Like

This spam seems like a different type to the AI based answers/content the other topic is focused on so I’ve split it out. :+1:

We do have a new AI-based tool for spam detection which has proven to be quite effective:

5 Likes

Thanks for the tip, but setting up an LLM just to fight another spammer’s LLM for our Discourse is way too expensive for our use case.

As a spammer you can easily increase the cost for the org by just creating more users/posts, so depending on what you want to achieve, this could also be a motivation to create even more posts :slight_smile:

1 Like

Hi,

Have you tried using Akismet? Seems like their solution would work for you.

(free for personal use, not for commercial use - don’t know how you’d categorise yourself)

2 Likes

Perhaps requiring every user’s first post to be approved would help a bit here? That way at least they’d never make it onto the forum publicly, and as long as you don’t have a lot of real users signing up daily, I think it would help at least some.

5 Likes

Thanks for all the tips.

We did think about it, but we have a privacy and security product, which means we need to protect our users as much as possible. The content is public, sure, but the IP address, user agent, referrer and email are not, and if I understood Discourse Akismet correctly, those are transmitted to Akismet (I would of course also read the privacy policy, but the overview already has enough information for the decision).

That would be an idea. With ~2 signups per day it shouldn’t be too much trouble. Waiting for approval is not the best experience, but if we explain it properly it might be the best option we have for now.

1 Like

Yes, you are unfortunately correct - they do transmit some additional data to Akismet which may not align with your privacy policy. In that case, @Firepup650’s suggestion is the best one out there.

1 Like

FYI my Geo Blocking plugin can deny access to Discourse based on the source AS network. Indeed a lot of this kind of spam seems to originate from those networks, especially AS45609.

If you don’t want to block half of India then it might be worth investigating how hard it would be to reuse some of the functionality in that plugin to add network or IP based rules to the approval logic (“require approval for new posts from networks”)
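The “require approval for new posts from networks” idea could look roughly like the sketch below. It is not code from the plugin: the AS numbers and prefixes are placeholders from the ranges reserved for documentation, and the names `FLAGGED_PREFIXES`, `flagged_asn` and `needs_approval` are made up for illustration. A real setup would take its prefixes from an ASN database.

```python
import ipaddress

# Placeholder data only: documentation AS numbers and documentation prefixes,
# standing in for the networks one actually wants to flag.
FLAGGED_PREFIXES = {
    "AS64500": ["192.0.2.0/24", "2001:db8::/32"],
    "AS64501": ["198.51.100.0/24"],
}

# Parse the prefixes once so each lookup is a simple containment test.
NETWORKS = [
    (asn, ipaddress.ip_network(prefix))
    for asn, prefixes in FLAGGED_PREFIXES.items()
    for prefix in prefixes
]


def flagged_asn(ip: str):
    """Return the flagged AS label containing this address, or None."""
    addr = ipaddress.ip_address(ip)
    for asn, net in NETWORKS:
        if addr.version == net.version and addr in net:
            return asn
    return None


def needs_approval(ip: str, is_new_user: bool) -> bool:
    """Queue posts from new users on flagged networks instead of blocking them."""
    return is_new_user and flagged_asn(ip) is not None
```

Established users on the same networks are left alone here, which avoids blocking a whole provider.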

6 Likes

I scrolled through many pages on that example site and think it might be possible to block nearly all of those with the watched words feature, if Discourse regex can work on Unicode ranges.

Regular users probably don’t use things like this:

  • 2+ slashes in a row
  • unusual punctuation like ^ (unless it’s a math site)
  • uncommon Unicode ranges:
    • ✓ (Miscellaneous Symbols)
    • ∆ (Greek and Coptic)
    • ❽, ➁, ❸, 3, ❷ (Dingbats)
    • 𝘾, 𝙪, 𝙨, 𝙩 (Mathematical Alphanumeric Symbols)

ChatGPT could probably write a regex for those, if Discourse supports it.
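For anyone who wants to try this against collected spam samples first, here is a minimal Python sketch of such a pattern. The block ranges are the standard Unicode ones; the rule that skips the “//” in URLs is an added assumption, and whether the watched words feature accepts an equivalent pattern is still the open question from above.

```python
import re

SUSPICIOUS = re.compile(
    "(?<!:)/{2,}"                # 2+ slashes in a row, except the "//" after "http:"
    "|[\u2460-\u24FF"            # Enclosed Alphanumerics
    "\u2600-\u26FF"              # Miscellaneous Symbols
    "\u2700-\u27BF"              # Dingbats
    "\U0001D400-\U0001D7FF]"     # Mathematical Alphanumeric Symbols
)


def looks_suspicious(text: str) -> bool:
    """True if the text contains repeated slashes or unusual symbol ranges."""
    return SUSPICIOUS.search(text) is not None


if __name__ == "__main__":
    print(looks_suspicious("Call \u277d\u277d\u277d now"))
    print(looks_suspicious("See https://example.com/docs for details"))
```

Ranges such as Miscellaneous Symbols may also catch symbols that regular users post, so the list should be tuned per forum.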

One more idea is to try Cloudflare with the Bot Fight Mode feature (free) and challenge all bots.

3 Likes

Ouh, that would be the perfect solution, will have a look into the code, thanks!

The problem here is that this bot somehow knows how Discourse works. In the following scenario I’m watching for ❽ in the “Require Approval” section. The problem is that those bots often first create a post with random text and then edit it to the actual content. Edits to a post are not checked against the “Require Approval” list, see e.g.

VS

(here I added the ❽ directly during post creation)

which means our only option is to add it to the block section, but blocking too many words and characters can easily lead to problems where normal users get a confusing message when creating valid posts. I think this is where most of our problems come from. In my opinion this is a bug: when a post is edited, the edited content should also be checked against the “Require Approval” list when the change is published.
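To make the requested behaviour concrete, here is a small Python model of it. This is not Discourse code: the `Post` class, the word list and the function names are hypothetical, and the only point is that the edit path runs the same check as the create path.

```python
from dataclasses import dataclass

# Hypothetical "Require Approval" list; "\u277d" is the ❽ character.
WATCHED_WORDS = ["\u277d", "helpline"]


def needs_review(raw: str) -> bool:
    """True if the text contains any watched word (case-insensitive)."""
    lowered = raw.lower()
    return any(word in lowered for word in WATCHED_WORDS)


@dataclass
class Post:
    raw: str
    hidden_for_review: bool = False


def create_post(raw: str) -> Post:
    # Check on creation.
    return Post(raw=raw, hidden_for_review=needs_review(raw))


def edit_post(post: Post, new_raw: str) -> Post:
    post.raw = new_raw
    # Proposed behaviour: re-run the same check on the edited content.
    if needs_review(new_raw):
        post.hidden_for_review = True
    return post
```

With this model, a post created with harmless text and later edited to contain ❽ ends up in review, just like a post that contained ❽ from the start.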

2 Likes

I guess watched words won’t help then. I haven’t had a spam attack from that yet but I’m worried about it because users started to figure it out.

3 Likes

It looks like one of my forums just got hit by that same kind of spam attack. I don’t know if they used the editing trick, since I didn’t have the spam words on the watched words list yet.

2 Likes