The content changes often, and so do the email addresses and IPs, so blocking does reduce the volume, but we haven't found a real fix yet. For privacy reasons we do not want to send everything to Akismet.
Have you tried adding certain words to Admin → Customize → Watched Words → Require Approval?
From your screenshot, I’d try adding these words:
cash
credit
money
loan
toll-free
customer care
care number
0779*
helpline
It can be slightly inconvenient for users, but I have Discourse send a webhook to a Firebase cloud function (free) that pings me in a Slack channel, so I can often approve posts in the moderation queue within 60 seconds from my phone, if I'm awake.
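Roughly, the glue code could look like the sketch below. This is a minimal illustration, not the exact function I run: it assumes a Slack incoming-webhook URL in a `SLACK_WEBHOOK_URL` environment variable, uses the fields Discourse sends in its post webhook payload, and the function name is made up.

```ts
// Minimal sketch: a Firebase HTTPS function that receives a Discourse
// post webhook and forwards a short summary to a Slack incoming webhook.
// SLACK_WEBHOOK_URL is assumed to be set in the function's environment.
import { onRequest } from "firebase-functions/v2/https";

export const discourseToSlack = onRequest(async (req, res) => {
  const post = req.body?.post; // Discourse wraps the post under a "post" key
  if (!post) {
    res.status(400).send("no post payload");
    return;
  }

  const text = `New post by ${post.username}: ${post.raw?.slice(0, 200) ?? ""}`;

  // Slack incoming webhooks accept a plain JSON body with a "text" field.
  await fetch(process.env.SLACK_WEBHOOK_URL as string, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ text }),
  });

  res.status(200).send("ok");
});
```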
Thanks for the hint, but please check out https://ask.learncbse.in/ (it's not my instance, but the posts are more or less the same ones I'm fighting against) and scroll through the last few days: they are using a ton of combinations and variations of each keyword. I'm in the process of creating a lot of regexes for each keyword, because they add a ".", a "," or a "|" everywhere, replace a "0" with an "O" or an "e" with a "3", insert a (so far) random character in the middle of a word, etc. It is really difficult to fight this type of spam.
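For illustration, here is a rough sketch of how such tolerance patterns could be generated instead of hand-written. The substitutions are only the ones mentioned above, the helper name is made up, and whether Discourse's watched-words regex engine accepts patterns like this would need checking.

```ts
// Sketch: build a regex for a keyword that tolerates the obfuscations
// described above: stray ".", ",", "|" characters between letters and
// common substitutions like 0/O and e/3. Regex-special characters in a
// keyword would need escaping for real use.
const SUBSTITUTIONS: Record<string, string> = {
  o: "[o0]",
  e: "[e3]",
  i: "[i1l]",
};

function obfuscationPattern(keyword: string): RegExp {
  const parts = keyword
    .toLowerCase()
    .split("")
    .map((ch) => SUBSTITUTIONS[ch] ?? ch)
    // allow up to two junk characters (".", ",", "|", spaces, …) between letters
    .join("[^a-zA-Z0-9]{0,2}");
  return new RegExp(parts, "i");
}

// Example: matches "helpline", "h.e.l.p|l1ne", "H3LP LINE", …
console.log(obfuscationPattern("helpline").test("h.e.l.p|l1ne")); // true
```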
Or if you do not want to click on a random link, here is a screenshot of the last few hours, but these are just the last few hours, they vary a lot over time:
Thanks for the tip, but setting up an LLM just to fight another spammer's LLM on our Discourse is way too expensive for our use case.
As a spammer you can easily increase the cost for the organization just by creating more users/posts, so depending on what you want to achieve, this could even be a motivation to create more posts.
Perhaps requiring every user’s first post to be approved would help a bit here? That way at least they’d never make it onto the forum publicly, and as long as you don’t have a lot of real users signing up daily, I think it would help at least some.
We did think about it, but we have a privacy and security product, which means we need to protect our users as much as possible. The content is public, sure, but not the IP address/user agent/referrer/email; if I understood Discourse Akismet correctly, those are transmitted to Akismet (we would of course also read the privacy policy, but the overview is already enough information for the decision).
That would be an idea. With ~2 signups per day it shouldn't be too much trouble. Waiting for approval isn't the best experience, but if we explain it properly it might be the best option we have for now.
Yes, you are unfortunately correct - they do transmit some additional data to Akismet which may not align with your privacy policy. In that case, @Firepup650’s suggestion is the best one out there.
FYI my Geo Blocking plugin can deny access to Discourse based on the source AS network. Indeed a lot of this kind of spam seems to originate from those networks, especially AS45609.
If you don’t want to block half of India, it might be worth investigating how hard it would be to reuse some of the functionality in that plugin to add network- or IP-based rules to the approval logic (“require approval for new posts from these networks”).
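To show the mechanics only, here is a sketch of the prefix check such a rule could build on. The prefixes below are placeholders (TEST-NET ranges), not real AS45609 announcements; a real implementation would live in Ruby inside Discourse's approval logic and load the network's announced prefixes from an IP-to-ASN data set.

```ts
// Sketch of a "require approval for new posts from these networks" check:
// match the poster's IPv4 address against a list of watched CIDR prefixes.
const WATCHED_PREFIXES = ["203.0.113.0/24", "198.51.100.0/22"]; // placeholders

function ipv4ToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => (acc << 8) + Number(octet), 0) >>> 0;
}

function inCidr(ip: string, cidr: string): boolean {
  const [network, bits] = cidr.split("/");
  const mask = bits === "0" ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return (ipv4ToInt(ip) & mask) === (ipv4ToInt(network) & mask);
}

function requiresApproval(ip: string): boolean {
  return WATCHED_PREFIXES.some((prefix) => inCidr(ip, prefix));
}

console.log(requiresApproval("203.0.113.42")); // true  (inside a watched prefix)
console.log(requiresApproval("192.0.2.10"));   // false (outside)
```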
I scrolled through many pages on that example site and think it might be possible to block nearly all of those with the watched words feature, if Discourse regex can work on Unicode ranges.
Regular users probably don’t use things like this:
2+ slashes in a row
unusual punctuation like ^ (unless it’s a math site)
uncommon Unicode ranges:
✓ (Miscellaneous Symbols)
∆ (Greek and Coptic)
❽, ➁, ❸, 3, ❷ (Dingbats)
𝘾, 𝙪, 𝙨, 𝙩 (Mathematical Alphanumeric Symbols)
ChatGPT could probably write a regex for those, if Discourse supports it.
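Something along these lines might cover those ranges; it is only a sketch. The code-point ranges are the standard Unicode block boundaries, but whether the watched-words engine accepts `\u{…}` escapes and the `u` flag is something you'd have to test on your instance.

```ts
// Sketch: a regex that flags text containing characters from the Unicode
// blocks listed above (Dingbats, Mathematical Alphanumeric Symbols,
// Miscellaneous Symbols) or runs of two or more slashes.
const suspicious = new RegExp(
  [
    "[\\u{2700}-\\u{27BF}]",   // Dingbats (❶ ❷ ➁ …)
    "[\\u{1D400}-\\u{1D7FF}]", // Mathematical Alphanumeric Symbols (𝘾 𝙪 𝙨 …)
    "[\\u{2600}-\\u{26FF}]",   // Miscellaneous Symbols (U+2600–U+26FF)
    "//{1,}",                  // two or more slashes in a row
  ].join("|"),
  "u"
);

console.log(suspicious.test("Call ❽00 support"));        // true
console.log(suspicious.test("𝘾𝙪𝙨𝙩omer care"));            // true
console.log(suspicious.test("a perfectly normal post"));  // false
```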
One more idea is to try Cloudflare with the Bot Fight Mode feature (free) and challenge all bots.
Oh, that would be the perfect solution. I'll have a look at the code, thanks!
The problem here is that this bot somehow knows how Discourse works. In the following scenario I’m watching for ❽ in the “Require Approval” section. The problem is that those bots often first create a random text and then edit it into the actual content. Editing a post is not checked against the “Require Approval” list, see e.g.
(here I added the ❽ directly during post creation)
which means our only option is to add it to the Block list, but blocking too many words and characters can easily lead to situations where normal users get a confusing message when creating valid posts. I think this is where most of our problems come from. In my opinion this is a bug: when a post is edited, the “Require Approval” list should also be checked against the edited content when the change is published.
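As a stopgap while edits bypass the approval list, something like the sketch below could work: a tiny webhook receiver that listens for Discourse's post events and, on an edit, re-checks the raw content against the same patterns and pings moderators (e.g. via the Slack webhook idea above). The header and payload names follow Discourse's webhook documentation, but verify them against your instance; the watched patterns here are just examples.

```ts
// Sketch: re-check edited posts against the watched patterns, since the
// built-in "Require Approval" list only runs on post creation.
import { createServer } from "node:http";

const WATCHED = [/❽/u, /toll[-. ]?free/i, /customer[-. ]?care/i]; // mirror of the admin list

createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", async () => {
    const event = req.headers["x-discourse-event"]; // e.g. "post_created", "post_edited"
    const post = JSON.parse(body || "{}").post;

    if (event === "post_edited" && post && WATCHED.some((re) => re.test(post.raw ?? ""))) {
      // Reuse the Slack incoming webhook from the earlier sketch to alert moderators.
      await fetch(process.env.SLACK_WEBHOOK_URL as string, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text: `Edited post ${post.id} matches a watched word` }),
      });
    }
    res.writeHead(200).end("ok");
  });
}).listen(8080);
```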
It looks like one of my forums just got hit by that same kind of spam attack. I don’t know if they used the editing trick, since I didn’t have the spam words on the watched words list yet.