Spam account scanner script

https://github.com/TannerFilip/discourse-spam-check

I’ll start off by saying, I’m not a great programmer. This is the first “real” tool I’ve written that’s (potentially) useful to people other than me. I’d love any feedback/criticism you have.

I’ve written a Python script that scans through the list of suspect and/or silenced users and lets you delete them if necessary. I ran it over on Mozilla’s Discourse and deleted a few dozen accounts - this was only after I deleted close to a hundred by hand.

There are a few things that seem pretty hacky, especially lines 174 to 191. As I said, I’d appreciate any feedback you might have, and would be happy to answer any questions!

Very cool! One thing you’ll want to do is be sure Akismet is enabled, as we recently (within the last 2-3 months) added a feature where the Akismet plugin will scan new user accounts for spammy stuff and flag them for you thanks to @Roman :clap:

Yes, completely human spam account signups – accounts that never post once, just set up an account with profile info and walk away forever – are indeed still a problem. The below is even after Akismet checking:

But bear in mind user profiles aren’t indexed at all, and new user profiles have seriously suppressed info… and our Akismet change helps tremendously.

Having a cleanup tool is still needed though!

I didn’t know that! I’ll have to talk to @LeoMcA to see if we want to enable that.

Suspect users are now being sent to the Review Queue, which removed the suspect users list this script was using. As they’re being pushed to manual review, is this needed now?

Has there been any progress on this?

Our community is experiencing several spam/bot account signups per day that have 0 posts read, 0 topics viewed, and less than 1 minute of read time. It would be good to have an auto-remove function for all accounts that match certain selected parameters.
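A filter along these lines could be sketched in Python, the language of the script at the top of this thread. This is only a sketch under assumptions: the field names (`post_count`, `posts_read_count`, `topics_entered`, `time_read`) and the one-minute threshold are illustrative and should be checked against whatever your user export actually contains. It only selects candidates; deleting anything is left to a human.

```python
from dataclasses import dataclass


@dataclass
class Account:
    # Field names are assumptions for illustration, not a confirmed API schema.
    username: str
    post_count: int
    posts_read_count: int
    topics_entered: int
    time_read: int  # seconds of read time


def looks_abandoned(account, max_read_seconds=60):
    """Return True for accounts with no posts, no reading, and little read time."""
    return (
        account.post_count == 0
        and account.posts_read_count == 0
        and account.topics_entered == 0
        and account.time_read < max_read_seconds
    )


def removal_candidates(accounts, max_read_seconds=60):
    """Return usernames matching the criteria, for manual review before deletion."""
    return [a.username for a in accounts if looks_abandoned(a, max_read_seconds)]


# Made-up sample data.
accounts = [
    Account("bot123", 0, 0, 0, 12),
    Account("lurker", 0, 40, 9, 3600),
    Account("regular", 15, 200, 50, 90000),
]
print(removal_candidates(accounts))
```

Note that a genuine lurker who has read anything at all is left alone by these criteria; only accounts with no recorded activity are listed.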

Also, is there an option for a Captcha or similar plugin to help filter bots?

If those accounts have no activity, they’re harmless. They are invisible to other users (including on the public user list). And user profiles, regardless of their trust level, are disallowed in robots.txt and not visible in search engines.

Plus, inactive accounts are periodically cleaned up; see the “clean up inactive users after days” setting (“Number of days before an inactive user (trust level 0 without any posts) is removed. To disable clean up set to 0.”).

It’s triggered by the CleanUpInactiveUsers Sidekiq job.
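The rule quoted from the setting description can be sketched as a small predicate. This is an illustrative Python sketch, not the actual job (which lives in Discourse's Ruby codebase); the function name, its parameters, and the use of a last-seen date to define "inactive" are all assumptions.

```python
from datetime import datetime, timedelta


def eligible_for_cleanup(trust_level, post_count, last_seen_at, now, cleanup_after_days):
    """Sketch of the quoted rule: trust level 0, no posts, inactive for N days.

    A setting of 0 disables cleanup. Measuring inactivity from `last_seen_at`
    is an assumption made for this example.
    """
    if cleanup_after_days <= 0:
        return False
    if trust_level != 0 or post_count > 0:
        return False
    return now - last_seen_at >= timedelta(days=cleanup_after_days)


now = datetime(2024, 1, 1)
long_ago = now - timedelta(days=800)

print(eligible_for_cleanup(0, 0, long_ago, now, 730))  # TL0, no posts, long inactive
print(eligible_for_cleanup(0, 0, long_ago, now, 0))    # cleanup disabled
print(eligible_for_cleanup(1, 0, long_ago, now, 730))  # not trust level 0
```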

That disallows nothing. robots.txt is only a polite suggestion, one that at the same time points in the right direction.
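To illustrate the point, here is a small sketch using Python's standard `urllib.robotparser`. The rules and URLs are made up for the example (the `/u/` path and `forum.example.com` are assumptions, not taken from any real forum). The check only tells a crawler what the site asks for; whether to honor the answer is up to the crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules; the real file on a given forum may differ.
rules = """\
User-agent: *
Disallow: /u/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

profile_url = "https://forum.example.com/u/some-user"
topic_url = "https://forum.example.com/t/some-topic/123"

# A well-behaved crawler asks first and skips disallowed URLs.
profile_allowed = parser.can_fetch("*", profile_url)
topic_allowed = parser.can_fetch("*", topic_url)
print(profile_allowed, topic_allowed)

# Nothing in this check stops a crawler that never asks, and the file
# itself lists exactly which paths the site would prefer to keep out.
```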

It may be harmless, but in the past spammers have used these accounts to “age” their profile before activating it, knowing that we keep an eye on new accounts. Then suddenly a 3-month-old account tries to link to some spam or DMs users with phishing attempts.
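One way to watch for that pattern is to flag a first post that contains a link and comes from an account that sat dormant for a long time. The sketch below is a hypothetical heuristic in Python, not an existing Discourse feature; the function name, the 60-day threshold, and the simple link check are assumptions, and a match should only trigger a manual review.

```python
import re
from datetime import datetime, timedelta

# Very rough link detector, good enough for a sketch.
LINK_PATTERN = re.compile(r"https?://", re.IGNORECASE)


def is_aged_account_link_post(created_at, posted_at, post_text, is_first_post,
                              dormant_days=60):
    """Flag a first post with a link from an account dormant for `dormant_days`."""
    if not is_first_post:
        return False
    was_dormant = posted_at - created_at >= timedelta(days=dormant_days)
    return was_dormant and bool(LINK_PATTERN.search(post_text))


signup = datetime(2024, 1, 1)
print(is_aged_account_link_post(
    signup, signup + timedelta(days=90), "Great deals at https://example.com", True))
print(is_aged_account_link_post(
    signup, signup + timedelta(days=2), "Hi, see https://example.com", True))
```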

Personally, I would like to have better tools to stop those accounts before they become a problem, rather than waiting. It would also help if we had stronger tools to prevent bots from signing up in the first place.

Sure, it can still be a problem sometimes. I deal with a lot of spam, but so far I haven’t seen any spam accounts that suddenly start posting after a long time.

If they did post spam, they would quickly be flagged by other users anyway.

And you can still drastically shorten the period after which an inactive account is removed.