Various questions about cleaning up an imported vBulletin forum

Hi,
I imported a 20-year-old forum into Discourse.
It has many unused or spam accounts and spam messages.
I’d like to do a cleanup.

I set all my users to trust level 0.
I wanted to know how many users have never posted a message, so I first counted the users who have posted at least once with this Data Explorer query:

SELECT COUNT(DISTINCT user_id) FROM posts

It returns 28530.

Then I counted how many users I have in total:

SELECT COUNT(DISTINCT id) FROM users

It returns 180000 (the vBulletin stats also said we had 180000 accounts).
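To make the comparison between the two counts explicit, here is a minimal Ruby sketch of the set difference. The sample IDs are made up for illustration, and the final subtraction assumes that every `user_id` in `posts` still matches a row in `users`, so treat the result as an estimate rather than an exact figure.

```ruby
require "set"

# Hypothetical sample data standing in for users.id and posts.user_id.
user_ids = [1, 2, 3, 4, 5]
post_user_ids = [2, 2, 5, 2]

# Users who never posted are the users whose id never appears in posts.
posters = post_user_ids.to_set
never_posted = user_ids.reject { |id| posters.include?(id) }
puts "Never posted in the sample: #{never_posted.size}"

# The same subtraction with the two counts reported in this thread.
estimated_never_posted = 180_000 - 28_530
puts "Estimated never-posted users: #{estimated_never_posted}"
```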

I triggered the CleanUpInactiveUsers Sidekiq job.
Only a few hundred users were removed. I looked at one of the remaining unused profiles and saw no activity: no posts, no topics… However, Discobot had sent it a message after I imported all the users from vBulletin, 5 days ago.
So in the Discourse settings, I set clean up inactive users after days to 1.
I triggered the CleanUpInactiveUsers Sidekiq job again.
That removed about 1000 users.
I still have 178000 users, and I know that most of them are empty, unused profiles with no messages.

Any idea why they aren’t removed by CleanUpInactiveUsers?

Also, since there are many spam accounts and messages, is it possible to trigger spam detection on existing users and messages, and clean all that up too?

The job is limited to 1000 users per run, to avoid clogging the Sidekiq queue:

https://github.com/discourse/discourse/blob/106a2f58a2169dcd9a76d4ccb6d696a4c2364efc/app/jobs/scheduled/clean_up_inactive_users.rb#L17-L17

You could run it from the Rails console like this:

Jobs::CleanUpInactiveUsers.new.execute({})

And then put it in a loop like this:

100.times do
  Jobs::CleanUpInactiveUsers.new.execute({})
  puts "Done iteration. Total user count #{User.count}"
end
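If you would rather not guess the number of iterations, a variation is to stop as soon as a run no longer removes anyone. This is only a sketch: `run_until_stable` is a hypothetical helper name, not part of Discourse, and it takes the job and the counter as lambdas so it can be tried outside the console first.

```ruby
# Repeats a batch job until the count stops shrinking or max_runs is reached.
# run_job and count are callables, so the helper does not depend on Discourse.
def run_until_stable(run_job:, count:, max_runs: 200)
  runs = 0
  previous = count.call
  while runs < max_runs
    run_job.call
    runs += 1
    current = count.call
    puts "Done iteration #{runs}. Total user count #{current}"
    break if current >= previous # nothing was removed on this pass
    previous = current
  end
  runs
end

# In the Rails console this would be called as (not executed here):
#   run_until_stable(
#     run_job: -> { Jobs::CleanUpInactiveUsers.new.execute({}) },
#     count:   -> { User.count }
#   )
```

The `max_runs` cap is there so the loop still ends if something else keeps changing the user count while it runs.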

What kind of spam detection were you thinking of? Akismet?


Thanks for the clarification!

Honestly, I don’t know. I don’t know what Discourse uses to detect and prevent spam.

Also, I believe a fair number of my spammers posted messages on users’ public profiles, a feature that doesn’t exist in Discourse.
These messages were imported into Discourse as “regular” topics, with no category and no title, which makes them easy to identify.


Example of a spam profile:

I don’t want to delete all these titleless messages, since most are harmless and may contain information that some users would like to get back.

What I’d like to do is delete the users who posted only titleless topics, and remove those users’ topics too.
Could such a thing be done fairly easily with Rails commands?


It might take a little experimentation, but yes. Something like this might work as a starting point:

User.find_each do |user|
  untitled_topic_count = user.topics.where(title: "").count
  titled_topic_count = user.topics.where.not(title: "").count
  if untitled_topic_count > 0 && titled_topic_count == 0
    # delete the topics and/or the user
  end
end

For the record, here are my conditions:

In my case, a spammer must have:

  • More than 1 post
  • At least 1 titleless topic
  • No titled topic
  • The same number of posts as the number of topics (since a topic IS a post)

So I just added conditions:

User.find_each do |user|
  untitled_topic_count = user.topics.where(title: "").count
  titled_topic_count = user.topics.where.not(title: "").count
  topic_count = untitled_topic_count + titled_topic_count
  post_count = user.posts.count
  if post_count > 1 && untitled_topic_count > 0 && titled_topic_count == 0 && post_count == topic_count
    puts "SPAMMER ?"
  end
end
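For anyone who wants to check these conditions on a few known cases before running the loop on real data, the test can be pulled out into a small pure function. This is just a sketch of the same four conditions: `likely_spammer?` is a hypothetical name, and in the console the three counts would come from the same queries as in the loop above.

```ruby
# Pure predicate for the four conditions listed above.
# It only looks at counts, so it can be tried without a database.
def likely_spammer?(post_count:, untitled_topic_count:, titled_topic_count:)
  topic_count = untitled_topic_count + titled_topic_count
  post_count > 1 &&
    untitled_topic_count > 0 &&
    titled_topic_count == 0 &&
    post_count == topic_count
end

# Three titleless topics and nothing else: flagged.
puts likely_spammer?(post_count: 3, untitled_topic_count: 3, titled_topic_count: 0)
# A single titleless topic: not flagged, because of the "more than 1 post" rule.
puts likely_spammer?(post_count: 1, untitled_topic_count: 1, titled_topic_count: 0)
# Titleless topics plus one reply somewhere: not flagged, posts outnumber topics.
puts likely_spammer?(post_count: 4, untitled_topic_count: 3, titled_topic_count: 0)
```

Keeping the rule separate from the loop makes it easier to tighten or loosen one condition and see which sample cases change.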

Let’s hope that won’t target legit users, but it seems safe so far, judging from a random look at the targets. :wink:

