Various questions about cleaning up an imported vBulletin forum

Canapin · August 21, 2020, 9:29am

Hi,
I imported a 20-year-old forum into Discourse.
It has many unused or spam accounts and spam messages.
I’d like to do a cleanup.

I set all my users to trust level 0.
To work out how many users have never posted, I first counted the users who have at least one post with this Data Explorer query:

SELECT COUNT(DISTINCT user_id) FROM posts

It returns 28530.

Then I counted the total number of users:

SELECT COUNT(DISTINCT id) FROM users

It returns 180000 (the vBulletin stats also reported 180000 accounts), which leaves roughly 151000 accounts that have never posted.

I triggered the CleanUpInactiveUsers Sidekiq job.
Only a few hundred users were removed. I looked at one of the remaining unused profiles and saw no activity: no post, no topic… However, Discobot sent them a message when I imported all the users from vBulletin, 5 days ago.
So in the Discourse settings, I set clean up inactive users after days to 1.
I triggered the CleanUpInactiveUsers Sidekiq job again.
I lost about 1000 users.
I still have 178000 users and I know that most of them are empty and unused profiles with no message.

Any idea why they aren’t removed by CleanUpInactiveUsers?

Also, since there are many spam accounts and messages, is it possible to run spam detection on existing users and messages and clean all that up too?

david · August 21, 2020, 1:51pm

The job is limited to 1000 users per run, to avoid clogging the Sidekiq queue:

https://github.com/discourse/discourse/blob/106a2f58a2169dcd9a76d4ccb6d696a4c2364efc/app/jobs/scheduled/clean_up_inactive_users.rb#L17-L17

.limit(1000)

You could run it from the Rails console like this:

Jobs::CleanUpInactiveUsers.new.execute({})

And then put it in a loop, like this:

100.times do 
  Jobs::CleanUpInactiveUsers.new.execute({})
  puts "Done iteration. Total user count #{User.count}"
end
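A fixed 100 iterations may be more or fewer than the backlog needs. A minimal sketch of a loop that stops once a run no longer removes anyone is shown below. `run_until_stable` is a hypothetical helper, not part of Discourse, and the stand-in lambdas only simulate the 1000-per-run limit so the sketch can run outside a Discourse install. The console usage in the trailing comment assumes `Jobs::CleanUpInactiveUsers` and `User` behave as in the snippet above.

```ruby
# Hypothetical helper (not part of Discourse): calls `run_batch` repeatedly
# until the value returned by `count` stops decreasing or `max_runs` is reached.
# Returns the number of runs that actually removed something.
def run_until_stable(run_batch:, count:, max_runs: 500)
  productive_runs = 0
  previous = count.call
  max_runs.times do
    run_batch.call
    current = count.call
    break if current >= previous # nothing was removed on this run
    productive_runs += 1
    previous = current
  end
  productive_runs
end

# Stand-in for the real job so the sketch runs outside Discourse:
# each batch removes at most 1000 of 2500 fake inactive users.
inactive = Array.new(2500) { |i| i }
fake_batch = -> { inactive.shift(1000) }
fake_count = -> { inactive.size }

runs = run_until_stable(run_batch: fake_batch, count: fake_count)
puts "Productive runs: #{runs}, remaining: #{inactive.size}"

# In the Rails console, the (assumed) equivalent would be:
#   run_until_stable(
#     run_batch: -> { Jobs::CleanUpInactiveUsers.new.execute({}) },
#     count:     -> { User.count }
#   )
```

Stopping on "count did not go down" is a simple heuristic: on a live site new sign-ups can also change the count, so `max_runs` stays as a safety cap.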

What kind of spam detection were you thinking of? Akismet?

Canapin · August 21, 2020, 2:05pm

Thanks for the clarification!

Honestly, I don’t know. I don’t know what Discourse uses to detect and prevent spam.

Also, I believe a fair number of my spammers posted messages on users’ public profiles, a feature that doesn’t exist in Discourse.
These messages were imported into Discourse as “regular” topics, with no category and no title, which makes them easy to identify.

Example of a spam profile:

I don’t want to delete all these titleless messages, since most are harmless and may contain information that some users would like to get back.

What I’d like to do is delete the users who posted only titleless topics, along with their topics.
Could such a thing be done fairly easily with Rails commands?

david · August 21, 2020, 2:11pm

It might take a little experimentation, but yes. Something like this might work as a starting point:

User.find_each do |user|
  untitled_topic_count = user.topics.where(title: "").count
  titled_topic_count = user.topics.where.not(title: "").count
  if untitled_topic_count > 0 && titled_topic_count == 0
    # delete the topics and/or the user
  end
end

Canapin · September 9, 2020, 8:42pm

For the record, here are my conditions:

In my case, a spammer must have:

More than 1 post
At least 1 titleless topic
No titled topic
The same number of posts as the number of topics (since a topic IS a post)

So I just added conditions:

User.find_each do |user|
  untitled_topic_count = user.topics.where(title: "").count
  titled_topic_count = user.topics.where.not(title: "").count
  topic_count = untitled_topic_count + titled_topic_count
  post_count = user.posts.count
  if post_count > 1 && untitled_topic_count > 0 && titled_topic_count == 0 && post_count == topic_count
    puts "SPAMMER ?"
  end
end
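The four conditions can also be pulled into a small pure function, which makes them easy to check against hand-written cases before running anything destructive. This is only a sketch: `spammer_candidate?` is a hypothetical name, the sample numbers are made up, and on the real forum the counts would come from the same queries as in the loop above.

```ruby
# Hypothetical helper: true when the counts match all four conditions
# listed above. It only classifies; it deletes nothing.
def spammer_candidate?(post_count:, untitled_topic_count:, titled_topic_count:)
  topic_count = untitled_topic_count + titled_topic_count
  post_count > 1 &&
    untitled_topic_count > 0 &&
    titled_topic_count.zero? &&
    post_count == topic_count
end

# Hand-written cases (made-up numbers, not data from the forum):
cases = {
  "only untitled topics, no replies" => { post_count: 3, untitled_topic_count: 3, titled_topic_count: 0 },
  "single untitled topic"            => { post_count: 1, untitled_topic_count: 1, titled_topic_count: 0 },
  "also replied in other topics"     => { post_count: 5, untitled_topic_count: 2, titled_topic_count: 0 },
  "has a titled topic"               => { post_count: 2, untitled_topic_count: 1, titled_topic_count: 1 }
}

results = cases.transform_values { |counts| spammer_candidate?(**counts) }
results.each { |label, flagged| puts "#{label}: #{flagged}" }
```

Only the first case is flagged. A user with a single untitled topic is skipped because of the "more than 1 post" rule, and anyone who replied elsewhere or started a titled topic is left alone.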

Let’s hope this won’t target legitimate users, but it seems safe so far, judging from random spot checks of the flagged accounts.

system · October 9, 2020, 8:43pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
