Migrate a vBulletin 4 forum to Discourse

Posting a link to this thread in case anyone else does an import from vBulletin and then can’t work out why the Rails console doesn’t work: you need to back up your data after an import and migrate it to a new server before things work properly (I don’t know why).

It might also be worth linking to this thread on the matter of vBulletin imported user background images.

Thanks for this info. I’m planning to do the migration as well.
@enigmaty How big was your database? Were the attachments on the file server or in the DB? I’m at a 1GB DB and 100GB+ on the FS, running vBulletin 4.

I am working on a couple of vBulletin imports now. One has almost five million posts and took about a week to run on an earlier test. I’ve made some improvements to the script that handle PMs, internal links, 301 redirects, and a bunch of formatting stuff.

If I ever get caught up I’ll submit a PR.

It would be great if we could take a snapshot of correlated data up to, or between, certain dates in order to do proper testing.

Is there a way to export only, let’s say, 5% of correlated info (DB+attachments) so it will have data integrity?

I usually add an import_after environment variable to importers that I work with to do testing. It’s easiest to work with the whole database since you don’t know what problems you might introduce mucking with the database.
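For anyone wondering what that looks like, here is a minimal sketch rather than the actual importer code. It assumes the cutoff is passed as `IMPORT_AFTER=YYYY-MM-DD` and that posts are filtered on vBulletin's `dateline` column (a Unix timestamp). The helper names `parse_cutoff` and `import_after_clause` are made up for illustration, and the table prefix is left out.

```ruby
# Hypothetical helper: turn "YYYY-MM-DD" into a UTC Time so the result
# does not depend on the time zone of the machine running the import.
def parse_cutoff(value)
  year, month, day = value.split("-").map(&:to_i)
  Time.utc(year, month, day)
end

# Assumed variable name; the default of 1970-01-01 imports everything.
IMPORT_AFTER = parse_cutoff(ENV.fetch("IMPORT_AFTER", "1970-01-01"))

# Hypothetical helper: builds the SQL condition to append to the
# importer's post query.
def import_after_clause(cutoff = IMPORT_AFTER, column = "dateline")
  "#{column} > #{cutoff.to_i}"
end

puts "SELECT postid FROM post WHERE #{import_after_clause} ORDER BY postid"
```

Running the script with, say, `IMPORT_AFTER=2018-01-01` would then restrict a test run to recent posts while leaving the source database untouched.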

Thanks for replying Jay.

I agree that mucking with the DB/vBulletin system in a synthetic way is a bad idea… I just did a bit of research and I reckon I’ve found a reasonable solution:

  1. Spin a server that can run vBulletin
  2. Create a full backup of the original vBulletin and copy it to the server we created in step 1.
  3. Using the vBulletin 4 Control Panel, delete every post that is older/newer than X days. I reckon that way we won’t be “mucking” with it and data integrity won’t be compromised.
  4. Check the vBulletin board. Simple sanity check. Posts counters.
  5. Run the import script that is being introduced here.
  6. Do some testing. Posts counters.

Thoughts?

Sounds like a lot of work. Not what I’d want to do, but it might be your solution.

I’ve noticed this importer does not take additional usergroups into consideration. This is something that we cannot ignore for our import. Despite my limited experience with Ruby, I have actually managed to rig the vBulletin import script so that it will import all of a member’s usergroups and not just their primary group. I also added a check to automatically assign admins and moderators who were in the correct groups.

I am experiencing an unrelated problem, however. We have about 100k members to import and it seems to fail when it comes across a problematic user:

Error on record: {:name=>"Moderators", :username=>"Moderators", :email=>"moderators@mydomain.com", :title=>"Moderating Team", :primary_group_id=>45, :admin=>false, :moderator=>false, :created_at=>Wed, 24 Aug 2011 10:34:00 UTC +00:00, :last_seen_at=>Thu, 01 Jan 1970 01:00:00 UTC +00:00, :ip_address=>nil, :trust_level=>1, :active=>true, :import_mode=>true, last_emailed_at=>2018-04-25 20:16:12 +0000}

[snip]

Validation failed: Username must be unique (ActiveRecord::RecordInvalid)

This is interesting, because I couldn’t see a system user or other user with this name already in the Discourse database. Unless the name “Moderators” is a reserved/blacklisted name, that is. Then this could be the problem.

Regardless, renaming the user to something else fixed it.

What I have done is add a check that appends the group ID to the name if it already exists. Renaming your moderators group would also work. Another solution is to ignore the group and make members of that group be discourse moderators (that’s probably the best solution).
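A rough sketch of that first option, not the code from my script. It assumes the clash is a case-insensitive match against names already in use. `unique_group_name` and the `taken_names` set are hypothetical names for illustration.

```ruby
require "set"

# Hypothetical helper: returns the name unchanged unless it is already
# taken, in which case the imported group ID is appended. The chosen name
# is recorded so later calls see it as taken too.
def unique_group_name(name, group_id, taken_names)
  candidate = taken_names.include?(name.downcase) ? "#{name}_#{group_id}" : name
  taken_names << candidate.downcase
  candidate
end

taken = Set.new(%w[moderators admins])
puts unique_group_name("Moderators", 45, taken) # prints "Moderators_45"
puts unique_group_name("Donors", 12, taken)     # prints "Donors"
```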

This is similar to what I have also done.

Inside the create_users loop, I have added a line to combine all the user’s Group IDs:

group_ids = [ user["usergroupid"], *user["membergroupids"].split(',').map(&:to_i) ]

…and then inside the user object itself, I have added:

moderator: group_ids.include?(GROUP_VB_MOD),

GROUP_VB_MOD is a constant that represents the vBulletin moderator group - #7 by default.

Then inside the post_create_action, I have added:

# Add user to the necessary groups
GroupUser.transaction do
  group_ids.each do |gid|
    (group_id = group_id_from_imported_group_id(gid)) &&
      GroupUser.find_or_create_by(user: u, group_id: group_id)
  end
end

It seems to do the job alright. We’ve got a lot of groups carried over from about 10 years of stuff, and some of these groups still serve a purpose.
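If anyone wants to try the group ID parsing outside the importer, here is a self-contained sketch. The `to_s` call and the blank-entry filter are my own additions, on the assumption that `membergroupids` could be NULL or empty for some users. `all_group_ids` is a hypothetical helper name.

```ruby
GROUP_VB_MOD = 7 # vBulletin's default moderator group ID, as noted above

# Hypothetical helper: combines the primary group with any additional
# groups. `to_s` guards against a nil membergroupids value, and blank
# entries are dropped before converting to integers.
def all_group_ids(user)
  extra = user["membergroupids"].to_s.split(",").map(&:strip).reject(&:empty?).map(&:to_i)
  [user["usergroupid"].to_i, *extra].uniq
end

user = { "usergroupid" => 2, "membergroupids" => "7,45" }
ids = all_group_ids(user)
puts ids.inspect                # prints [2, 7, 45]
puts ids.include?(GROUP_VB_MOD) # prints true
```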

This has mostly been working great, for me. I’m doing a proof of concept to migrate a large vBulletin 4 install in my development environment (approximately 3mm posts).

I’m having a few problems, however.

First, I made some performance tweaks to the OS (CentOS 7.5), postgres, and redis - per the postgres and redis documentation and performance wikis. This was crucial to get things moving quicker. I can outline those tweaks if anyone really needs the info, but just be aware that your stock postgres and redis installs are not tuned to handle really big volume.

I’ve had to add more memory and CPU to my dev, as ruby and redis both chew up a lot of memory, which leads to swapping, which blows out I/O…

Currently, I’ve plateaued at about 2.5mm posts. I can get maybe 1000 posts imported each time before either the redis or the ruby process crashes again, and then I have to start over. To ease this pain, I’ve commented out the following:
import_groups
import_users
create_groups_membership
import_categories
import_topics

which saves me a few minutes each restart.

My question is, without me going through each line of the import code, is there something that somebody else has done to get over this plateau? This is turning into a multi-month project, if I have to continue on with the status quo.

TIA

It looks like for my case maybe the posts array in base/lookup_container.rb should be overridden to go direct to the DB rather than caching the entire catalog in process. Thoughts?

(editing)
So, yeah, it seems like that’s the ticket. Last hail mary pass to add the remainder of my unused memory to the dev vm, and it seems to be working. As such, that’s probably not a concern for this specific importer, but for the higher-level import code team.
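To illustrate what I mean by going to the DB instead of caching everything, here is a toy sketch. It is not the real lookup container. `LazyLookup` is a made-up class, the block stands in for a database query, and a plain hash plays the part of the database. It keeps only a bounded number of entries in memory and evicts the least recently used one.

```ruby
# Hypothetical class: looks up values on demand through the given block
# and keeps at most max_entries of them in memory.
class LazyLookup
  def initialize(max_entries, &fetcher)
    @max_entries = max_entries
    @fetcher = fetcher
    @cache = {} # Ruby hashes keep insertion order, oldest entry first
  end

  def [](import_id)
    if @cache.key?(import_id)
      # Re-insert on a hit so this entry becomes the most recently used.
      value = @cache.delete(import_id)
      return (@cache[import_id] = value)
    end
    value = @fetcher.call(import_id)
    @cache.shift if @cache.size >= @max_entries # evict the oldest entry
    @cache[import_id] = value
  end

  def size
    @cache.size
  end
end

fake_db = { 101 => 1, 102 => 2, 103 => 3 } # stand-in for the posts table
queries = 0
posts = LazyLookup.new(2) { |id| queries += 1; fake_db[id] }
[101, 101, 102, 103].each { |id| posts[id] }
puts queries    # prints 3
puts posts.size # prints 2
```

The trade-off is the obvious one: every cache miss costs a query, so this would be slower than the all-in-memory approach but would not grow with the size of the forum.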

I’m actually importing a vBulletin forum which has over 5M posts and while it’s kinda slow (~1000 posts / min), it never crashed.

Are you sure your “performance tweaks” aren’t causing these crashes?

Yeah, in fact the reason for the crashes was that I ran out of memory and swap. If you look at the import code, when it runs you see “loading existing groups… loading existing user…” etc. All that is being cached in the ruby process and in redis.

I solved the problem by just adding more memory and swap. I estimate you need at least 1gb free per million posts in order to successfully import. If you have to restart the import because of a crash, it still sucks every ID out of postgres into memory. What’s the point of having a database, if you do things that way?

Not trying to be critical, I appreciate that the importer is there and “works.” I would have written it differently, though. Perhaps someday I will.

No worries.

We’re always open to PRs improving perf :wink:

The trade-off is speed vs memory. If you want performance, you can try the bulk_import version. I just did one 5M post import and am finishing up a 4M post import with 160GB of attachments. I made lots of edits to handle attachments not linked to in the post, resolve internal links and convert quotes into Discourse quotes. One day, I’ll get around to submitting a PR.

Next time, I’ll try the bulk importer.

I have this on my to-do list. Would be interested in what you did. I imagine an update via sql followed by a rebake? I haven’t put any cycles on it, yet…

I’ve got a console task that does this. I’ll try to share but am taking a week vacation in 6 hours.

All my vacations seem to fit into 6 hours :rofl:

The installer is not asking me for a password!!! :scream:
How do I proceed???