Posting a link to this thread in case anyone else does an import from vBulletin and then can't work out why the Rails console doesn't work: you need to back up your data after the import and migrate it to a new server before things work properly (I don't know why).
It might also be worth linking to this thread on the matter of vBulletin imported user background images.
Thanks for this info. I'm planning to do the migration as well, @enigmaty. How big was your database? Were the attachments on the file server or in the DB? I'm at 1 GB DB and 100 GB+ on the FS, running vBulletin 4.
I am working on a couple of vBulletin imports now. One has almost five million posts and took about a week to run in an earlier test. I've made some improvements to the script that handle PMs, internal links, 301 redirects, and a bunch of formatting stuff.
I usually add an import_after environment variable to the importers I work with for testing. Beyond that, it's easiest to work with the whole database, since you don't know what problems you might introduce by mucking with it.
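For what it's worth, the shape of that tweak in a vBulletin-style importer looks roughly like this; the variable name, column names, and query are illustrative, not a drop-in patch:

```ruby
# Rough sketch only: skip anything older than IMPORT_AFTER so test runs finish
# quickly. post.dateline is vBulletin's unix timestamp; mysql_query and
# TABLE_PREFIX are the helpers the vBulletin importer already has.
IMPORT_AFTER = ENV["IMPORT_AFTER"] # e.g. "2018-01-01"

def import_posts
  where_clause = ""
  if IMPORT_AFTER
    cutoff = Date.parse(IMPORT_AFTER).to_time.to_i
    where_clause = "WHERE dateline > #{cutoff}"
  end

  rows = mysql_query(<<~SQL)
    SELECT postid, threadid, userid, dateline, pagetext
      FROM #{TABLE_PREFIX}post
      #{where_clause}
     ORDER BY postid
  SQL
  # ...then hand rows to create_posts as the stock importer does
end
```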
I agree that mucking with the DB/vBulletin system in a synthetic way is a bad idea… I just did a bit of research and I reckon I’ve found a reasonable solution:
1. Spin up a server that can run vBulletin.
2. Create a full backup of the original vBulletin and copy it to the server we created in step 1.
3. Using the vBulletin 4 Control Panel, delete every post that is older/newer than X days. I reckon that way we won't be "mucking" with it and data integrity won't be compromised.
4. Check the vBulletin board. Simple sanity check: post counters.
5. Run the import script that is being discussed here.
I've noticed this importer does not take additional usergroups into consideration. This is something we cannot ignore for our import, but despite my limited experience with Ruby, I have managed to rig the vBulletin import script so that it imports all of a member's usergroups and not just their primary group. I also added a check to automatically assign admin and moderator rights to users who were in the corresponding groups.
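For anyone doing the same: vBulletin keeps the primary group in user.usergroupid and any additional groups as a comma-separated list in user.membergroupids, so the per-user group list can be built with something like this (illustrative sketch, assuming the user row comes back as a hash with string keys):

```ruby
# Sketch: collect primary + additional vBulletin usergroup ids for one user row.
group_ids = [row["usergroupid"].to_i]
group_ids += row["membergroupids"].to_s.split(",").map(&:to_i).reject(&:zero?)
group_ids.uniq!
```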
I am experiencing an unrelated problem, however. We have about 100k members to import, and the import seems to fail when it comes across a problematic user:
Error on record: {:name=>"Moderators", :username=>"Moderators", :email=>"moderators@mydomain.com", :title=>"Moderating Team", :primary_group_id=>45, :admin=>false, :moderator=>false, :created_at=>Wed, 24 Aug 2011 10:34:00 UTC +00:00, :last_seen_at=>Thu, 01 Jan 1970 01:00:00 UTC +00:00, :ip_address=>nil, :trust_level=>1, :active=>true, :import_mode=>true, :last_emailed_at=>2018-04-25 20:16:12 +0000}
[snip]
Validation failed: Username must be unique (ActiveRecord::RecordInvalid)
This is interesting, because I couldn't see a system user or any other user with this name already in the Discourse database. Unless the name "Moderators" is a reserved/blacklisted name, that is; then that could be the problem.
Regardless, renaming the user to something else fixed it.
What I have done is add a check that appends the group ID to the name if it already exists. Renaming your moderators group would also work. Another solution is to ignore the group and make members of that group Discourse moderators (that's probably the best solution).
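The dedup check itself is only a couple of lines; roughly this idea (illustrative only — Discourse's actual uniqueness rules are case-insensitive and also span usernames, so if the collision is with a reserved name the condition would need extending):

```ruby
# Rough idea: make an imported vBulletin group name unique by appending the
# usergroup id whenever the name is already in use, so creation doesn't fail
# on a duplicate like "Moderators".
name = group["title"]
name = "#{name}_#{group["usergroupid"]}" if Group.where("lower(name) = ?", name.downcase).exists?
```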
…and then inside the user object itself, I have added:
moderator: group_ids.include?(GROUP_VB_MOD),
GROUP_VB_MOD is a constant that represents the vBulletin moderator group - #7 by default.
Then inside the post_create_action, I have added:
# Add user to the necessary groups
GroupUser.transaction do
  group_ids.each do |gid|
    (group_id = group_id_from_imported_group_id(gid)) &&
      GroupUser.find_or_create_by(user: u, group_id: group_id)
  end
end
It seems to do the job alright. We’ve got a lot of groups carried over from about 10 years of stuff, and some of these groups still serve a purpose.
This has mostly been working great for me. I'm doing a proof of concept migrating a large vBulletin 4 install in my development environment (approximately 3 million posts).
I’m having a few problems, however.
First, I made some performance tweaks to the OS (CentOS 7.5), Postgres, and Redis, per the Postgres and Redis documentation and performance wikis. This was crucial to get things moving faster. I can outline those tweaks if anyone really needs the info, but just be aware that stock Postgres and Redis installs are not tuned to handle really big volume.
I've had to add more memory and CPU to my dev VM, as Ruby and Redis both chew up a lot of memory, which leads to swapping, which blows out I/O…
Currently, I've plateaued at about 2.5 million posts. I can get maybe 1,000 posts imported each run before either the Redis or the Ruby process crashes again, and then I have to start over. To ease this pain, I've commented out the following:
import_groups
import_users
create_groups_membership
import_categories
import_topics
which saves me a few minutes each restart.
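In the importer that's just a matter of temporarily commenting those calls out of execute, something like:

```ruby
# Sketch: skip the steps that already completed so a restart goes straight
# to importing posts.
def execute
  # import_groups
  # import_users
  # create_groups_membership
  # import_categories
  # import_topics
  import_posts
  # ... remaining steps ...
end
```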
My question is: without my going through each line of the import code, has anyone else done something to get over this plateau? This is turning into a multi-month project if I have to continue with the status quo.
It looks like, for my case, maybe the posts array in base/lookup_container.rb should be overridden to go directly to the DB rather than caching the entire catalog in process. Thoughts?
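As a sketch of what I mean (the base importer records the original id in the import_id post custom field, which is what that cache is built from, so a lookup could hit the DB on demand instead):

```ruby
# Sketch only, untested: resolve an imported post id with a query instead of
# keeping every id in memory.
def post_id_from_imported_post_id(import_id)
  PostCustomField.where(name: "import_id", value: import_id.to_s)
                 .limit(1)
                 .pluck(:post_id)
                 .first
end
```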
So, yeah, it seems like that's the ticket. As a last Hail Mary I added the remainder of my unused memory to the dev VM, and it seems to be working. As such, that's probably not a concern for this specific importer, but for whoever maintains the higher-level import code.
Yeah, in fact the reason for the crashes was that I ran out of memory and swap. If you look at the import code, when it runs you see "loading existing groups… loading existing users…" etc. All of that is being cached in the Ruby process and in Redis.
I solved the problem by just adding more memory and swap. I estimate you need at least 1 GB free per million posts in order to import successfully. If you have to restart the import because of a crash, it still sucks every ID out of Postgres into memory. What's the point of having a database if you do things that way?
Not trying to be critical, I appreciate that the importer is there and “works.” I would have written it differently, though. Perhaps someday I will.
The trade-off is speed vs. memory. If you want performance, you can try the bulk_import version. I just did one 5M-post import and am finishing up a 4M-post import with 160 GB of attachments. I made lots of edits to handle attachments not linked to in the post, resolve internal links, and convert quotes into Discourse quotes. One day, I'll get around to submitting a PR.
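For the quote conversion, the gist is rewriting vBulletin's [QUOTE=user;postid] blocks into Discourse quotes that reference the new post and topic; a rough sketch, assuming the base importer's topic_lookup_from_imported_post_id is available:

```ruby
# Sketch: convert [QUOTE=name;1234]...[/QUOTE] into Discourse quote tags.
def convert_quotes(raw)
  raw.gsub(/\[quote=([^;\]]+);(\d+)\](.*?)\[\/quote\]/im) do
    username, old_post_id, body = $1, $2, $3
    if (lookup = topic_lookup_from_imported_post_id(old_post_id))
      "[quote=\"#{username}, post:#{lookup[:post_number]}, topic:#{lookup[:topic_id]}\"]\n#{body}\n[/quote]"
    else
      "[quote=\"#{username}\"]\n#{body}\n[/quote]"
    end
  end
end
```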
I have this on my to-do list and would be interested in what you did. I imagine an update via SQL followed by a rebake? I haven't put any cycles into it yet…
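If it helps, the rebake half of that is straightforward once the raw has been rewritten; something along these lines (illustrative only, and the LIKE pattern is just an example for old vBulletin links):

```ruby
# Sketch: rebake only the posts whose raw still referenced old vBulletin links
# before the SQL update, so the cooked HTML gets regenerated.
Post.where("raw LIKE ?", "%showthread.php%").find_each(&:rebake!)
```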