Importing from vBulletin 4

import

(Chris Croome) #21

The problem was a user with unrenderable characters (HTML entities) in the username field; once the HTML entities were removed the import script ran, but there were a lot of errors like this:

ERR Error running script (call to f_b06356ba4628144e123b652c99605b873107c9be): @user_script:14: @user_script: 14: -MISCONF Redis is configured to save RDB snapshots, but is currently not able to persist on disk. Commands that may modify the data set are disabled. Please check Redis logs for details about the error. 

Is this something that I should be concerned about and something that I should fix and then redo the import?


(Jay Pfaffman) #22

Are you out of disk space?


(Chris Croome) #23

Yes! I’m now spinning up a new virtual server with a huge amount of disk space, CPUs and RAM and I’m going to start from scratch…


(Chris Croome) #24

By the way, I’m following the instructions above for the Docker version and found that I needed to add an additional step inside the container:

echo "discourse ALL = NOPASSWD: ALL" >> /etc/sudoers

Before running:

su discourse -c 'bundle install --no-deployment --without test --without development'

In the non-Docker instructions above there is this step to delete data from the database before doing the import:

  • Clear existing data from your local Discourse instance
cd ~/discourse
bundle exec rake db:drop db:create db:migrate

Was this omitted from the Docker instructions on purpose?


(Chris Croome) #25

There are two different vBulletin import scripts in the Docker container:

cd /var/www/discourse
find ./ -name vbulletin.rb
./script/bulk_import/vbulletin.rb
./script/import_scripts/vbulletin.rb

The second one is twice the size of the first, and it is the one I have been using.


(Chris Croome) #26

Posting a link to this thread in case anyone else does an import from vBulletin and then can’t work out why the Rails console doesn’t work — you need to back up your data after an import and migrate it to a new server before things work properly (I don’t know why).

It might also be worth linking to this thread on the matter of vBulletin imported user background images.


#27

Thanks for this info. I’m planning to do the migration as well.
@enigmaty How big was your database? Were the attachments on the file server or in the DB? I’m at 1 GB DB and 100 GB+ on the file server, running vBulletin 4.


(Jay Pfaffman) #28

I am working on a couple of vBulletin imports now. One has almost five million posts and took about a week to run in an earlier test. I’ve made some improvements to the script that handle PMs, internal links, 301 redirects, and a bunch of formatting stuff.

If I ever get caught up I’ll submit a PR.


#29

It would be great if we could have a snapshot of correlated data up to (or between) certain dates in order to do proper testing.

Is there a way to export only, let’s say, 5% of the correlated data (DB + attachments) so that it keeps its integrity?


(Jay Pfaffman) #30

I usually add an import_after environment variable to importers that I work with to do testing. It’s easiest to work with the whole database since you don’t know what problems you might introduce mucking with the database.
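The idea can be sketched outside the importer; this is an illustrative snippet (the variable names and row shape here are my own, not the actual importer code):

```ruby
require "time"

# Hypothetical sketch of an import_after cutoff taken from the environment.
# Records created on or before the cutoff are skipped during a test run.
IMPORT_AFTER = Time.parse(ENV.fetch("IMPORT_AFTER", "1970-01-01"))

def rows_to_import(rows)
  rows.select { |row| row[:created_at] > IMPORT_AFTER }
end

rows = [
  { id: 1, created_at: Time.parse("2015-06-01") },
  { id: 2, created_at: Time.parse("2018-03-01") },
]

# With IMPORT_AFTER=2017-01-01 only the second row would be imported.
puts rows_to_import(rows).map { |r| r[:id] }.inspect
```

This keeps the source database untouched while still letting you test against a small, recent slice of the data.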


#31

Thanks for replying Jay.

I agree that mucking with the DB/vBulletin system in a synthetic way is a bad idea… I just did a bit of research and I reckon I’ve found a reasonable solution:

  1. Spin up a server that can run vBulletin.
  2. Create a full backup of the original vBulletin and copy it to the server we created in step 1.
  3. Using the vBulletin 4 Control Panel, delete every post that is older/newer than X days. I reckon that way we won’t be “mucking” with it and data integrity won’t be compromised.
  4. Check the vBulletin board; a simple sanity check of post counters.
  5. Run the import script that is being discussed here.
  6. Do some testing; check post counters.

Thoughts?


(Jay Pfaffman) #33

Sounds like a lot of work. Not what I’d want to do, but it might be your solution.


(David 'Maisy' M.) #34

I’ve noticed this importer does not take additional usergroups into consideration. This is something that we cannot ignore for our import, but despite my limited experience with Ruby, I have managed to rig the vBulletin import script so that it imports all of a member’s usergroups and not just their primary group. I also added a check to automatically assign admin and moderator rights to users who were in the corresponding groups.

I am experiencing an unrelated problem, however. We have about 100k members to import and it seems to fail when it comes across a problematic user:

Error on record: {:name=>"Moderators", :username=>"Moderators", :email=>"moderators@mydomain.com", :title=>"Moderating Team", :primary_group_id=>45, :admin=>false, :moderator=>false, :created_at=>Wed, 24 Aug 2011 10:34:00 UTC +00:00, :last_seen_at=>Thu, 01 Jan 1970 01:00:00 UTC +00:00, :ip_address=>nil, :trust_level=>1, :active=>true, :import_mode=>true, last_emailed_at=>2018-04-25 20:16:12 +0000}

[snip]

Validation failed: Username must be unique (ActiveRecord::RecordInvalid)

This is interesting, because I couldn’t see a system user or any other user with this name already in the Discourse database. Unless the name “Moderators” is a reserved/blacklisted name, that is; then this could be the problem.

Regardless, renaming the user to something else fixed it.


(Jay Pfaffman) #35

What I have done is add a check that appends the group ID to the name if it already exists. Renaming your moderators group would also work. Another solution is to ignore the group and make members of that group Discourse moderators (that’s probably the best solution).
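A minimal sketch of that dedupe idea (the function and names here are hypothetical, not the actual importer code):

```ruby
# Hypothetical sketch: if an imported group name collides with a name already
# taken in Discourse, append the vBulletin group ID to make it unique.
def unique_name(name, group_id, taken)
  candidate = taken.include?(name.downcase) ? "#{name}_#{group_id}" : name
  taken << candidate.downcase
  candidate
end

taken = ["moderators"]                     # names already present in Discourse
puts unique_name("Moderators", 7, taken)   # => "Moderators_7"
puts unique_name("Traders", 12, taken)     # => "Traders"
```

The comparison is done case-insensitively, since Discourse usernames are unique regardless of case.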


(David 'Maisy' M.) #36

This is similar to what I have also done.

Inside the create_users loop, I have added a line to combine all the user’s Group IDs:

group_ids = [ user["usergroupid"], *user["membergroupids"].split(',').map(&:to_i) ]

…and then inside the user object itself, I have added:

moderator: group_ids.include?(GROUP_VB_MOD),

GROUP_VB_MOD is a constant that represents the vBulletin moderator group - #7 by default.

Then inside the post_create_action, I have added:

# Add user to the necessary groups
GroupUser.transaction do
  group_ids.each do |gid|
    (group_id = group_id_from_imported_group_id(gid)) &&
      GroupUser.find_or_create_by(user: u, group_id: group_id)
  end
end

It seems to do the job alright. We’ve got a lot of groups carried over from about 10 years of stuff, and some of these groups still serve a purpose.
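For reference, the merge logic above can be exercised standalone; this sketch adds a to_s guard as an assumption of mine, in case membergroupids is NULL for some rows:

```ruby
# Standalone version of the group-ID merge shown above.
# user["membergroupids"] is vBulletin's comma-separated list of secondary
# groups; it can be an empty string (or NULL), so guard with to_s.
def merged_group_ids(user)
  [user["usergroupid"], *user["membergroupids"].to_s.split(",").map(&:to_i)]
end

user = { "usergroupid" => 2, "membergroupids" => "7,45" }
puts merged_group_ids(user).inspect   # => [2, 7, 45]

orphan = { "usergroupid" => 2, "membergroupids" => nil }
puts merged_group_ids(orphan).inspect # => [2]
```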


#37

This has mostly been working great for me. I’m doing a proof of concept in my development environment to migrate a large vBulletin 4 install (approximately 3 million posts).

I’m having a few problems, however.

First, I made some performance tweaks to the OS (CentOS 7.5), Postgres, and Redis, per the Postgres and Redis documentation and performance wikis. This was crucial to get things moving more quickly. I can outline those tweaks if anyone needs the info, but be aware that stock Postgres and Redis installs are not tuned to handle really big volumes.

I’ve had to add more memory and CPU to my dev box, as Ruby and Redis both chew up a lot of memory, which leads to swapping, which blows out I/O…

Currently, I’ve plateaued at about 2.5 million posts. I can get maybe 1000 posts imported each time before either the Redis or Ruby process crashes again, and then I have to start over. To ease this pain, I’ve commented out the following steps:

# import_groups
# import_users
# create_groups_membership
# import_categories
# import_topics

which saves me a few minutes on each restart.

My question is, without me going through each line of the import code, is there something that somebody else has done to get over this plateau? This is turning into a multi-month project, if I have to continue on with the status quo.

TIA


#38

It looks like, for my case, maybe the posts array in base/lookup_container.rb should be overridden to go directly to the DB rather than caching the entire catalog in the process. Thoughts?

Edit: so, yeah, it seems like that was the ticket. As a last hail-mary I added the remainder of my unused memory to the dev VM, and it seems to be working. As such, this is probably not a concern for this specific importer, but for the maintainers of the higher-level import code.
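As a sketch of that idea (this is not Discourse’s actual lookup_container API, just an illustration), a lazy per-ID lookup with a memo cache avoids preloading every mapping into memory:

```ruby
# Hypothetical lazy lookup: fetch one imported-ID mapping on demand from the
# backing store and memoize it, instead of preloading every mapping up front.
class LazyLookup
  def initialize(&fetcher)
    @fetcher = fetcher  # e.g. a DB query for a single imported ID
    @cache = {}
  end

  def [](imported_id)
    @cache.fetch(imported_id) { @cache[imported_id] = @fetcher.call(imported_id) }
  end
end

# Stand-in for a database table mapping imported post IDs to Discourse IDs.
db = { 100 => 1, 101 => 2 }
lookup = LazyLookup.new { |id| db[id] }
puts lookup[100]  # => 1 (fetched once, then served from the memo cache)
```

The trade-off is one query per cache miss instead of one huge scan at startup, which is slower per lookup but bounds memory by the working set rather than the whole table.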


(Régis Hanol) #39

I’m actually importing a vBulletin forum which has over 5M posts and while it’s kinda slow (~1000 posts / min), it never crashed.

Are you sure your “performance tweaks” aren’t causing these crashes?


#40

Yeah, in fact the reason for the crashes was that I ran out of memory and swap. If you look at the import code, when it runs you see “loading existing groups… loading existing users…” etc. All of that is being cached in the Ruby process and in Redis.

I solved the problem by just adding more memory and swap. I estimate you need at least 1 GB of free RAM per million posts in order to import successfully. If you have to restart the import because of a crash, it still pulls every ID out of Postgres into memory. What’s the point of having a database if you do things that way?

Not trying to be critical; I appreciate that the importer is there and “works”. I would have written it differently, though. Perhaps someday I will.


(Régis Hanol) #41

No worries.

We’re always open to PRs improving performance :wink: