Importing / migrating from vBulletin 4

It’s either that those users didn’t get imported for some reason (missing email address used to a cause, but that should be resolved now), or for some other reason the code that looks up imported usernames isn’t working (case of the username, perhaps?).

3 Likes

@pfaffman yeah it does look that way although it’s not clear on some of the specifics. Let’s look at the first one for example.

  1. What does pm-149 mean?
  2. For a:1:{i:486;s:5:"TonyN";} the text “TonyN” looks like the username but what about the other numbers?
  3. How about [nil, 270]? What does 270 signify?

If I can understand what it’s complaining I can at least try to look up the database to see if there are any data problems. But I’m not sure what these really mean.

BTW I also noticed that all imported forums have permission for everyone. This means all the forums permitted only for moderators were set to be everyone visible. Is there a way to control this?

1 Like

Sorry. I don’t remember well enough to explain. This is about all of the free help I have to offer on this one.

Of course. See How to use category security settings to create private categories - faq - Discourse Meta

Some importers endeavor to import groups, but few of them know how to apply those permissions to the categories that get imported. You’ll need to fix those up by hand.

2 Likes

Following @titusc 's instructions, and I seem to be having issues importing the database…

root@DO-Discourse-app:/shared/vbulletin# mysql -uroot -p vb4 < CC12-Sat-Full-Backup.sql
Enter password: 
ERROR 1265 (01000) at line 4928: Data truncated for column 'method' at row 1
root@DO-Discourse-app:/shared/vbulletin#

Any suggestions on what it’s looking for?

N/M It’s errors in the original database…

2 Likes

I’m only a recent discourse convert, so after a lot of trial and error I’ve combined everything above into a full command by command list (thanks @titusca and @enigmaty).

Hopefully this will help (or at least accelerate) fellow newcomers go from start to finish. Would like to incorporate this into the first post given the updates to mysql->mariadb that I think have thrown a lot of confusion into the process.

Background:

  • 1.6 million post transfer.
  • Utilized Digital Ocean Droplet (CPU Optimized 4 vCPU/8GB)

#1 - Install Digital Ocean Discourse 1-click droplet

#2 - Finish discourse install through SSH by following prompts

Open SSH console
root
(yourrootpassword)
(enter)
(yourdomain).com
(etc…)

#3 - Login to SFTP to upload database dump

sftp root@XXX.XXX.XX.XX
y
yes
(yourrootpassword)
put db.sql /var/discourse/shared/standalone/db.sql

#4 - Login to new discourse website to setup admin account

#5 - Login to SSH - begin process

ssh root@XXX.XXX.XX.XX
cd /var/discourse
./launcher start app
docker exec -it app bash
sudo apt-get update
sudo apt-get upgrade
y

#6 - Install MariaDB (replacement for mysql)

apt-get update && apt-get install mariadb-server-10.3 libmariadbd-dev
y

#7 - Mysql Database Setup

service mysql start
mysql -u root -p
password
create database vbulletin;
exit;

#8 - Vbulletin -> Mysql Database Transfer

mysql -u root -p vbulletin < /shared/db.sql
password

#9 - GEM File

echo “gem ‘mysql2’” >>Gemfile
echo “gem ‘mysql2’, require: false” >> /var/www/discourse/Gemfile
echo “gem ‘php_serialize’, require: false” >> /var/www/discourse/Gemfile
cd /var/www/discourse
su discourse -c ‘bundle install --no-deployment --without test --without development --path vendor/bundle’
(Ignore red text result)

#10 - Configure install script

vi /var/www/discourse/script/import_scripts/vbulletin.rb

#10.a - Make edits to text file as needed

DB_HOST ||= ENV[‘DB_HOST’] || “localhost”
DB_NAME ||= ENV[‘DB_NAME’] || “vbulletin”
DB_PW ||= ENV[‘DB_PW’] || “password”
DB_USER ||= ENV[‘DB_USER’] || “root”
TIMEZONE ||= ENV[‘TIMEZONE’] || “America/Los_Angeles”
TABLE_PREFIX ||= ENV[‘TABLE_PREFIX’] || “”
ATTACHMENT_DIR ||= ENV[‘ATTACHMENT_DIR’] || ‘/shared/attachments/’

#10.c - End edits

:wq

#11 - Bundle Config

bundle config set path ‘vendor/bundle’
bundle config set without ‘development:test’
bundle config unset deployment
su discourse -c ‘bundle install’

#12 - Mysql config (may be possible to do this with previous)

mysql --version
sudo mysql -u root -p
password
ALTER USER 'root'@'localhost' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
exit

#13 - Install Script

su discourse -c ‘bundle exec ruby script/import_scripts/vbulletin.rb’

Good luck!

5 Likes

Just wanted to leave feedback after our migration from vB4:

  • FIXED Soft-Deleted posts where not properly hidden: https://github.com/discourse/discourse/pull/12057
  • [ul] + [li] and nested [LIST] were not migrated properly and the BBcode plugin doesn’t seem to handle this either → This seems to be expected: CommonMark testing started here! (Quote: Core will not implement [ul] [ol] and [li] support for BBCode cause it is a recipe for failure.) → I will need to build some RegEx magic post-fixup for this.
  • We made an initial migration using the normale importer (took > 3 days) and restarted the migration with newer DB snapshots a couple of times to keep the import “fresh” and reduce the downtime to effectively 30 minutes. This procedure worked quite well, except for everything that was edited after we initially imported the threads, posts. We need to manually rework this information now.
  • Creating Plugins for Discourse is really hard due to lack of documentation and a big picture of how the folder structure works. Though it is getting nicer and better after you understand how it works.

Questions that i have left:

  • I not not sure how the importer maps already imported posts and how to match the old vB4 post_id to the new Discourse post_id to hide those “soft-deleted” post. If someone can give me a hint that would be very welcome! Found it: import_id inside the post_custom_fields table. Nice. Now i need to write some handy script to fix this :slight_smile: → Edit: An even better way is to use the importer script, which maps all imported id’s for easy use.
2 Likes

Unfortunately I can’t edit my previous post :slight_smile:

I found another issue: Every attachment that is not linked into a post, will not be available to Discourse.

My draft PR for fixing this issue: https://github.com/discourse/discourse/pull/12187

Thanks!

3 Likes

Just a quick followup on my issue list. I fixed the visibility problem.

Dump all affected posts from your old vBulletin database:

SELECT postid
FROM `vb4_post`
WHERE `visible` > '1'
ORDER BY postid

Make an imported_post_ids.txt file which has all the postid’s line by line

Create a new file for the fixing script:

nano script/import_scripts/fix_visibility.rb 

Content:

require_relative '../../config/environment'
require_relative 'base/lookup_container'

@lookup = ImportScripts::LookupContainer.new

broken_postids = []
broken_real_postids = []

File.foreach("imported_post_ids.txt") do |line|
  broken_postids.append(line.to_i)
end

broken_postids.each do |id|
  broken_real_postids.append(@lookup.post_id_from_imported_post_id(id))
end

broken_real_postids.each do |id|
  puts id
  Post.find(id).trash!
end

Run the script:

su discourse -c 'bundle exec ruby script/import_scripts/fix_visibility.rb'

The script will use the logic from the importer to map the imported post_id’s to the read discourse post_id’s which we want to hide.

4 Likes

Hi folks,

I’ve got the script churning away on a vb3 migration. I’m doing one step at a time and it’s currently churning through 122k users at 330/minute. Then we’ll have 2.5 million posts to go through.

We’re doing this on a production server. Nobody’s using the discourse site, we just set it up and it’s at an anonymous url. If I log in I can see the new user notifications incrementing. Probably a dumb question, but I wonder if the migration would process faster if we suspended or disabled the live site somehow?

1 Like

That depends on the load and amount of CPU’s on your production server. You can always try to stop the web server for 5 minutes and see if the import goes faster.

3 Likes

Importing really takes a while. As far as i know the bulk importer should be faster. We did a first import from a backup on our beefy dev machine and then did an incremental one from another backup to make the switch to Discourse with only half an hour of downtime. Beware of the things that can go wrong when doing incremental updates :slight_smile: (See here: Importing / migrating from vBulletin 4 - #132 by paresy)

paresy

3 Likes

I see one core pegged that I think is the server ingesting the updated data, and another core pegged when running the import script. I really don’t have the domain knowledge to know if competition between those two processes for the db resource could be slowing down the importer, and I also don’t have the domain knowledge to know if it’s even possible to stop the ingestion while leaving the container up. The ingestion has to happen anyway, so I suppose the safest thing to do is just let it keep on churning away.

One tip for future readers, I see that 27k (22%!) of our users are banned spambots. We’ll purge them on the source side before doing the final import.

[adding] One necessary edit that I don’t see mentioned above:

--- a/script/import_scripts/vbulletin.rb
+++ b/script/import_scripts/vbulletin.rb
@@ -134,6 +133,7 @@ EOM
        , usertitle
        , usergroupid
        , joindate
+       , lastvisit
        , email
        , password
        , salt

And an edit that may be vb3 specific:

--- a/script/import_scripts/vbulletin.rb
+++ b/script/import_scripts/vbulletin.rb
@@ -987,7 +989,7 @@ EOM
   end

   def parse_timestamp(timestamp)
-    Time.zone.at(@tz.utc_to_local(timestamp))
+    Time.zone.at(@tz.utc_to_local(Time.at(timestamp)))
   end

[adding] The import is running on an oracle cloud 4-core ampere instance. For comparison I installed a discourse dev server local/native on an M1 macbook air and was surprised that the import process ran significantly slower.

5 Likes

Were you getting errors with the pre-existing script? I lost the date & time information from all our old vBulletin 4 posts because of that. If this is a fix, I’d love to know if reimporting would be a good idea if all posts have been copied over.

2 Likes

Yes, the script would error out because it was feeding an integer to a time function.

3 Likes

No. The script skips posts that have been imported already.

3 Likes

Hi,

Did you figure out how to fix this?

Our two main/bottom forums have parentid = -1 (I think this is due to us converting from v3 back in the day).

I’m not sure how to go ahead, do I just set them to 0 if -1 in the conversion script? Assuming 0 is the main discourse category?

Actually, looking at the discourse site now; those two seems to be the only two that has been imported?

 importing top level categories...
         2 / 2 (100.0%)  [211 items/min]  in]
 importing children categories...
 Traceback (most recent call last):
         5: from script/import_scripts/vbulletin.rb:1003:in `<main>'
         4: from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
         3: from script/import_scripts/vbulletin.rb:84:in `execute'
         2: from script/import_scripts/vbulletin.rb:287:in `import_categories'
         1: from script/import_scripts/vbulletin.rb:287:in `each'
script/import_scripts/vbulletin.rb:289:in `block in import_categories': undefined method `[]' for nil:NilClass (NoMethodError)
1 Like

Probably. I’ve done a bunch of vBulletin imports since then. :person_shrugging:

You’ll just have to give it a try and see what happens. It does look like the same thing I described.

I’d just modify the script to . . . do something … if that thing is nil.

1 Like

Absolutely, but I don’t know enough about how discourse works to know what to set it to.
What would discourse do if I set them to a random number like 0? Or should I find a category number already in the db and set it to that?

Not so sturdy in Ruby, would this work you think?

        if categories.detect { |c| c["forumid"] == cc["parentid"] }["parentid"].nil?
          cc["parentid"] = 52
        else
          cc["parentid"] = categories.detect { |c| c["forumid"] == cc["parentid"] }["parentid"]
        end

Actually it seems there are lots of deleted forums who’s parentid no longer exists

EDIT
I just set them all to one parent topic, and I can fix it up later

1 Like

We’ve finally come to the attachment import part, it came to around 1.9% and now we get this error

    67406 / 3550728 (  1.9%)  Traceback (most recent call last):
        23: from script/import_scripts/vbulletin.rb:1006:in `<main>'
        22: from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
        21: from script/import_scripts/vbulletin.rb:88:in `execute'
        20: from script/import_scripts/vbulletin.rb:610:in `import_attachments'
        19: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/querying.rb:22:in `find_each'
        18: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:70:in `find_each'
        17: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:137:in `find_in_batches'
        16: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:229:in `in_batches'
        15: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:229:in `loop'
        14: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:245:in `block in in_batches'
        13: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:138:in `block in find_in_batches'
        12: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:71:in `block in find_each'
        11: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:71:in `each'
        10: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:71:in `block (2 levels) in find_each'
         9: from script/import_scripts/vbulletin.rb:651:in `block in import_attachments'
         8: from script/import_scripts/vbulletin.rb:651:in `each'
         7: from script/import_scripts/vbulletin.rb:659:in `block (2 levels) in import_attachments'
         6: from /var/www/discourse/script/import_scripts/base.rb:873:in `html_for_upload'
         5: from /var/www/discourse/script/import_scripts/base/uploader.rb:40:in `html_for_upload'
         4: from /var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
         3: from /var/www/discourse/lib/upload_markdown.rb:19:in `image_markdown'
         2: from /var/www/discourse/app/models/upload.rb:206:in `short_url'
         1: from /var/www/discourse/app/models/upload.rb:534:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:270:in `base62_sha1': undefined method `hex' for nil:NilClass (NoMethodError)

undefined method `hex’ for nil:NilClass (NoMethodError)

Anyone got any clues as to how to fix this one?

Is it trying to read short_url_basename, and it returns nil; so .hex fails?

1 Like

My guess, without looking at the code, is that the file is missing or maybe there’s a filename field and it is empty? I’d probably put a puts in import_attachments and see what is in the record it’s trying to import.

1 Like