Migrate a vBulletin 4 forum to Discourse

I’m only a recent discourse convert, so after a lot of trial and error I’ve combined everything above into a full command by command list (thanks @titusca and @enigmaty).

Hopefully this will help (or at least accelerate) fellow newcomers go from start to finish. Would like to incorporate this into the first post given the updates to mysql->mariadb that I think have thrown a lot of confusion into the process.

Background:

  • 1.6 million post transfer.
  • Utilized Digital Ocean Droplet (CPU Optimized 4 vCPU/8GB)

#1 - Install Digital Ocean Discourse 1-click droplet

#2 - Finish discourse install through SSH by following prompts

Open SSH console
root
(yourrootpassword)
(enter)
(yourdomain).com
(etc…)

#3 - Login to SFTP to upload database dump

sftp root@XXX.XXX.XX.XX
y
yes
(yourrootpassword)
put db.sql /var/discourse/shared/standalone/db.sql

#4 - Login to new discourse website to setup admin account

#5 - Login to SSH - begin process

ssh root@XXX.XXX.XX.XX
cd /var/discourse
./launcher start app
docker exec -it app bash
sudo apt-get update
sudo apt-get upgrade
y

#6 - Install MariaDB (replacement for mysql)

apt-get update && apt-get install mariadb-server-10.3 libmariadbd-dev
y

#7 - Mysql Database Setup

service mysql start
mysql -u root -p
password
create database vbulletin;
exit;

#8 - Vbulletin → Mysql Database Transfer

mysql -u root -p vbulletin < /shared/db.sql
password

#9 - GEM File

echo “gem ‘mysql2’” >>Gemfile
echo “gem ‘mysql2’, require: false” >> /var/www/discourse/Gemfile
echo “gem ‘php_serialize’, require: false” >> /var/www/discourse/Gemfile
cd /var/www/discourse
su discourse -c ‘bundle install --no-deployment --without test --without development --path vendor/bundle’
(Ignore red text result)

#10 - Configure install script

vi /var/www/discourse/script/import_scripts/vbulletin.rb

#10.a - Make edits to text file as needed

DB_HOST ||= ENV[‘DB_HOST’] || “localhost”
DB_NAME ||= ENV[‘DB_NAME’] || “vbulletin”
DB_PW ||= ENV[‘DB_PW’] || “password”
DB_USER ||= ENV[‘DB_USER’] || “root”
TIMEZONE ||= ENV[‘TIMEZONE’] || “America/Los_Angeles”
TABLE_PREFIX ||= ENV[‘TABLE_PREFIX’] || “”
ATTACHMENT_DIR ||= ENV[‘ATTACHMENT_DIR’] || ‘/shared/attachments/’

#10.c - End edits

:wq

#11 - Bundle Config

bundle config set path ‘vendor/bundle’
bundle config set without ‘development:test’
bundle config unset deployment
su discourse -c ‘bundle install’

#12 - Mysql config (may be possible to do this with previous)

mysql --version
sudo mysql -u root -p
password
ALTER USER 'root'@'localhost' IDENTIFIED BY 'password';
FLUSH PRIVILEGES;
exit

#13 - Install Script

su discourse -c ‘bundle exec ruby script/import_scripts/vbulletin.rb’

Good luck!

7 Likes

Just wanted to leave feedback after our migration from vB4:

  • FIXED Soft-Deleted posts where not properly hidden: https://github.com/discourse/discourse/pull/12057
  • [ul] + [li] and nested [LIST] were not migrated properly and the BBcode plugin doesn’t seem to handle this either → This seems to be expected: CommonMark testing started here! (Quote: Core will not implement [ul] [ol] and [li] support for BBCode cause it is a recipe for failure.) → I will need to build some RegEx magic post-fixup for this.
  • We made an initial migration using the normale importer (took > 3 days) and restarted the migration with newer DB snapshots a couple of times to keep the import “fresh” and reduce the downtime to effectively 30 minutes. This procedure worked quite well, except for everything that was edited after we initially imported the threads, posts. We need to manually rework this information now.
  • Creating Plugins for Discourse is really hard due to lack of documentation and a big picture of how the folder structure works. Though it is getting nicer and better after you understand how it works.

Questions that i have left:

  • I not not sure how the importer maps already imported posts and how to match the old vB4 post_id to the new Discourse post_id to hide those “soft-deleted” post. If someone can give me a hint that would be very welcome! Found it: import_id inside the post_custom_fields table. Nice. Now i need to write some handy script to fix this :slight_smile: → Edit: An even better way is to use the importer script, which maps all imported id’s for easy use.
2 Likes

Unfortunately I can’t edit my previous post :slight_smile:

I found another issue: Every attachment that is not linked into a post, will not be available to Discourse.

My draft PR for fixing this issue: https://github.com/discourse/discourse/pull/12187

Thanks!

3 Likes

Just a quick followup on my issue list. I fixed the visibility problem.

Dump all affected posts from your old vBulletin database:

SELECT postid
FROM `vb4_post`
WHERE `visible` > '1'
ORDER BY postid

Make an imported_post_ids.txt file which has all the postid’s line by line

Create a new file for the fixing script:

nano script/import_scripts/fix_visibility.rb 

Content:

require_relative '../../config/environment'
require_relative 'base/lookup_container'

@lookup = ImportScripts::LookupContainer.new

broken_postids = []
broken_real_postids = []

File.foreach("imported_post_ids.txt") do |line|
  broken_postids.append(line.to_i)
end

broken_postids.each do |id|
  broken_real_postids.append(@lookup.post_id_from_imported_post_id(id))
end

broken_real_postids.each do |id|
  puts id
  Post.find(id).trash!
end

Run the script:

su discourse -c 'bundle exec ruby script/import_scripts/fix_visibility.rb'

The script will use the logic from the importer to map the imported post_id’s to the read discourse post_id’s which we want to hide.

4 Likes

Hi folks,

I’ve got the script churning away on a vb3 migration. I’m doing one step at a time and it’s currently churning through 122k users at 330/minute. Then we’ll have 2.5 million posts to go through.

We’re doing this on a production server. Nobody’s using the discourse site, we just set it up and it’s at an anonymous url. If I log in I can see the new user notifications incrementing. Probably a dumb question, but I wonder if the migration would process faster if we suspended or disabled the live site somehow?

1 Like

That depends on the load and amount of CPU’s on your production server. You can always try to stop the web server for 5 minutes and see if the import goes faster.

3 Likes

Importing really takes a while. As far as i know the bulk importer should be faster. We did a first import from a backup on our beefy dev machine and then did an incremental one from another backup to make the switch to Discourse with only half an hour of downtime. Beware of the things that can go wrong when doing incremental updates :slight_smile: (See here: Importing / migrating from vBulletin 4 - #132 by paresy)

paresy

3 Likes

I see one core pegged that I think is the server ingesting the updated data, and another core pegged when running the import script. I really don’t have the domain knowledge to know if competition between those two processes for the db resource could be slowing down the importer, and I also don’t have the domain knowledge to know if it’s even possible to stop the ingestion while leaving the container up. The ingestion has to happen anyway, so I suppose the safest thing to do is just let it keep on churning away.

One tip for future readers, I see that 27k (22%!) of our users are banned spambots. We’ll purge them on the source side before doing the final import.

[adding] One necessary edit that I don’t see mentioned above:

--- a/script/import_scripts/vbulletin.rb
+++ b/script/import_scripts/vbulletin.rb
@@ -134,6 +133,7 @@ EOM
        , usertitle
        , usergroupid
        , joindate
+       , lastvisit
        , email
        , password
        , salt

And an edit that may be vb3 specific:

--- a/script/import_scripts/vbulletin.rb
+++ b/script/import_scripts/vbulletin.rb
@@ -987,7 +989,7 @@ EOM
   end

   def parse_timestamp(timestamp)
-    Time.zone.at(@tz.utc_to_local(timestamp))
+    Time.zone.at(@tz.utc_to_local(Time.at(timestamp)))
   end

[adding] The import is running on an oracle cloud 4-core ampere instance. For comparison I installed a discourse dev server local/native on an M1 macbook air and was surprised that the import process ran significantly slower.

6 Likes

Were you getting errors with the pre-existing script? I lost the date & time information from all our old vBulletin 4 posts because of that. If this is a fix, I’d love to know if reimporting would be a good idea if all posts have been copied over.

2 Likes

Yes, the script would error out because it was feeding an integer to a time function.

3 Likes

No. The script skips posts that have been imported already.

3 Likes

Hi,

Did you figure out how to fix this?

Our two main/bottom forums have parentid = -1 (I think this is due to us converting from v3 back in the day).

I’m not sure how to go ahead, do I just set them to 0 if -1 in the conversion script? Assuming 0 is the main discourse category?

Actually, looking at the discourse site now; those two seems to be the only two that has been imported?

 importing top level categories...
         2 / 2 (100.0%)  [211 items/min]  in]
 importing children categories...
 Traceback (most recent call last):
         5: from script/import_scripts/vbulletin.rb:1003:in `<main>'
         4: from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
         3: from script/import_scripts/vbulletin.rb:84:in `execute'
         2: from script/import_scripts/vbulletin.rb:287:in `import_categories'
         1: from script/import_scripts/vbulletin.rb:287:in `each'
script/import_scripts/vbulletin.rb:289:in `block in import_categories': undefined method `[]' for nil:NilClass (NoMethodError)
1 Like

Probably. I’ve done a bunch of vBulletin imports since then. :person_shrugging:

You’ll just have to give it a try and see what happens. It does look like the same thing I described.

I’d just modify the script to . . . do something … if that thing is nil.

1 Like

Absolutely, but I don’t know enough about how discourse works to know what to set it to.
What would discourse do if I set them to a random number like 0? Or should I find a category number already in the db and set it to that?

Not so sturdy in Ruby, would this work you think?

        if categories.detect { |c| c["forumid"] == cc["parentid"] }["parentid"].nil?
          cc["parentid"] = 52
        else
          cc["parentid"] = categories.detect { |c| c["forumid"] == cc["parentid"] }["parentid"]
        end

Actually it seems there are lots of deleted forums who’s parentid no longer exists

EDIT
I just set them all to one parent topic, and I can fix it up later

1 Like

We’ve finally come to the attachment import part, it came to around 1.9% and now we get this error

    67406 / 3550728 (  1.9%)  Traceback (most recent call last):
        23: from script/import_scripts/vbulletin.rb:1006:in `<main>'
        22: from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
        21: from script/import_scripts/vbulletin.rb:88:in `execute'
        20: from script/import_scripts/vbulletin.rb:610:in `import_attachments'
        19: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/querying.rb:22:in `find_each'
        18: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:70:in `find_each'
        17: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:137:in `find_in_batches'
        16: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:229:in `in_batches'
        15: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:229:in `loop'
        14: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:245:in `block in in_batches'
        13: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:138:in `block in find_in_batches'
        12: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:71:in `block in find_each'
        11: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:71:in `each'
        10: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/activerecord-6.1.4.1/lib/active_record/relation/batches.rb:71:in `block (2 levels) in find_each'
         9: from script/import_scripts/vbulletin.rb:651:in `block in import_attachments'
         8: from script/import_scripts/vbulletin.rb:651:in `each'
         7: from script/import_scripts/vbulletin.rb:659:in `block (2 levels) in import_attachments'
         6: from /var/www/discourse/script/import_scripts/base.rb:873:in `html_for_upload'
         5: from /var/www/discourse/script/import_scripts/base/uploader.rb:40:in `html_for_upload'
         4: from /var/www/discourse/lib/upload_markdown.rb:10:in `to_markdown'
         3: from /var/www/discourse/lib/upload_markdown.rb:19:in `image_markdown'
         2: from /var/www/discourse/app/models/upload.rb:206:in `short_url'
         1: from /var/www/discourse/app/models/upload.rb:534:in `short_url_basename'
/var/www/discourse/app/models/upload.rb:270:in `base62_sha1': undefined method `hex' for nil:NilClass (NoMethodError)

undefined method `hex’ for nil:NilClass (NoMethodError)

Anyone got any clues as to how to fix this one?

Is it trying to read short_url_basename, and it returns nil; so .hex fails?

1 Like

My guess, without looking at the code, is that the file is missing or maybe there’s a filename field and it is empty? I’d probably put a puts in import_attachments and see what is in the record it’s trying to import.

1 Like

Thank you for the help! I’m new to Ruby, would this be the correct way to do it?

      unless mapping[post.id].nil? || mapping[post.id].empty?
        mapping[post.id].each do |attachment_id|
          upload, filename = find_upload(post, attachment_id)
          unless upload
            fail_count += 1
            next
          end

          puts "{short_url_basename}"

          # internal upload deduplication will make sure that we do not import attachments again
          html = html_for_upload(upload, filename)
          if !new_raw[html]
            new_raw += "\n\n#{html}\n\n"
          end
        end
      end

Aha, short_url_basename is a function so that won’t work.

Is it simply, puts “{post}” ? And it will output all the contents of the post object?

This seems to be the line its crashing on in upload.rb

upload_markdown 19
"![#{@upload.original_filename}|#{@upload.width}x#{@upload.height}](#{@upload.short_url})"

upload.rb 534
"#{Upload.base62_sha1(sha1)}#{extension.present? ? ".#{extension}" : ""}"

upload.rb 270
Base62.encode(sha1.hex)

So its either upload.original_filename, upload.width, upload.height or upload.short_url then

So if I do a nil check in upload_markdown, it should prevent the error no?

Does it need the shortURL for it to work; could I just create my own random shortURL?

2 Likes

I think that’s where the issue is. It’s not finding the upload, so it comes back nil. Maybe the file is missing or invalid.

1 Like

But wouldn’t this then catch it?

unless upload
  fail_count += 1
  next
end

Or does unless not check for nil?

Or does it pass because it created the upload object, but the property upload.short_url in the upload object is missing perhaps?

1 Like

Sorry. Right. That would catch it. I’m afraid that’s why this level of debugging isn’t really appriate for a forum. :person_shrugging:

You’re on the right track, though. Just keep going. It seems lke you know enough to figure it out. I wrote at least a couple of importers before I learned Ruby.

1 Like