Migrating vBulletin 5 database - Import script errors

Ok so, quick recap.

I’m volunteering from the migration of a forum that is currently on vbulletin3.
On a staging environment, starting from a dump of the database (20GB, you read that right).

Run the upgrade to vBulletin 5. Took 5-6 hours but went through. Version is vBulletin 5.4.
Did some cleanup of the usernames to be accepted by discourse.

Now, installed the docker discourse and followed loosely this guide for the preparation. Loosely meaning that most of it was redundant or out of date but it helped getting some bearing about what to do.

I’m at the step in which I literally am going blind as I have near zero Ruby coding experience.
So, the relevant parts, after finishing the installation I’ve entered the container with ./launcher enter app then:

  • Added freetds-dev and libmariadb-dev
  • Edited the Gemfile to add php_serialize gem.
  • from shell, run export IMPORT=1 to set the environment for import
  • as discourse user run bundle install --no-deployment --without test --without development --path vendor/bundle

Got the error:

You are trying to install in deployment mode after changing
your Gemfile. Run `bundle install` elsewhere and add the
updated Gemfile.lock to version control.

If this is a development machine, remove the /var/www/discourse/Gemfile freeze
by running `bundle config unset deployment`.

The list of sources changed
The dependencies in your gemfile changed

You have added to the Gemfile:
* mysql2
* redcarpet
* php_serialize
* sqlite3 (~> 1.3, >= 1.3.13)
* ruby-bbcode-to-md
* reverse_markdown
* tiny_tds
* csv
* parallel

So, continue with

  • bundle config unset deployment and run again the previous command
  • checked that both mysql2 and php_serialize were there (they were)
  • added the old forum avatars (no attachments to import) and assigned the directories ownership to discourse user in its own /home/discourse
  • edited script/import_scripts/vbulletin5.rb to change the reference for connecting to the db
  • as user discourse run bundle exec ruby script/import_scripts/vbulletin5.rb

This returned to me an error about tzinfo Integer values not supported that I found being mentioned here on this discourse.

Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...

importing groups...
       41 / 41 (100.0%)  [2294 items/min]  ]
importing users
Traceback (most recent call last):
        15: from script/import_scripts/vbulletin5.rb:726:in `<main>'
        14: from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
        13: from script/import_scripts/vbulletin5.rb:46:in `execute'
        12: from script/import_scripts/vbulletin5.rb:79:in `import_users'
        11: from /var/www/discourse/script/import_scripts/base.rb:916:in `batches'
        10: from /var/www/discourse/script/import_scripts/base.rb:916:in `loop'
         9: from /var/www/discourse/script/import_scripts/base.rb:917:in `block in batches'
         8: from script/import_scripts/vbulletin5.rb:98:in `block in import_users'
         7: from /var/www/discourse/script/import_scripts/base.rb:264:in `create_users'
         6: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/mysql2/alias_method.rb:8:in `each'
         5: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/mysql2/alias_method.rb:8:in `each'
         4: from /var/www/discourse/script/import_scripts/base.rb:265:in `block in create_users'
         3: from script/import_scripts/vbulletin5.rb:110:in `block (2 levels) in import_users'
         2: from script/import_scripts/vbulletin5.rb:718:in `parse_timestamp'
         1: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/tzinfo-2.0.5/lib/tzinfo/timezone.rb:575:in `utc_to_local'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/tzinfo-2.0.5/lib/tzinfo/timestamp.rb:138:in `for': Integer values are not supported (ArgumentError)

@Haddoq suggestion was to change a line from Time.zone.at(@tz.utc_to_local(timestamp)) into Time.zone.at(timestamp).

It also suggest to add lastvisit in the user query because it will cause another error otherwise so I did that as well.

However, now when I launch the migration with bundle exec ruby script/import_scripts/vbulletin5.rb this is what I get:

Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...

importing groups...
       41 / 41 (100.0%)  [120217 items/min]
importing users
Traceback (most recent call last):
        13: from script/import_scripts/vbulletin5.rb:727:in `<main>'
        12: from /var/www/discourse/script/import_scripts/base.rb:47:in `perform'
        11: from script/import_scripts/vbulletin5.rb:46:in `execute'
        10: from script/import_scripts/vbulletin5.rb:79:in `import_users'
         9: from /var/www/discourse/script/import_scripts/base.rb:916:in `batches'
         8: from /var/www/discourse/script/import_scripts/base.rb:916:in `loop'
         7: from /var/www/discourse/script/import_scripts/base.rb:917:in `block in batches'
         6: from script/import_scripts/vbulletin5.rb:80:in `block in import_users'
         5: from script/import_scripts/vbulletin5.rb:723:in `mysql_query'
         4: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/mysql2/alias_method.rb:22:in `query'
         3: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/mysql2-0.5.4/lib/mysql2/client.rb:147:in `query'
         2: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/mysql2-0.5.4/lib/mysql2/client.rb:147:in `handle_interrupt'
         1: from /var/www/discourse/vendor/bundle/ruby/2.7.0/gems/mysql2-0.5.4/lib/mysql2/client.rb:148:in `block in query'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/mysql2-0.5.4/lib/mysql2/client.rb:148:in `_query': You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'CASE WHEN u.scheme='blowfish:10' THEN token (Mysql2::Error)
                 WHEN u.scheme='lega' at line 2

At this point I’m a little lost. Can someone help?

paging @Canapin as he’s been in “Migration Vietnam” and might know more :heart:

Turn out working on something 8+ hours straight is bad.

image

When adding u.lastvisit at the end of SELECT u.userid, u.username, u.homepage, u.usertitle, u.usergroupid, u.joindate, u.email, I forgot to add a , after that.

Sorry Canapin for the ping :frowning:

1 Like

Alright, follow up issue. The guide I was following stated that the registration process could be restarted if it slows down.

But when restarted I get errors about users already existing in the postgres db.

Silly me, I went and deleted all users with id > 1 in the db (leaving the discobot, system and admin basically) and restarting. That makes the import continue but all previously created users’s email are marked as “already used” somewhere.

What can I do to clean it up and shouldn’t this process instead skip insertion if there is already a username matching?

Edit: Alright, found out that I need to clear users, email_tokens and user_emails.

1 Like

Unfortunately, I didn’t keep track of what I’ve been tweaking in my previous imports so I have little knowledge. I now have a personal Discourse instance where I write this kind of stuff… I should have done that before!
I’m more skilled in import stuff when I’m actually working on one.

As for destroying users, you may have wanted to use UserDestroyer through the rails console:

2 Likes

Great stuff, I’ll keep it in mind for the “real” migration. For now i’m just doing a test run of the whole process while writing a runbook :slight_smile:

1 Like

Why did you do that? Unless you changed something in the script that would import those differently then you should just restart and have it import those out didn’t import already.

If you do need to start over, it’s much easier to restore a backup or drop and create a new database.

It should find the import ids in the UserCustomFields table. I’m not quite sure how you’d get that error.

The changes are the one I listed. I am as surprised as you are but the script would not progress and simply error and stop.

Is there any way to speed up the import?

I’ve over 90k users in this community and the speed of import degrade over time for some reason and I can’t imagine why.

It run all night and just for the users we are at 25123 / 95635 ( 26.3%) [42 items/min]

There are several order of magnitude more posts. How long should I expect a migration to happen? Days? weeks?

How much ram? That’s likely the issue. You can stop and restart.

I’ve had them take weeks. That’s why the bulk importers exist.

It has just 2GB ram. It’s a test machine. I could run it locally instead of a vm (16GB of ram would be that much better?) and then package everything and upload eventually I guess.

Can you elaborate on bulk importers? first time I hear about it and it definitely should have poppup up when I was searching for “migrate vbulletin to discourse” on google :angry:

Frustrated Jason Segel GIF by NETFLIX

Can I run this even if the other importer already processed some users or should I clean up?

Went ahead and tried, at worst it won’t work. I’m getting spammed with

ERROR: no implicit conversion of nil into String
/var/www/discourse/script/bulk_import/base.rb:861:in `encode'
/var/www/discourse/script/bulk_import/base.rb:861:in `normalize_charset'
/var/www/discourse/script/bulk_import/base.rb:856:in `normalize_text'
script/bulk_import/vbulletin5.rb:123:in `block in import_users'
/var/www/discourse/script/bulk_import/base.rb:725:in `block (2 levels) in create_records'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/mysql2/alias_method.rb:8:in `each'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/mysql2/alias_method.rb:8:in `each'
/var/www/discourse/script/bulk_import/base.rb:723:in `block in create_records'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/pg-1.4.5/lib/pg/connection.rb:196:in `copy_data'
/var/www/discourse/script/bulk_import/base.rb:722:in `create_records'
/var/www/discourse/script/bulk_import/base.rb:340:in `create_users'
script/bulk_import/vbulletin5.rb:120:in `import_users'
script/bulk_import/vbulletin5.rb:63:in `execute'
/var/www/discourse/script/bulk_import/base.rb:100:in `run'
script/bulk_import/vbulletin5.rb:781:in `<main>'

I assume I should clean up users and groups which are the only things the other importer created/started to create.

Before I go mess in the db again, is there some ruby command I can run that will take care of that in a clean way?

For this time I’ll just go ahead and destroy and recreate the discourse installation. Thanks heaven for Docker.

Nope, clean installation, the script still fail with this generic

ERROR: no implicit conversion of nil into String
/var/www/discourse/script/bulk_import/base.rb:861:in `encode'
/var/www/discourse/script/bulk_import/base.rb:861:in `normalize_charset'
/var/www/discourse/script/bulk_import/base.rb:856:in `normalize_text'
script/bulk_import/vbulletin5.rb:123:in `block in import_users'
/var/www/discourse/script/bulk_import/base.rb:725:in `block (2 levels) in create_records'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/mysql2/alias_method.rb:8:in `each'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/mysql2/alias_method.rb:8:in `each'
/var/www/discourse/script/bulk_import/base.rb:723:in `block in create_records'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/pg-1.4.5/lib/pg/connection.rb:196:in `copy_data'
/var/www/discourse/script/bulk_import/base.rb:722:in `create_records'
/var/www/discourse/script/bulk_import/base.rb:340:in `create_users'
script/bulk_import/vbulletin5.rb:120:in `import_users'
script/bulk_import/vbulletin5.rb:63:in `execute'
/var/www/discourse/script/bulk_import/base.rb:100:in `run'
script/bulk_import/vbulletin5.rb:781:in `<main>'

Is there any way to debug this so that at least I can understand which value is being nil and is expected to be something else?

Ok, error seems to be related to the encoding.
In the bulk_import/vbulletin5.rb I can specify the encoding (in our case, UTF8mb4 but in the base.rb file it doesn’t seems to map to anything in the charset map

Hi!

Some general tips:

  1. if you do changes to the script, it is generally advisable to start from scratch or at least a known good point. You can most easily do that by restoring a backup taken just before you run the migration like @pfaffman said.
  2. bulk importers will generally take a lot less time, but this particular one has the potential to consume YUGE amounts of RAM because it caches stuff in memory. For one particular vBulletin bulk migration I did from a 2GB SQL file uncompressed, the process needed 22GB RAM (double checked, not a typo)
  3. if you do changes to the script, I suggest you create a test version of the input with say 100 or 1000 records for each table (but watch out for referential integrity - ie. don’t truncate tables referenced by other tables). Testing changes with a >8hs process will erode your sanity very quickly

A more specific tip about stack traces: look for lines mentioning the specific file you ran. In this case yes it’s an encoding issue, but more relevant is the fact that it’s about usernames:

You said you had to sanitize usernames so I would double check that you encoded them as expected by the script.

2 Likes

You can just drop , create, migrate the database rather than recreate Discourse. It’s a bit tricky, though, as you have to

  sv stop unicorn

and then

  rake db:drop db:create db:migrate

It’ll complain and tell you to set and ENV variable that you need to put on the line before the rake task.

You can also restore a backup, which can be more convenient.

FWIW, I don’t think I’ve ever used a bulk migration script.

1 Like

That’s even weirder as the usernames have been all sanitized to follow discorse guidelines, basically they have all been changed to be letters, numbers or _, nothing else.

In any case, changing the charset from utf8mb4 to utf8 as it was the default made it go through but then I’m getting errors for invalid emails

ERROR: can't modify frozen String: "24ef401b30f5161e5a0bb27ec49ed921@email.invalid"
/var/www/discourse/script/bulk_import/base.rb:457:in `downcase!'
/var/www/discourse/script/bulk_import/base.rb:457:in `process_user_email'
/var/www/discourse/script/bulk_import/base.rb:726:in `block (2 levels) in create_records'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/mysql2/alias_method.rb:8:in `each'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/rack-mini-profiler-3.0.0/lib/patches/db/mysql2/alias_method.rb:8:in `each'
/var/www/discourse/script/bulk_import/base.rb:723:in `block in create_records'
/var/www/discourse/vendor/bundle/ruby/2.7.0/gems/pg-1.4.5/lib/pg/connection.rb:196:in `copy_data'
/var/www/discourse/script/bulk_import/base.rb:722:in `create_records'
/var/www/discourse/script/bulk_import/base.rb:351:in `create_user_emails'
script/bulk_import/vbulletin5.rb:151:in `import_user_emails'
script/bulk_import/vbulletin5.rb:66:in `execute'
/var/www/discourse/script/bulk_import/base.rb:100:in `run'
script/bulk_import/vbulletin5.rb:781:in `<main>'

Going now to discover what is this about as the “non-bulk” imported was detecting some malformed email but replacing them automatically.

After you have done so, just create a backup of your empty pristine Discourse installation.
You can always restore that very quickly and start fresh again.

2 Likes

To add to that, if you have trouble with for instance the import of posts, you could also exit the script after the user import has completed satisfactory and create another backup. Then you can restart the script and you can resume at the moment where you have already imported the users.

1 Like