Migrate a vBulletin 4 forum to Discourse

Thanks for the valuable info! :+1:t6:

I have another question, about the SQL query that select the vBulletin users.

When I imported my old phpbb forum to Discourse 3 years ago, where were about 20000 users .
Obviously, most of the users were unusued accounts. During these 3 years, Discourse has made his own clean up of inactive users, and we have now a more honest number of 3000 members.

When I imported my 180000 users from vBulletin and asked Discourse to aggressively do its cleanup job, I was left with 27000 users, which seems fair.

On my vBulletin, since:

  1. All messages posted by users on other usersā€™s profiles are imported to Discourse to uncategorized, titleless topics that add nothing but useless noise, and
  2. The vast majority of users that posted only on other usersā€™ profiles seems to be spammers,
    Iā€™d like to do the cleanup during the import, and not after.

I donā€™t understand all of the vBulletin database, which is a bit confusing with its nodes thing, but Iā€™d like to import only users that have posted at least 1 topic or reply.
It seems to me that topics and replies fill the lastpostid field in the vBulletin user table, but public profile posts donā€™t.

Si, I intend to edit

  SELECT u.userid, u.username, u.homepage, u.usertitle, u.usergroupid, u.joindate, u.email
    CASE WHEN u.scheme='blowfish:10' THEN token
         WHEN u.scheme='legacy' THEN REPLACE(token, ' ', ':')
    END AS password,
    IF(ug.title = 'Administrators', 1, 0) AS admin
    FROM #{DB_PREFIX}user u
    LEFT JOIN #{DB_PREFIX}usergroup ug ON ug.usergroupid = u.usergroupid
ORDER BY userid
   LIMIT #{BATCH_SIZE}
  OFFSET #{offset}

to:

  SELECT u.userid, u.username, u.homepage, u.usertitle, u.usergroupid, u.joindate, u.email, u.lastpost
    CASE WHEN u.scheme='blowfish:10' THEN token
         WHEN u.scheme='legacy' THEN REPLACE(token, ' ', ':')
    END AS password,
    IF(ug.title = 'Administrators', 1, 0) AS admin
    FROM #{DB_PREFIX}user u
    LEFT JOIN #{DB_PREFIX}usergroup ug ON ug.usergroupid = u.usergroupid
    WHERE u.lastpost > 0
ORDER BY userid
   LIMIT #{BATCH_SIZE}
  OFFSET #{offset}

I just add a WHERE u.lastpost > 0.

A count with this query gives me 25000 users, compared to the 27000 active users on my previous Import, after a Discourse cleanup but still having these titleless topics (former profile public messages). That would mean that about 2000 users would have posted only on other usersā€™ profiles which is not an absurd number.

Do you think my reasoning is right and that adding WHERE u.lastpost > 0 will nicely clean up my user base without having any harmful effect?

It depends where your database has been. If it was migrated from phpBB to vBulletin or if it has gone through many vBulletin updates then this reasoning could be wrong.

The best you can do is to verify your reasoning by running more queries, for instance by listing all posts made by the users without a lastpost.

Also, if you have any plugins like ā€œlikesā€ or ā€œvotingā€ you might be removing users that you should not be removing.

My strategy is to be very conservative in removing or leaving out things. Storage is inexpensive.

4 Likes

Thank you, Iā€™ll do as you say, I prefer to be careful. :slight_smile:
Iā€™ll let Discourse do its own cleanup over time.

Iā€™ve been adding custom posts cleanup regex in your migration script for days because the forum is very old and was migrated from phpbb before vbulletin, so many things had to be taken care of, but I think and I hope Iā€™m close to finalizing things.

I donā€™t really know vbulletin, but the forum Iā€™m working on was using vBulletin 5.6 and some external images I had posted in it were using this syntax in the vbulletin database:
[IMG2=JSON]{"data-align":"none","data-size":"full","src":"https:\/\/forum.monocycle.info\/uploads\/default\/original\/2X\/3\/396192845ba93e7df2a6109a2608072efa21ee32.jpeg"}[/IMG2]
Iā€™m not sure is this is something that was ā€œforgottenā€ in your script, or if the forum admin used some plugin generating these img2 tags.

Anyway, I fixed these with this code:

    raw = raw.gsub(/\[img2=json\].+?(http.+?).}\[\/img2\]/i) {"\n#{$1}\n"}

I have a question though: will discourse rebake the imported posts automatically over time? If it does, will it start with the most recent posts?

2 Likes

Hi again,
My forum has about 1000 tags, but we probably wonā€™t use them on Discourse. Plus, these tags are probably a real mess. Can I just comment this line in the importer:

    import_tags

Or there may be some collateral damage?

1 Like

You can safely comment it out.

2 Likes

Is it safe not waiting for sidekiq to finish his jobs after importing something?

I imported my users and this is sidekiqā€™s current state.

What happens to these tasks if I create a backup and I restore it on a production forum?

1 Like

No, although a full rebake will recreate most of these jobs, I would absolutely recommend you to wait.

They will keep on runningā€¦ on the import instance.
They will not be included in the backup or transferred to the production forum.

3 Likes

Thanks ! :+1:

Because of some error messages during backup restores (that donā€™t prevent the restore from finishing or the forum from working), I was wondering if it could be because I didnā€™t wait for sidekiq to finish his job.

I started a new import from vbulletin: I only imported groups and 30000 users on my dev discourse, waited a few dozen of minutes, then created a backup, restored the backup on a docker based installation. The restore worked, but the logs show these errors

ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR: relation "users" does not exist LINE 1: SELECT "users".* FROM "users" WHERE "users"."id" = 1 LIMIT 1 ^ ) lib/a
10:03 pm
7
ActiveRecord::StatementInvalid (PG::UndefinedTable: ERROR: relation "user_auth_tokens" does not exist LINE 1: SELECT "user_auth_tokens".* FROM "user_auth_tokens" WHERE ((...

They are inconsistent, and the relations errors differ from a backup to another.
I canā€™t figure out where do these come from. :confused:

1 Like

TL;DR I donā€™t think those errors are related to the import in any way.

Iā€™ve seen that before, I think it is a race condition that is happening because the database is being accessed while the backup is restoring.

This could be because of regular traffic coming in at your server, or a Sidekiq process that hasnā€™t been paused. In both cases it is harmless. If I were you I would completely ignore all PG errors that occur before the restore has been finished completely.

4 Likes

Thatā€™s reassuring!

Thing is, not only the messages themselves are a bit scary to me, they are also scary because

  1. they happen for every (or almost? :thinking:) backup Iā€™ve created since I started my import, whether I restore the backup in a local dev forum or a docker online install
  2. they prevent the restore log (in Discourse backup interface) to continue being written in the Discourse interface while itā€™s doing the restoration : it stays stuck on ā€œunzipping archiveā€ (or ā€œCreating missing functions in the discourse_functions schemaā€¦ā€ if itā€™s a backup without uploads).
    It looks like something has crashed, but if I wait, after a while Iā€™m always properly and automatically logged out and when, I log in again, the import seems to have worked well besides these error messages.

Since the forum is working (besides the category edition that leads to a 502 error I described in another thread), Iā€™m just afraid that the forum would workā€¦ Until it wouldnā€™t for some reason in a few weeks/months/years because of something I didnā€™t figure out first, which I really donā€™t want to happen, especially since Iā€™ve been working on this import everyday for 1 month and counting. :sweat_smile:

Anyway, thanks for your help, itā€™s much appreciated. Iā€™m putting a lot of energy in this unpaid import work for a large community, and having people replying to my questions is always relieving.

2 Likes

First off thanks for posting this. Just tried importing vBulletin 3.7.x. Followed all the steps but then when I ran the import script it says it canā€™t connect even though I can connect. Any ideas?

root@uat-app:/var/www/discourse# export DB_NAME="vb4"
root@uat-app:/var/www/discourse# export DB_USER="root"
root@uat-app:/var/www/discourse# export DB_PW="1234"
root@uat-app:/var/www/discourse# export TABLE_PREFIX="vbulletin"
root@uat-app:/var/www/discourse# export ATTACHMENT_DIR='/shared/vbulletin'
root@uat-app:/var/www/discourse# export TIMEZONE="America/Vancouver"
root@uat-app:/var/www/discourse# su discourse -c 'bundle exec ruby script/import_scripts/vbulletin.rb'
root:1234@localhost wants vb4
Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...
==================================================
Access denied for user 'root'@'localhost'
Cannot connect in to database.

Hostname: localhost
Username: root
Password: 1234
database: vb4

Edit the script or set these environment variables:

export DB_HOST="localhost"
export DB_NAME="vbulletin"
export DB_PW=""
export DB_USER="root"
export TABLE_PREFIX="vb_"
export ATTACHMENT_DIR '/path/to/your/attachment/folder'

Exiting.

Below you can see I am able to login to the database using the same credentials and validated the database name and table prefix.

root@uat-app:/var/www/discourse# mysql -uroot -p1234 -hlocalhost
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 70
Server version: 10.3.25-MariaDB-0+deb10u1 Debian 10

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> use vb4;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
MariaDB [vb4]> select * from information_schema.tables where table_schema = 'vb4' and table_name = 'vbulletinpost'
    -> ;
+---------------+--------------+---------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------------+--------------------+-----------+
| TABLE_CATALOG | TABLE_SCHEMA | TABLE_NAME    | TABLE_TYPE | ENGINE | VERSION | ROW_FORMAT | TABLE_ROWS | AVG_ROW_LENGTH | DATA_LENGTH | MAX_DATA_LENGTH | INDEX_LENGTH | DATA_FREE | AUTO_INCREMENT | CREATE_TIME         | UPDATE_TIME         | CHECK_TIME          | TABLE_COLLATION   | CHECKSUM | CREATE_OPTIONS | TABLE_COMMENT | MAX_INDEX_LENGTH   | TEMPORARY |
+---------------+--------------+---------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------------+--------------------+-----------+
| def           | vb4          | vbulletinpost | BASE TABLE | MyISAM |      10 | Dynamic    |    1037509 |            356 |   370191960 | 281474976710655 |     53087232 |         0 |        1046506 | 2020-11-14 14:27:01 | 2020-11-14 14:27:28 | 2020-11-14 14:27:32 | latin1_swedish_ci |     NULL |                |               | 288230376151710720 | N         |
+---------------+--------------+---------------+------------+--------+---------+------------+------------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+---------------------+-------------------+----------+----------------+---------------+--------------------+-----------+
1 row in set (0.001 sec)

MariaDB [vb4]>

Also I have exported the browsed to /admincp/avatar.php?do=storage and exported the avatars into three different folders like this.

/shared/vbulletin/signaturepics/
/shared/vbulletin/customprofilepics/
/shared/vbulletin/customavatars/

Do I then specify the parent folder like this? Or do I have to take the content of the three folders and put them into one?

export ATTACHMENT_DIR='/shared/vbulletin'

1 Like

My only guess is maybe your password has funny characters in it?

2 Likes

Actually itā€™s nothing other than alpha and numeric characters. Anyway just tried again and confirmed doesnā€™t work, until I explicitly set the password. I also noticed the instructions need to be updated so Iā€™m pasting here what works for me.

1 Like

Instructions need to be updated. Hereā€™s what works for me as of Nov 2020. Note it is indeed better to run this import using screen because itā€™d be hours to do an import, and using nohup is likely not helpful because the import script will constantly be updating number of each item imported so the stdout file will likely be large.

Install Database To Host vBulletin Data

Download Latest Packages

Note MySQL is no longer available unless Oracle MySQL repo is added explicitly into the repo list. MariaDB has replaced MySQL.

root@uat-app:~# apt-get update
root@uat-app:~# apt-get install libmariadb-dev
root@uat-app:~# apt-get install default-mysql-server

Start Database

root@uat-app:~# service mysql status
[info] MariaDB is stopped..
root@uat-app:~#
root@uat-app:~# service mysql start
[ ok ] Starting MariaDB database server: mysqld.
root@uat-app:~# service mysql status
[info] /usr/bin/mysqladmin Ver 9.1 Distrib 10.3.25-MariaDB, for debian-linux-gnu on x86_64
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Server version 10.3.25-MariaDB-0+deb10u1
Protocol version 10
Connection Localhost via UNIX socket
UNIX socket /var/run/mysqld/mysqld.sock
Uptime: 4 sec

Threads: 7 Questions: 461 Slow queries: 0 Opens: 177 Flush tables: 1 Open tables: 31 Queries per second avg: 115.250.

Install Gems For Database Connectivity

Following shows that the latest ā€˜bundleā€™ doesnā€™t like some of the flags in the original instructions and there is a need to unset ā€˜deploymentā€™ mode.

root@uat-app:~# echo "gem 'mysql2', require: false" >> /var/www/discourse/Gemfile

root@uat-app:~# echo "gem 'php_serialize', require: false" >> /var/www/discourse/Gemfile

root@uat-app:~# cd /var/www/discourse
root@uat-app:/var/www/discourse# su discourse -c 'bundle install --no-deployment --without test --without development --path vendor/bundle'
[DEPRECATED] The `--path` flag is deprecated because it relies on being remembered across bundler invocations, which bundler will no longer do in future versions. Instead please use `bundle config set path 'vendor/bundle'`, and stop using this flag
[DEPRECATED] The `--without` flag is deprecated because it relies on being remembered across bundler invocations, which bundler will no longer do in future versions. Instead please use `bundle config set without 'development'`, and stop using this flag
You are trying to install in deployment mode after changing
your Gemfile. Run `bundle install` elsewhere and add the
updated Gemfile.lock to version control.

If this is a development machine, remove the /var/www/discourse/Gemfile freeze by running `bundle config unset deployment`.

The dependencies in your gemfile changed

You have added to the Gemfile:
* mysql2
* php_serialize

Update Configuration and Rerun Install

Check By CLI

Checking configuration confirmed that it is set to ā€˜deploymentā€™ mode.

root@uat-app:/var/www/discourse# bundle config list
Settings are listed in order of priority. The top value will be used.
deployment
Set for your local app (/var/www/discourse/.bundle/config): true

jobs
Set for your local app (/var/www/discourse/.bundle/config): 4

retry
Set for your local app (/var/www/discourse/.bundle/config): 3

path
Set for your local app (/var/www/discourse/.bundle/config): "vendor/bundle"

without
Set for your local app (/var/www/discourse/.bundle/config): [:development, :test]

Check By Inspecting Config File

Following is doing the same check by inspecting the config file.

root@uat-app:/var/www/discourse# cat /var/www/discourse/.bundle/config
---
BUNDLE_DEPLOYMENT: "true"
BUNDLE_JOBS: "4"
BUNDLE_RETRY: "3"
BUNDLE_PATH: "vendor/bundle"
BUNDLE_WITHOUT: "development:test"

Update Configuration

root@uat-app:/var/www/discourse# bundle config set path 'vendor/bundle'
Your application has set path to "vendor/bundle". This will override the global value you are currently setting
root@uat-app:/var/www/discourse# bundle config set without 'development:test'
Your application has set without to "development:test". This will override the global value you are currently setting
root@uat-app:/var/www/discourse# bundle config unset deployment

Validate Configuration Again

root@uat-app:/var/www/discourse# bundle config list
Settings are listed in order of priority. The top value will be used.
path
Set for your local app (/var/www/discourse/.bundle/config): "vendor/bundle"
Set for the current user (/root/.bundle/config): "vendor/bundle"

without
Set for your local app (/var/www/discourse/.bundle/config): [:development, :test]
Set for the current user (/root/.bundle/config): [:development, :test]

jobs
Set for your local app (/var/www/discourse/.bundle/config): 4

retry
Set for your local app (/var/www/discourse/.bundle/config): 3

Attempt Install Again

Run install again for the Gems and exit the container.

root@uat-app:/var/www/discourse# su discourse -c 'bundle install'
...........
Bundle complete! 125 Gemfile dependencies, 163 gems now installed.
Gems in the groups development and test were not installed.
Bundled gems are installed into `./vendor/bundle`
root@uat-app:/var/www/discourse# exit

Create Directory For vBulletin Data

Create Directory

[root@uat standalone]# pwd
/var/discourse/shared/standalone
[root@uat standalone]# mkdir vbulletin

Copy vBulletin Database

[root@uat standalone]# scp <login user>@<vbulletin server IP>:/home/backup/vbulletin/vbulletin-2020-11-14-03:30:01.sql.bz2 ./vbulletin/.

Unzip vBulletin Database

[root@uat containers]# docker exec -it app bash
root@uat-app:/# cd /shared/vbulletin
root@uat-app:/shared/vbulletin# bunzip2 vbulletin-2020-11-14-03\:30\:01.sql.bz2

Setup Data Source

Create Database vb4

root@uat-app:/shared/vbulletin# mysql -uroot -p -e 'CREATE DATABASE vb4'
Enter password:

Import vBulletin Into MariaDB

root@uat-app:/shared/vbulletin# mysql -uroot -p vb4 < vbulletin-2020-11-14-03\:30\:01.sql
Enter password:

Unzip Profile Archives

[root@uat vbulletin]# tar xvfz signaturepics.tar.gz
[root@uat vbulletin]# tar xvfz customavatars.tar.gz
[root@uat vbulletin]# tar xvfz customprofilepics.tar.gz

Update Database Root Password

root@uat-app:/var/www/discourse# mysql -uroot -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 77
Server version: 10.3.25-MariaDB-0+deb10u1 Debian 10

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> ALTER USER 'root'@'localhost' IDENTIFIED BY '1234';
Query OK, 0 rows affected (0.001 sec)

MariaDB [(none)]> quit
Bye

Import Into Discourse

Set Data Source Connection Details

[root@uat vbulletin]# export DB_NAME="vb4"
[root@uat vbulletin]# export DB_USER="root"
[root@uat vbulletin]# export DB_PW="1234"
[root@uat vbulletin]# export TABLE_PREFIX="vbulletin"
[root@uat vbulletin]# export ATTACHMENT_DIR='/shared/vbulletin'
[root@uat vbulletin]# export TIMEZONE="America/Vancouver"
[root@uat vbulletin]# cd /var/www/discourse
root@uat-app:/var/www/discourse# su discourse -c 'bundle exec ruby script/import_scripts/vbulletin.rb'
root:1234@localhost wants vb4
Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...

importing groups...
15 / 15 (100.0%) [3272 items/min] n]
importing users
117 / 11033 ( 1.1%) [145 items/min] in]
3 Likes

So the problem with your initial connection to the database was mixing libraries?

2 Likes

I donā€™t think so. I literally just changed the password explicitly and it was able to run. I am having a problem when importing private messages now. I am seeing a lot of the following. Any idea what this is?

importing private messages...
      139 / 177409 (  0.1%)  [399 items/min]  one of the participant's id is nil -- [nil, 270]
pm-149 has no target (a:1:{i:486;s:5:"TonyN";})
      364 / 177409 (  0.2%)  [418 items/min]  one of the participant's id is nil -- [nil, 276]
pm-420 has no target (a:1:{i:623;s:14:"the other side";})
      571 / 177409 (  0.3%)  [414 items/min]  one of the participant's id is nil -- [nil, 445]
pm-702 has no target (a:1:{i:767;s:6:"greatg";})
      572 / 177409 (  0.3%)  [414 items/min]  one of the participant's id is nil -- [nil, 445]
      605 / 177409 (  0.3%)  [416 items/min]  one of the participant's id is nil -- [nil, 461]
1 Like

Itā€™s either that those users didnā€™t get imported for some reason (missing email address used to a cause, but that should be resolved now), or for some other reason the code that looks up imported usernames isnā€™t working (case of the username, perhaps?).

3 Likes

@pfaffman yeah it does look that way although itā€™s not clear on some of the specifics. Letā€™s look at the first one for example.

  1. What does pm-149 mean?
  2. For a:1:{i:486;s:5:"TonyN";} the text ā€œTonyNā€ looks like the username but what about the other numbers?
  3. How about [nil, 270]? What does 270 signify?

If I can understand what itā€™s complaining I can at least try to look up the database to see if there are any data problems. But Iā€™m not sure what these really mean.

BTW I also noticed that all imported forums have permission for everyone. This means all the forums permitted only for moderators were set to be everyone visible. Is there a way to control this?

1 Like

Sorry. I donā€™t remember well enough to explain. This is about all of the free help I have to offer on this one.

Of course. See How to use category security settings to create private categories - faq - Discourse Meta

Some importers endeavor to import groups, but few of them know how to apply those permissions to the categories that get imported. Youā€™ll need to fix those up by hand.

2 Likes

Following @titusc 's instructions, and I seem to be having issues importing the databaseā€¦

root@DO-Discourse-app:/shared/vbulletin# mysql -uroot -p vb4 < CC12-Sat-Full-Backup.sql
Enter password: 
ERROR 1265 (01000) at line 4928: Data truncated for column 'method' at row 1
root@DO-Discourse-app:/shared/vbulletin#

Any suggestions on what itā€™s looking for?

N/M Itā€™s errors in the original databaseā€¦

2 Likes