Discourse already has around 40 importers that cover a wide range of community software.
These importers work very well, but they tend to be slow for very large forums.
That’s why we’ve built the bulk importers.
What is a bulk importer?
Our standard importers go through the same code paths as the application. This has the advantage of ensuring the imported data is consistent, but it tends to be slow because it imports record by record.
In order to go faster, we need to import in bulk.
In order to import in bulk, we need to bypass Rails and use SQL.
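To make "bypass Rails and use SQL" concrete, here is a minimal sketch of one common approach with PostgreSQL: encoding a whole batch of records as COPY text-format lines instead of saving them one by one through models. The helper names `copy_line` and `copy_payload` are hypothetical, and whether the bulk importers use exactly this mechanism is an assumption.

```ruby
# Hypothetical helpers (not taken from the importer scripts).
# PostgreSQL COPY text format: columns are tab-separated, NULL is written
# as \N, and backslash, tab, newline and carriage return are escaped.
COPY_ESCAPES = { "\\" => "\\\\", "\t" => "\\t", "\n" => "\\n", "\r" => "\\r" }.freeze

# Encode a single record (an array of column values) as one COPY line.
def copy_line(values)
  values.map { |v| v.nil? ? "\\N" : v.to_s.gsub(/[\\\t\n\r]/, COPY_ESCAPES) }.join("\t") + "\n"
end

# Encode a whole batch so it can be sent to the database in one go.
def copy_payload(rows)
  rows.map { |row| copy_line(row) }.join
end

rows = [
  [1, "Hello", "first\npost"],
  [2, nil, "tab\there"],
]
payload = copy_payload(rows)
```

With the pg gem, a payload like this could be streamed through `copy_data` and `put_copy_data` on a raw connection, so no model callbacks or validations run for any of the rows.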
This solution has 2 drawbacks:
- We lose pretty much all the validations (since they are done in Rails), but we can import 25 million posts in a couple of hours instead of a week
- We need to keep it up to date whenever we change the structure of the database
There’s not much we can do about #1 other than being careful to respect those validations in the importers.
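As an illustration of respecting validations by hand, the sketch below cleans user records before they are written with SQL. The rules shown (a non-blank email containing "@", and usernames that are unique regardless of case) and the helper name `prepare_users` are assumptions made for the example, not the actual checks Discourse applies.

```ruby
require "set"

# Hypothetical pre-insert check: skip users without a usable email and
# make usernames unique (case-insensitively) by appending a number.
def prepare_users(users)
  seen = Set.new
  users.filter_map do |u|
    email = u[:email].to_s.strip.downcase
    next if email.empty? || !email.include?("@") # dropped by filter_map

    base = u[:username].to_s.strip
    name = base
    suffix = 1
    while seen.include?(name.downcase)
      suffix += 1
      name = "#{base}#{suffix}"
    end
    seen << name.downcase

    { username: name, email: email }
  end
end

cleaned = prepare_users([
  { username: "Sam", email: " SAM@Example.com " },
  { username: "sam", email: "other@example.com" },
  { username: "ghost", email: "" },
])
```

Here the second "sam" becomes "sam2" and the user without an email is skipped, which is the kind of rule the database will no longer enforce for you.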
For #2, we decided to split the code into 2 parts:
- An importer script which will import the minimum viable content
- A rake task that is launched post-import in order to populate all the other required columns and tables
The importer will be responsible for importing the most important data that can’t be computed.
The rake task will be responsible for computing all the missing (but required) data and stats.
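The split can be pictured with a small sketch: the importer writes only the raw posts, and a later pass derives per-topic numbers from them. The method name `topic_stats` and the in-memory data are hypothetical, and the two stats shown are only examples of computed data; the real rake task works on the database and covers far more columns and tables.

```ruby
# Hypothetical post-import pass: derive per-topic stats from imported posts.
def topic_stats(posts)
  posts.group_by { |p| p[:topic_id] }.transform_values do |topic_posts|
    {
      posts_count: topic_posts.size,
      highest_post_number: topic_posts.map { |p| p[:post_number] }.max,
    }
  end
end

# Minimal data as an importer might have written it (illustrative only).
imported_posts = [
  { topic_id: 1, post_number: 1 },
  { topic_id: 1, post_number: 2 },
  { topic_id: 2, post_number: 1 },
]
stats = topic_stats(imported_posts)
```

Because these values can always be recomputed from the posts, the importer never has to know about them, which is what keeps it small when the database structure changes.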
A bulk importer will only import
- groups (name, description)
- users (email, username, name, title, admin/moderator, status, date of birth)
- user passwords & salts (so they can re-use the same password)
- user profiles (location, website, description)
- categories (name, description)
- topics (title, user, category, status, type)
- posts (user, topic, raw, reply to post number, type, reads)
- post_actions (bookmarks, likes, flags)
- tags (name)
A bulk importer will not import
- post revisions
- group permissions
- category permissions
- avatars (1)
- attachments (2)
(1) the script stores the avatar URLs in a custom field, which can be used later to download the avatars
(2) downloading & manipulating files is easily the slowest part of the import, but we might add support for bulk importing attachments
When to use a bulk importer?
If you are planning to migrate a forum with more than 5 million posts to Discourse, we recommend trying our bulk importers.
We currently only support bulk importing from vBulletin but are planning to support XenForo as well.
How to bulk import?
- You need to have a working development environment of Discourse.
- The database of the forum you are importing should be running on the same machine for best performance
Fire up your terminal and go to the
Install the gem used by the importer
IMPORT=1 bundle install
Run the importer
You can change the locale by using the LOCALE environment variable
LOCALE=fr ruby script/bulk_import/vbulletin.rb
You can also change the connection settings of the imported database
DB_HOST=localhost DB_USERNAME=user DB_PASSWORD=1234 DB_NAME=myforum ruby script/bulk_import/vbulletin.rb
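For readers wondering how a script can pick up such settings, here is a minimal sketch of reading them from the environment with fallbacks. The method name `db_settings` and every default value are assumptions made for the example; check the importer script itself for the defaults it really uses.

```ruby
# Hypothetical helper: read connection settings from environment variables,
# falling back to illustrative defaults when a variable is not set.
def db_settings(env = ENV)
  {
    host: env.fetch("DB_HOST", "localhost"),
    username: env.fetch("DB_USERNAME", "root"),
    password: env.fetch("DB_PASSWORD", ""),
    database: env.fetch("DB_NAME", "vbulletin"),
  }
end

# Passing a plain Hash instead of ENV makes the behaviour easy to see.
settings = db_settings("DB_HOST" => "localhost", "DB_USERNAME" => "user",
                       "DB_PASSWORD" => "1234", "DB_NAME" => "myforum")
```

Variables set in front of the command on the command line are visible to the script only for that single run, so nothing has to be edited or stored in a file.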
Once the import is done, you need to run a rake task to generate all computed data and stats
Create a backup
Upload the backup to your production instance, enable restoring from a backup and restore your imported data