Migrate a XenForo forum to Discourse

xengallery was once installed on the forum I migrated, but the table xfgallery no longer existed, so I had to change the following:

  def get_xf_sql(type, id)
    case type
    when :gallery
      # The gallery table no longer exists, so return a query that matches no rows.
      return "SELECT NULL WHERE 1=0;"
    when :attachment
      # Look up a post attachment together with its file data.
      <<-SQL
        SELECT a.attachment_id, a.data_id, d.filename, d.file_hash, d.user_id
        FROM #{TABLE_PREFIX}attachment AS a
        INNER JOIN #{TABLE_PREFIX}attachment_data d ON a.data_id = d.data_id
        WHERE attachment_id = #{id}
        AND content_type = 'post'
      SQL
    end
  end

You can try running the script again and see if it finishes.

Your system is on a single partition?

The mysql-server install step is now obsolete. The guide needs to install mariadb-server instead:

sudo apt-get install mariadb-server mariadb-client libmariadb-dev-compat libmariadb-dev

I was able to follow the rest of the steps and import the XF DB, so someone should update the guide.

Now I am stuck on the following step and need help.

echo "gem 'mysql2'" >>Gemfile
bundle install --no-deployment

Running the above gives me the following error. I checked the Gemfile and it contains only this one line: gem 'mysql2'

This Gemfile does not include an explicit global source. 
Not using an explicit global source may result in a different lockfile being generated depending on the gems you have installed locally before bundler is run. 
Instead, define a global source in your Gemfile like this: source "https://rubygems.org".
Could not find gem 'mysql2' in locally installed gems.
root@ip-172-566-459-13-app:/# 

OK, so I managed to move on to the next step. Someone above posted that we need to be in the /var/www/discourse folder in the container before adding the gem.

Now on the final step

RAILS_ENV=production bundle exec ruby script/import_scripts/xenforo.rb

I am getting this error. What could I be doing wrong?

/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-7.2.2.1/lib/active_record/connection_adapters/postgresql_adapter.rb:63:in "rescue in new_client": We could not find your database: discourse. Available database configurations can be found in config/database.yml. (ActiveRecord::NoDatabaseError)

To resolve this error:

- Did you not create the database, or did you delete it? To create the database, run: bin/rails db:create
- Has the database name changed? Verify that config/database.yml contains the correct database name.

Solved it: I was running as root user, had to switch to the ‘discourse’ user. Import has started.


So I picked up a reasonably good server with 4 CPUs and 16 GB RAM. At the rate the posts are being migrated, the posts alone will take 9 days; the users took 2.5 hours. Safe to say this is a no-go for me as is, but at least I can spend some months familiarizing myself with Discourse until I figure out a solution for this bulk migration.

PS:
In the migration script I see that duplicate emails are not imported. How is a duplicate determined? I noticed that xyz@gmail.com is treated the same as xyz+1@gmail.com and xy.z@gmail.com.

Are there any other patterns as well?

I’ve tried doing migrations on a VPS with specs similar to my personal computer, but for some reason it was always much, much slower than using my computer.

Nowadays, I always do my migrations locally. How many posts do you have?


2.5 million posts.
Will try a local migration on an M1 Mac to compare.


That’s pretty much it. The uniqueness check is performed on the downcased and normalized version of the given email address.

We normalize by removing all dots and ignoring everything after + in the username.
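To make the rule concrete, here is a minimal sketch in Ruby. It is not Discourse's actual code: `normalize_email_sketch` is a hypothetical name, and it only mirrors the rule as described above (downcase, remove dots, and drop everything from `+` onward in the username part). It assumes the input is a well-formed address.

```ruby
# Hypothetical helper illustrating the rule described above.
# This is a sketch, not Discourse's real implementation.
def normalize_email_sketch(email)
  # Split into the part before the first "@" and everything after it.
  username, domain = email.downcase.split("@", 2)
  # Remove all dots, then drop "+" and anything that follows it.
  username = username.delete(".").sub(/\+.*\z/, "")
  "#{username}@#{domain}"
end

%w[xyz@gmail.com xyz+1@gmail.com xy.z@gmail.com].each do |address|
  puts "#{address} -> #{normalize_email_sketch(address)}"
end
```

All three addresses from the question above come out the same under this rule, which is why only one of them would pass a uniqueness check.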


Single-core CPU speed is the important factor.

On my machines, a rate of 800-1000 users or posts/minute is fairly typical.
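For a rough back-of-the-envelope check, a sketch like the following turns a rate into a duration. `estimated_hours` is a hypothetical helper, and the 2.5 million figure is the post count mentioned earlier in the thread; it counts posts only and ignores users, attachments, and anything else the script imports.

```ruby
# Hypothetical helper: hours needed to import `items` at `per_minute` items per minute.
def estimated_hours(items, per_minute)
  items / per_minute.to_f / 60
end

# Post count taken from earlier in the thread; rates are the typical range quoted above.
[800, 1000].each do |rate|
  puts format("%d/min: about %.1f hours", rate, estimated_hours(2_500_000, rate))
end
```

At 800 to 1000 posts/minute, 2.5 million posts works out to roughly 42 to 52 hours for the posts alone.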

Note that when you do the final import, it’ll import only the users and posts that haven’t been imported already, so it won’t take very long.

Turn off the Normalize emails site setting (off was the default until recently). It should probably get turned off in this function here:

You can put it in your customized version of the xenforo script with SiteSetting.normalize_emails=false. I’m not sure what happened to those duplicate email users; there are two obvious things to do: give them a bogus email address or skip importing them. Looks like it gives them bogus ones? (And there’s a pretty good chance that they are, in fact, bogus users anyway.) If it skipped them, then running the script again will import them.


Yes, on my laptop it is churning through things much faster, at 1000 items per minute. That's about 2 times faster than on the server. Still, that's about 3 days.

I went through the skipped emails and it seems it's doing a good job ignoring those accounts. I will just merge them prior to the final import. There are hardly 20-odd such cases.

Note that when you do the final import, it’ll import only the users and posts that haven’t been imported already, so it won’t take very long.

Thank you for pointing this out. I observed this myself and it seems this is what is going to save the day when I do the final import. So I take a backup and restore on D-3 and then another backup and restore with the new DB backup file on Day 0. Is that correct?


Are those backups and restores on the Xenforo site, or do you have some live Discourse site that you’re going to import the Xenforo data to?

As long as you don’t make changes to the script that require re-importing data, and what you have on your laptop now is what you want on your Discourse server, you can just keep getting new dumps of the Xenforo database and importing them (to test, see how long it takes, and so on). Then, on the cut-over day, you freeze the Xenforo site, get that database, run the script once more and upload to your Discourse server.

If you already have data on your Discourse site that you want to keep, things are much more complicated since you’ll need to freeze that site, then get the Xenforo data and then proceed as described above.


It’ll be a fresh install of Discourse so that makes it straightforward.

I have a decent amount of time at hand as I want to test migrations multiple times, familiarize myself with Discourse thoroughly, get all add-ons configured the way I want and maybe also get my hands dirty with some add-on customization myself.

What you’ve explained lifts one pain point off my chest completely as I thought I would have to figure out bulk imports too.


I have come back with a question: does the import script output any logs? My test import has been stuck at 98.2% for a few hours.

Another thing I realized: if I restart the migration, it takes around 30 seconds to skip over a batch of 1000 posts, so the effective speed is now 2000 items per minute. That is not a significant improvement over the 1000 posts per minute of the first import. Even the last import on the day of the cutover will take about a day, 23 hours of which will be spent just skipping already imported items.

Just what you see.

You should probably stop it and start it again.

Yes, it’ll skip all data that’s been imported already. And it does it much faster than 2000 posts/minute. I suspect you’ll see when you restart it now.

That's what I did: I restarted and then made the above post. It is 2000 posts/minute. To be sure, I tried it again.


So I managed to get the avatars and attachments imported. I copied these folders:

/internal_data/attachments
/data/avatars

To answer my question, the avatars and attachments get finalized once imported. If a user changes their avatar after their ID is imported, it will not get imported/updated because that post or user will get skipped in the second run.
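The behaviour can be pictured with a small sketch. This is not the actual import script: `IncrementalImportSketch` and its hash of imported IDs are hypothetical stand-ins, used only to show why a record that is skipped on a later run keeps whatever it had when it was first imported.

```ruby
# Hypothetical stand-in for an importer that remembers which source IDs it has imported.
class IncrementalImportSketch
  attr_reader :imported

  def initialize
    @imported = {} # source id => record as it was at first import
  end

  # Imports only rows whose id has not been seen before.
  # Returns [created, skipped] counts for this run.
  def import(rows)
    created = 0
    skipped = 0
    rows.each do |row|
      if @imported.key?(row[:id])
        skipped += 1 # existing record is left untouched, even if the source row changed
      else
        @imported[row[:id]] = row.dup
        created += 1
      end
    end
    [created, skipped]
  end
end

importer = IncrementalImportSketch.new
importer.import([{ id: 1, avatar: "old.png" }])
# Second run: user 1 changed their avatar in the source, user 2 is new.
importer.import([{ id: 1, avatar: "new.png" }, { id: 2, avatar: "other.png" }])
puts importer.imported[1][:avatar]
```

In this sketch user 1 still has the avatar from the first run after the second run, because the row was skipped rather than compared and updated.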

Now I just need to figure out the conversations import (I can skip it too, but it is good to have) and permanent redirects.

@Fajfi - Thank you for your contribution to the import script. It worked flawlessly for avatars and attachments. It's still running and has not reached the likes portion yet.

Fixed the conversations import. I was able to import over half a million messages from XF2.3 into Discourse. I have raised a PR in case someone is interested.

----EDIT----

Raised another PR with a fix for the likes import. It is surprising that nobody had migrated from XF2.1+ to Discourse until now. Likes were renamed to reactions in 2019 when XF2.1 was released.
