Migrate a XenForo forum to Discourse

Tealk · September 24, 2024, 3:45pm

In the forum I migrated, xengallery had once been installed, so I had to change the following because the xfgallery table no longer existed.

  def get_xf_sql(type, id)
    case type
    when :gallery
      # The gallery table no longer exists, so use a query that always returns zero rows.
      "SELECT NULL WHERE 1=0;"
    when :attachment
      <<-SQL
        SELECT a.attachment_id, a.data_id, d.filename, d.file_hash, d.user_id
        FROM #{TABLE_PREFIX}attachment AS a
        INNER JOIN #{TABLE_PREFIX}attachment_data d ON a.data_id = d.data_id
        WHERE attachment_id = #{id}
        AND content_type = 'post'
      SQL
    end
  end

pfaffman · September 27, 2024, 11:24am

You can try running the script again and see if it finishes.

Your system is on a single partition?

SubStrider · February 16, 2025, 5:28am

The mysql-server install step is now obsolete. The guide needs to install mariadb-server instead:

sudo apt-get install mariadb-server mariadb-client libmariadb-dev-compat libmariadb-dev

I was able to follow the rest of the steps and import the XF DB, so someone should update the guide.

Now I am stuck on the following step and need help.

echo "gem 'mysql2'" >>Gemfile
bundle install --no-deployment

Running the above gives me the following error. I checked the Gemfile and it contains only this one line: gem 'mysql2'

This Gemfile does not include an explicit global source. 
Not using an explicit global source may result in a different lockfile being generated depending on the gems you have installed locally before bundler is run. 
Instead, define a global source in your Gemfile like this: source "https://rubygems.org".
Could not find gem 'mysql2' in locally installed gems.
root@ip-172-566-459-13-app:/#

SubStrider · February 16, 2025, 1:12pm

OK, so I managed to move on to the next step. Someone above posted that we need to be in the /var/www/discourse folder in the container before adding the gem.

Now on the final step

RAILS_ENV=production bundle exec ruby script/import_scripts/xenforo.rb

I am getting this error. What could I be doing wrong?

/var/www/discourse/vendor/bundle/ruby/3.3.0/gems/activerecord-7.2.2.1/lib/active_record/connection_adapters/postgresql_adapter.rb:63:in "rescue in new_client": We could not find your database: discourse. Available database configurations can be found in config/database.yml. (ActiveRecord::NoDatabaseError)

To resolve this error:

- Did you not create the database, or did you delete it? To create the database, run: bin/rails db:create
- Has the database name changed? Verify that config/database.yml contains the correct database name.

Solved it: I was running as the root user and had to switch to the 'discourse' user. The import has started.

SubStrider · February 17, 2025, 8:16am

So I picked up a reasonably good server with 4 CPUs and 16 GB RAM. At the rate the posts are being migrated, the posts alone will take 9 days. The users took 2.5 hours. Safe to say this is a no-go for me as is, but at least I can spend some months familiarizing myself with Discourse until I figure out a solution for this bulk migration.

PS:
In the migration script I see that duplicate emails are not imported. How is a duplicate determined? I noticed that xyz@gmail.com is treated the same as xyz+1@gmail.com and xy.z@gmail.com.

Are there any other patterns as well?

Canapin · February 17, 2025, 12:41pm

I’ve tried doing migrations on a VPS with specs similar to my personal computer’s, but for some reason it was always much, much slower than on my computer.

Nowadays, I always do my migrations locally. How many posts do you have?

SubStrider · February 17, 2025, 12:46pm

2.5 million posts.
Will try a local migration on an M1 Mac to compare.

selase · February 17, 2025, 4:22pm

That’s pretty much it. The uniqueness check is performed on the downcased and normalized version of the given email address.

github.com/discourse/discourse

app/models/user_email.rb

402ec6bf5


      
          def unique_email
            email_exists =
              if self.normalize_emails?
                self
                  .class
                  .where("lower(email) = ? OR lower(normalized_email) = ?", email, normalized_email)
                  .exists?
              else
                self.class.where("lower(email) = ?", email).exists?
              end
          
            self.errors.add(:email, :taken) if email_exists
          end

We normalize by removing all dots and ignoring everything after + in the username.

github.com/discourse/discourse

app/models/user_email.rb

402ec6bf5


      
          def normalize_email
            self.normalized_email =
              if self.email.present?
                username, domain = self.email.split("@", 2)
                username = username.gsub(".", "").gsub(/\+.*/, "")
                "#{username}@#{domain}"
              end
          end
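
To see why the three addresses from the question collide, here is a minimal standalone sketch of the same normalization, runnable outside Rails. The method name `normalized` and the explicit `downcase` call are assumptions made for illustration; in Discourse the logic lives in the `UserEmail` model quoted above.

```ruby
# Standalone sketch of the normalization quoted above (not the Discourse code itself).
# Assumption: the address is downcased before comparison, as the uniqueness check implies.
def normalized(email)
  local, domain = email.downcase.split("@", 2)
  # Drop all dots and everything from the first "+" in the part before the "@".
  local = local.gsub(".", "").gsub(/\+.*/, "")
  "#{local}@#{domain}"
end

addresses = ["xyz@gmail.com", "xyz+1@gmail.com", "xy.z@gmail.com"]

# Group the raw addresses by their normalized form; one group means they all collide.
groups = addresses.group_by { |address| normalized(address) }
groups.each { |key, members| puts "#{key}: #{members.join(", ")}" }
```

All three addresses end up under the single key xyz@gmail.com, which is why only the first one is imported while email normalization is on.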

pfaffman · February 17, 2025, 6:27pm

Single cpu speed is the important factor.

On my machines, a rate of 800-1000 users or posts/minute is fairly typical.

Note that when you do the final import, it’ll import only the users and posts that haven’t been imported already, so it won’t take very long.

Turn off the Normalize emails site setting (off was the default until recently). It should probably get turned off in this function here:

github.com/discourse/discourse

script/import_scripts/base.rb

main


      
          def change_site_settings
            if SiteSetting.bootstrap_mode_enabled
              SiteSetting.default_trust_level = TrustLevel[0] if SiteSetting.default_trust_level ==
                TrustLevel[1]
              SiteSetting.default_email_digest_frequency =
                10_080 if SiteSetting.default_email_digest_frequency == 1440
              SiteSetting.bootstrap_mode_enabled = false
            end
          
            @site_settings_during_import = get_site_settings_for_import
          
            @site_settings_during_import.each do |key, value|
              @old_site_settings[key] = SiteSetting.get(key)
              SiteSetting.set(key, value)
            end
          
            # Some changes that should not be rolled back after the script is done
            if SiteSetting.purge_unactivated_users_grace_period_days > 0
              SiteSetting.purge_unactivated_users_grace_period_days = 60
            end

This file has been truncated.

You can put it in your customized version of the xenforo script with SiteSetting.normalize_emails=false. I’m not sure what happened to those duplicate email users; there are two obvious things to do: give them a bogus email address or skip importing them. Looks like it gives them bogus ones? (And there’s a pretty good chance that they are, in fact, bogus users anyway.) If it skipped them, then running the script again will import them.
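
One way this could look in a customized script, assuming `get_site_settings_for_import` returns a hash that `change_site_settings` applies key by key, as in the excerpt above. `SiteSetting` and `StubBaseImporter` below are stand-ins so the sketch runs outside Discourse, and `CustomXenforoImporter` is a hypothetical name; in a real script the override would go in the XenForo importer class.

```ruby
# Stand-in for Discourse's SiteSetting, only so this sketch runs on its own.
module SiteSetting
  @values = { normalize_emails: true }

  def self.get(key)
    @values[key]
  end

  def self.set(key, value)
    @values[key] = value
  end
end

# Stand-in for the base importer; it mirrors the loop in the excerpt above.
class StubBaseImporter
  attr_reader :old_site_settings

  def initialize
    @old_site_settings = {}
  end

  def get_site_settings_for_import
    {}
  end

  def change_site_settings
    get_site_settings_for_import.each do |key, value|
      @old_site_settings[key] = SiteSetting.get(key) # remember the previous value
      SiteSetting.set(key, value)
    end
  end
end

# Hypothetical customized importer: add normalize_emails to the import-time settings.
class CustomXenforoImporter < StubBaseImporter
  def get_site_settings_for_import
    super.merge(normalize_emails: false)
  end
end

importer = CustomXenforoImporter.new
importer.change_site_settings
puts SiteSetting.get(:normalize_emails)
```

Because the loop records the old value before changing it, the setting could presumably be put back once the import is done; a plain SiteSetting.normalize_emails=false at the top of the script is the simpler option if that does not matter.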

SubStrider · February 17, 2025, 7:08pm

Yes, on my laptop it is churning through things much faster, at 1000 items per minute. That’s about 2 times faster than on the server. Still, that’s about 3 days.

I went through the skipped emails and it seems it’s doing a good job of ignoring those accounts. I will just merge them prior to the final import. There are hardly 20-odd such cases.

“Note that when you do the final import, it’ll import only the users and posts that haven’t been imported already, so it won’t take very long.”

Thank you for pointing this out. I observed this myself and it seems this is what is going to save the day when I do the final import. So I take a backup and restore on D-3 and then another backup and restore with the new DB backup file on Day 0. Is that correct?

pfaffman · February 17, 2025, 7:14pm

Are those backups and restores on the Xenforo site, or do you have some live Discourse site that you’re going to import the Xenforo data to?

As long as you don’t make changes to the script that require re-importing data, and what you have on your laptop now is what you want on your Discourse server, then you can just keep getting new dumps of the Xenforo database and importing them (to test, see how long it takes, and so on) and then on the cut-over day, you freeze the Xenforo site, get that database, run the script once more and upload to your Discourse server.

If you already have data on your Discourse site that you want to keep, things are much more complicated since you’ll need to freeze that site, then get the Xenforo data and then proceed as described above.

SubStrider · February 17, 2025, 7:43pm

It’ll be a fresh install of Discourse so that makes it straightforward.

I have a decent amount of time at hand as I want to test migrations multiple times, familiarize myself with Discourse thoroughly, get all add-ons configured the way I want and maybe also get my hands dirty with some add-on customization myself.

What you’ve explained lifts one pain point off my chest completely as I thought I would have to figure out bulk imports too.

SubStrider · February 21, 2025, 12:20am

I’ve come back with a question: does the import script output any logs? My test import has been stuck at 98.2% for a few hours.

Another thing I realized: if I restart the migration, it takes around 30 seconds to skip over a batch of 1000 posts, so the effective speed is now 2000 items per minute. That is not a significant improvement over the 1000 posts per minute of the first import. Even the last import on the day of the cutover will take about a day, 23 hours of which will just be skipping already imported items.
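
As a rough check of that estimate, here is a small sketch of the arithmetic. It assumes the 2.5 million posts mentioned earlier are the only items being skipped and that the skip rate stays constant; `skip_pass_hours` is just an illustrative helper name.

```ruby
# Hours needed to skip over already imported items at a steady rate.
def skip_pass_hours(items, items_per_minute)
  (items.to_f / items_per_minute / 60).round(1)
end

POSTS = 2_500_000 # post count mentioned earlier in the thread

# Observed rate: one batch of 1000 posts skipped every 30 seconds.
hours = skip_pass_hours(POSTS, 2000)
puts "Skipping #{POSTS} posts at 2000/minute takes about #{hours} hours"
```

That comes to a little under 21 hours, so "about a day" holds for the posts alone.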

pfaffman · February 21, 2025, 12:38am

Just what you see.

You should probably stop it and start it again.

Yes, it’ll skip all data that’s been imported already. And it does it much faster than 2000 posts/minute. I suspect you’ll see when you restart it now.

SubStrider · February 21, 2025, 2:52am

That’s what I did: I restarted and then made the above post. It is 2000 posts/minute. To be sure, I tried it again.

SubStrider · February 21, 2025, 10:50am

So I managed to get the avatars and attachments imported. I copied these folders:

/internal_data/attachments
/data/avatars

To answer my question, the avatars and attachments get finalized once imported. If a user changes their avatar after their ID is imported, it will not get imported/updated because that post or user will get skipped in the second run.

Now I just need to figure out the conversations import (I can skip it too, but it’s good to have) and permanent redirects.

@Fajfi - Thank you for your contribution to the import script. It worked flawlessly for avatars and attachments. It’s still running and has not reached the likes portion yet.

SubStrider · February 25, 2025, 7:18pm

Fixed the conversations import. I was able to import over half a million messages from XF2.3 into Discourse. I have raised a PR in case someone is interested.

----EDIT----

Raised another PR with a fix for the likes import. It is surprising that nobody had migrated from XF2.1+ to Discourse until now. Likes were renamed to reactions in 2019 when XF2.1 was released.
