Migrate a phpBB3 forum to Discourse

It’s actually 3.3, so I’m doing prep work pending the PRs coming to make the script compatible with that version of phpBB, or I may give it a shot with the current script using the edit you suggested earlier for the version check…

So I guess I’ll need to work with xml_to_markdown.rb?

2 Likes

My attempt to do a test import with a phpBB 3.3.x forum was a big fail, but more on that later.

I learned a lot in the process of trying it out, and thought I’d share what I learned because it might be helpful for other imports and when the import script is updated for phpBB 3.3.

If you need to edit /script/import_scripts/phpbb3/database/database.rb to get past the version check, or /script/import_scripts/phpbb3/support/bbcode/xml_to_markdown.rb to add custom BBcodes, the time to do so is after you enter the import with this command:

/var/discourse/launcher enter import

At that point, you can use nano to edit the files.

Since my attempt to run the import on my phpBB 3.3 forum failed, I can’t be sure my custom BBcode changes would have been effective, but it looks like all you have to do for phpBB 3.2+ is edit the file:

/script/import_scripts/phpbb3/support/bbcode/xml_to_markdown.rb

as noted above, and add a definition in the format of:

def visit_<your case-sensitive custom BBcode here>(xml_node, md_node)
  # code to handle things goes here
end

Again, I didn’t get to test this yet, but one very common custom BBcode in phpBB that doesn’t appear in the version of the xml_to_markdown.rb I had is this one:

[YouTube]{Identifier}[/YouTube]

where the {Identifier} is the “v” value from the YouTube URL’s query string.

So my Ruby-nooby attempt at adding that code to xml_to_markdown.rb is:

def visit_YouTube(xml_node, md_node)
      content = xml_node.content
      url = "https://www.youtube.com/watch?v=" + content
      md_node.text = "#{url}"
      md_node.prefix_linebreaks = md_node.postfix_linebreaks = 2
      md_node.prefix_linebreak_type = LINEBREAK_HTML
end

Finally, when I ran the import script, I only got to the database loading step, and got this error:

ERROR 1071 (42000) at line 1233035: Specified key was too long; max key length is 1000 bytes

I’m not sure if that’s a DB Engine issue–the original uses InnoDB–or what, but I suspect it will be addressed in the phpBB 3.3 version of the script.

2 Likes

Okay, so I couldn’t leave this alone.

While I suppose it’s possible that the error I got could be handled by the import script, it really appears to me that it’s an issue with the database structure of phpBB3, and the fact that my database collation is UTF8. I got this error running the import script:

ERROR 1071 (42000) at line 1233035: Specified key was too long; max key length is 1000 bytes

When I looked at line 1233035 of my phpbb_mysql.sql file–I’m using a local copy of the data dump–I saw this:

ALTER TABLE `phpbb_config`
  ADD PRIMARY KEY (`config_name`),
  ADD KEY `is_dynamic` (`is_dynamic`);

Looking at the config_name column in the table phpbb_config, I found that it’s set to VARCHAR(255).

Thanks to this post on Stackoverflow, I discovered that I can use a “prefix index” to shorten the length of the key. I tested it, and it works.

For my phpBB 3.3.8 database, there were actually four tables affected:

  • phpbb_config
  • phpbb_config_text
  • phpbb_migrations
  • phpbb_oauth_accounts

If anyone else wants to try it, here are the steps for the phpbb_config table, and I’ll include the queries for the other tables at the end.

Calculate the shortest value for the prefix index using this query:

SELECT
 ROUND(SUM(LENGTH(`config_name`)<10)*100/COUNT(`config_name`),2) AS pct_length_10,
 ROUND(SUM(LENGTH(`config_name`)<20)*100/COUNT(`config_name`),2) AS pct_length_20,
 ROUND(SUM(LENGTH(`config_name`)<50)*100/COUNT(`config_name`),2) AS pct_length_50,
 ROUND(SUM(LENGTH(`config_name`)<100)*100/COUNT(`config_name`),2) AS pct_length_100
FROM `phpbb_config`;

In my case, it showed that 100% of the rows used less than 50 characters, so I edited my data dump file:

nano /var/discourse/shared/standalone/import/data/phpbb_mysql.sql

and used search Ctrl+W for the following string:

ALTER TABLE `phpbb_config`

then changed this:

ALTER TABLE `phpbb_config`
  ADD PRIMARY KEY (`config_name`),
  ADD KEY `is_dynamic` (`is_dynamic`);

to this:

ALTER TABLE `phpbb_config`
  ADD PRIMARY KEY (`config_name`(50)),
  ADD KEY `is_dynamic` (`is_dynamic`);

The (50) in there is the prefix index length.

In my case, I had to repeat the process with the other three tables using these queries to get the optimal prefix key length:

SELECT
 ROUND(SUM(LENGTH(`config_name`)<10)*100/COUNT(`config_name`),2) AS pct_length_10,
 ROUND(SUM(LENGTH(`config_name`)<20)*100/COUNT(`config_name`),2) AS pct_length_20,
 ROUND(SUM(LENGTH(`config_name`)<50)*100/COUNT(`config_name`),2) AS pct_length_50,
 ROUND(SUM(LENGTH(`config_name`)<100)*100/COUNT(`config_name`),2) AS pct_length_100
FROM `phpbb_config_text`;

SELECT
 ROUND(SUM(LENGTH(`migration_name`)<10)*100/COUNT(`migration_name`),2) AS pct_length_10,
 ROUND(SUM(LENGTH(`migration_name`)<20)*100/COUNT(`migration_name`),2) AS pct_length_20,
 ROUND(SUM(LENGTH(`migration_name`)<50)*100/COUNT(`migration_name`),2) AS pct_length_50,
 ROUND(SUM(LENGTH(`migration_name`)<100)*100/COUNT(`migration_name`),2) AS pct_length_100
FROM `phpbb_migrations`;

SELECT
 ROUND(SUM(LENGTH(`provider`)<10)*100/COUNT(`provider`),2) AS pct_length_10,
 ROUND(SUM(LENGTH(`provider`)<20)*100/COUNT(`provider`),2) AS pct_length_20,
 ROUND(SUM(LENGTH(`provider`)<50)*100/COUNT(`provider`),2) AS pct_length_50,
 ROUND(SUM(LENGTH(`provider`)<100)*100/COUNT(`provider`),2) AS pct_length_100
FROM `phpbb_oauth_accounts`;

Here are the search strings for the other tables to save you some time:

ALTER TABLE `phpbb_config_text`
ALTER TABLE `phpbb_migrations`
ALTER TABLE `phpbb_oauth_accounts`

I won’t say I’m home free, but the above solved my initial database error, and the import is running:

#import_phpbb3.sh
The phpBB3 import is starting...

Loading existing groups...
Loading existing users...
Loading existing categories...
Loading existing posts...
Loading existing topics...

importing from phpBB 3.3.8

creating users
      722 / 5014 ( 14.4%)  [58 items/min]

Update:

The import script finished in 9 hours, 28 minutes. Thank goodness for Screen. This script is awesome!

Post-processing is now in progress.

Is there a log of the output of the import script somewhere, or should I have piped it to a file?

3 Likes

Which procedure for the phpbb3 importer should be used when running discourse_dev in a docker container under WSL?

I have tried doing both the procedure 1. Importing using a Docker container and procedure 2. Importing using development environment. Neither has worked although I did get the script to run to completion under procedure 1 but no data actually made it into the forum database despite the many queries seeming to have succeeded as they went along. Could there be a different database involved that is being populated due to import being run in its own container separate from the discourse_dev container?

In the case of procedure 2 it is simply not possible to execute the build which fails with “An error occurred while installing tiny_tds (2.1.5), and Bundler cannot continue.”

Should I try doing a ‘standard install’ (non-docker, non-dev) in WSL then do the Docker-based import?

1 Like

That’s what I recommend. I very rarely do a migration on development anymore.

Also, you don’t need tiny_tds, unless you’re using mssql rather than mysql/mariadb.

1 Like

Thanks very much, Jay. Your tutorials have been spot-on, I really appreciate having them.

2 Likes

I successfully installed Discourse on a new VPS using the standard Docker installation process. I then ran the import process according to these instructions and the script finished without problems. I exited the container and then issued the stop import and got the message that:

''You have less than 5GB of free space on the disk where /var/lib/docker is located. You will need more space to continue
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 24G 20G 2.7G 89% /

Would you like to attempt to recover space by cleaning docker images and containers in the system? (y/N)‘’

Recovering space produced no additional room. So how to proceed? Can I safely delete all the image files that I uploaded into /data? I am not sure if they are still needed.

Thanks.

1 Like

If the import seems to to have finished, and you’re talking about the images that should have been imported, it should be ok, but you’re likely not going to have enough space to make a backup. I would upgrade the droplet to have more space. The price per day shouldn’t be a big deal.

1 Like

Thank you for this advice, Jay.

In case these details will be helpful to others, I removed the image files (only) which freed up ~ 3.4 GB for a total of 6.1 GB available and Sidekiq then successfully chewed through the post-processing. The disk space has since been increased by an additional 20 GB.

For the record this was an import of a phpBB version 3.3.0 messageboard with a little over 73,000 posts in a little over 8,000 topics with a little over 1300 users.

The only issue I have found, so far, is that some usernames are squirrely but that has been discussed above and isn’t fatal.

There will be an interval between this import and the closure of the source phpBB forum. Can I just leave the import Docker container in place and then use it on an incremental backup to migrate posts made after this import?

1 Like

When I did my phpBB 3.3.x migration, to do my final import, I put the phpBB site on “maintenance mode,” used rsync to update the attachments, avatars, and smilies, then importi a new data dump, and had to rebuild the import to do the final run of the import script, but I had exited the import prior to that.

@DonH Did you import the phpBB passwords and get them to work with the Migrate Passwords plugin?

2 Likes

I did not import phpBB passwords for a couple of reasons. I wanted no potential conflicts with plugins and I wanted to force people to upgrade their passwords and this migration seemed like a good way (and good excuse) to do that. Security on the phpBB web host is handled by the hosting company, with the new VPS there is no such luxury.

To be clear, you re-imported your entire database one last time and did not do an incremental update?

1 Like

My understanding, and experience, is that if you do a data dump of your phpBB forum–from phpMyAdmin, for example–and upload the file to /var/discourse/shared/standalone/import/data, rebuild the import, then re-run the import command, the migration script won’t touch already imported user accounts, posts, etc., only database entries that were not imported previously.

In essence, it is doing an incremental update, but it doesn’t touch existing entries, so you may lose some data; say, if a user edited an already imported post or changed their email address.

1 Like

If you leave the plugin setting migratepassword_allow_insecure_passwords unchecked then the migratepassword plugin will not allow migrated passwords that are deemed unsecure by Discourse, honoring all the complexity settings like min length and unique characters.

Not sure what kind of plugin conflicts you are referring to?

2 Likes

@RGJ Will the migratepassword plugin work with phpBB 3.3?

I’m in the process of doing an import with a phpBB 3.3.8 forum as we speak, and I’m importing the passwords to give it a shot.

1 Like

How would you be able to migrate from myBB?

1 Like

You can search for myBB and the tag #migration maybe you will find a #howto

2 Likes

Ok, thanks for the info!

1 Like

Thanks for that clarification, Richard. I wasn’t referring to any specific conflict, just the potential for something unexpected to happen in unfamiliar territory. I did the import deliberately before adding any plugins. Mostly, though, I want to force our members to upgrade their passwords.

This migration went off without a hitch, really, so kudos to Gerhard and everyone else involved on a well-tempered script. I am looking forward to tailoring our new messageboard.

2 Likes

Yes it will, we recently added Argon2 support in the plugin.

4 Likes

Richard supports the plugin on his hosting. I’ve never heard of it causing problems. Many importers import the passwords and it Just works. I even had it work for an import script that I wrote for some other random forum.

1 Like