SMF2 to Discourse - The Ultimate Guide

import

(Marco) #1

The idea of an up-to-date guide for migrating from SMF2 to Discourse was discussed in this excellent thread by @Vincent and @cmwebdev. Building on it, I have prepared an ambitious “Ultimate Guide” that merges my own notes with details already written in this other topic, which was fundamental for us to even start working on the import.

We are more than happy to update this guide with any feedback and further experiences as they come in.

Preparation work on SMF2

To minimise issues while porting your SMF2 content to Discourse, please perform whichever of the following steps are relevant to your current SMF2 configuration:

  • Provide a description for each Category
  • Check that no two users share the same e-mail address.
  • Check for corrupted attachments. In certain cases corrupted attachments crash the import process.
  • If users have been deleted but their posts are still in the SMF2 DB, those posts will be assigned to the user “System” in Discourse. Consider re-creating the deleted users (each with a unique and valid e-mail address, to avoid import problems). This step might need a bit of behind-the-curtains tweaking of the SMF2 DB, which is beyond the scope of this guide.
  • Consider writing a script to deal with BBCode and other oddities that the Discourse importer does not handle (more below, in the “Bonus track” section). We strongly recommend installing the official BBCode plugin, which substantially extends the set of BBCode tags accepted by Discourse.
  • If you have your attachments split in multiple directories (a feature that SMF2 allows), be aware that the Discourse import script expects them to be in a single directory.
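
As a quick way to run the duplicate e-mail check from the list above, the following shell sketch lists addresses that appear more than once, ignoring case. It assumes you have exported one address per line from the members table into a file. The table and column names in the comment (smf_members, email_address) are what we would expect from an SMF2 install using the smf_ prefix, so verify them against your own schema; the function name find_duplicate_emails is purely illustrative.

```shell
#!/bin/sh
# Hypothetical helper: print every e-mail address that occurs more than once
# on stdin (one address per line), compared case-insensitively.
find_duplicate_emails() {
  tr '[:upper:]' '[:lower:]' | sort | uniq -d
}

# Example export on the SMF2 DB server (assumed table and column names):
#   mysql -N -B -u<your_user> -p<your_password> <your_smf2_db> \
#     -e "SELECT email_address FROM smf_members" > emails.txt
#   find_duplicate_emails < emails.txt
```

An empty result means no address is shared; any line printed is an address you should fix in SMF2 before exporting.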

Install Discourse

We’re not going to provide a step-by-step guide to installing Discourse, as you can either buy it as a paid service or follow (as we did) this excellent guide: discourse/INSTALL-cloud.md at master · discourse/discourse · GitHub
From here onwards we assume that your Discourse installation was successful, and that you have command-line access to the host system where Discourse is installed.
We also assume that you are familiar with the Linux command line, a few basic Linux commands and a text editor (e.g. vi or nano).

Prepare the SMF2 data for export

So, at the moment all your SMF2 data is sitting in a MySQL DB on a server which is, possibly, different from the one running your Discourse installation. In principle it is possible to connect directly to the SMF2 MySQL DB server from the Discourse server, provided that port 3306 is exposed to the open Internet and that you have the credentials to connect to it.

[Optional] For our own SMF2 migration we decided to take an extra step. We created a duplicate of the SMF2 database on the server hosting the SMF2 forum, ran a pre-export script on this clone, and then exported the cloned DB rather than the original one. This is because we are still testing the Discourse migration and we want to be absolutely sure we have the most complete and transparent migration possible. For further details see the “Bonus track” section.

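In this guide we have chosen a different solution: creating a minimalistic MySQL container on the Discourse server, into which we import the SQL dump of our SMF2 database.
On your SMF2 DB server, perform a dump of the SMF2 DB. The file is named smf2.sql here because that is the name the later steps copy and import.

/usr/bin/mysqldump -u<your_user> -p<your_password> --databases <your_smf2_db> > smf2.sql

Note that --databases makes mysqldump write CREATE DATABASE and USE statements into the dump, so the data is restored under its original database name. If you want it to land in the database named by MYSQL_DATABASE in the container created later in this guide, drop that option.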
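
Before copying a large dump around, it can be worth checking that mysqldump actually finished. The sketch below relies on the “-- Dump completed” comment that mysqldump appends at the end of a successful run; that trailer is absent if the dump was taken with --skip-comments or --compact, in which case this check does not apply. The function name dump_is_complete is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the dump file ends with the
# "-- Dump completed" trailer that mysqldump writes after a successful run.
dump_is_complete() {
  tail -n 5 "$1" | grep -q -- '-- Dump completed'
}

# Usage on the SMF2 DB server:
#   dump_is_complete <your_dump_file> && echo "dump looks complete"
```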

Prepare the SMF2 data for being imported into Discourse

A few steps must be performed for a successful first import. Execute the following on your Discourse host server.

  1. Make yourself at home and prepare the necessary directories
cd ~
mkdir smf2
mkdir smf2/attachments
  1. Copy over the SMF2 database, the attachments and the SMF2 Settings.php file.
rsync -aruviP user@smf2server:/path_to_db/smf2.sql ~/smf2/
rsync -aruviP user@smf2server:/path_to_smf2_root/attachments/* ~/smf2/attachments/
rsync -aruviP user@smf2server:/path_to_smf2_root/Settings.php ~/smf2/
# If you have more than one attachments dir, this is a good time to just copy everything into one on the Discourse server
# rsync -aruviP user@smf2server:/path_to_smf2_root/attachments_A/ ~/smf2/attachments/
# rsync -aruviP user@smf2server:/path_to_smf2_root/attachments_B/ ~/smf2/attachments/
# rsync -aruviP user@smf2server:/path_to_smf2_root/attachments_C/ ~/smf2/attachments/
# ...
  1. Create and start the MySQL container.
docker run -d -e MYSQL_ROOT_PASSWORD=pass -e MYSQL_USER=user -e MYSQL_PASSWORD=pass -e MYSQL_DATABASE=db -v ~/smf2:/backup --name=mysql mysql

The container has been created so that our directory ~/smf2 is mounted as a volume at /backup inside the container. In other words, all files and directories in the host directory ~/smf2 are visible and available inside the mysql container under /backup.

  1. In case you need to (re)start the mysql container later, for any reason, you can just use the basic docker commands to do it
# Starting the mysql container
docker start mysql

# Stopping the mysql container
docker stop mysql
  1. Create a SQL script to configure the MySQL server running in the mysql container. This is necessary if you have a large SMF2 database. Our own SMF2 forum has 320000+ posts and about 8 GB of attachments to import, and the first import attempts were just frustrating. Crash, after crash, after crash. The crashes were caused by the connection between the Discourse importer and the MySQL server in the container timing out. After reading several contributions, all pointing us in the right direction, we put together a SQL script containing every setting we needed for an ultra-stable importer-to-mysql connection.
    After this script was executed, we were able to run a full import without any timeout errors (btw, in our case it lasted for about 48 hours). Here is the SQL script (we saved it in ~/smf2 as “script_for_mysql_tuning.sql”)
ALTER USER 'user'@'%' IDENTIFIED WITH mysql_native_password BY 'pass';
SET GLOBAL net_write_timeout=3600;
SET GLOBAL net_read_timeout=3600;
SET GLOBAL delayed_insert_timeout=3600;
SET GLOBAL max_length_for_sort_data=8388608;
SET GLOBAL max_sort_length=8388608;
SET GLOBAL net_buffer_length=1048576;
SET GLOBAL max_connections=10000;
SET GLOBAL connect_timeout=31536000;
SET GLOBAL wait_timeout=31536000;
SET GLOBAL max_allowed_packet=1073741824;
SET GLOBAL mysqlx_read_timeout=2147483;
SET GLOBAL mysqlx_idle_worker_thread_timeout=3600;
SET GLOBAL mysqlx_connect_timeout=1000000000;

SET SESSION net_write_timeout=3600;
SET SESSION net_read_timeout=3600;
SET SESSION max_length_for_sort_data=8388608;
SET SESSION max_sort_length=8388608;
SET SESSION wait_timeout=31536000;

It’s important to remember that this .sql script will be “automatically” visible in the mysql container under /backup

  1. It’s now time to enter the mysql container
docker exec -it mysql bash
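  1. Let’s import the smf2.sql content. We configure the MySQL server first, and then we import the data. The environment variables $MYSQL_PASSWORD and $MYSQL_DATABASE are pre-defined in the container (logging in as root with $MYSQL_PASSWORD works here only because the container was started with the same value, pass, for both passwords). Keep in mind that this step can be quite time-consuming, depending on how much data you have in your smf2.sql file.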
mysql -uroot -p$MYSQL_PASSWORD $MYSQL_DATABASE < /backup/script_for_mysql_tuning.sql
mysql -uroot -p$MYSQL_PASSWORD $MYSQL_DATABASE < /backup/smf2.sql
  1. Exit the mysql container with
CTRL+D
  1. We need to get the IP address of the mysql container, that will be used later, when importing into Discourse. Note down the IP address.
docker inspect mysql | grep IPAddress
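
The grep above typically prints several lines, some of them with an empty address. As a small convenience, the sketch below reads docker inspect output on stdin and prints only the first non-empty IPv4 address it finds in an "IPAddress" field. The function name first_container_ip is hypothetical, and the matching assumes the usual JSON layout of docker inspect.

```shell
#!/bin/sh
# Hypothetical helper: read `docker inspect` output on stdin and print the
# first non-empty IPv4 address found in an "IPAddress" field.
first_container_ip() {
  sed -n 's/.*"IPAddress": *"\([0-9][0-9.]*\)".*/\1/p' | head -n 1
}

# Usage on the Discourse host:
#   docker inspect mysql | first_container_ip
```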

Preparing the Discourse host and container for importing

  1. First of all, we need to create a copy of the original app.yml file, for example import.yml. We’re going to edit the content of import.yml to enable the mysql2 gem and mount as volume the directory containing our smf2 attachments.
cd /var/discourse
cp containers/app.yml containers/import.yml
nano containers/import.yml
  1. Now, inside containers/import.yml, add the line - "templates/import/mysql-dep.template.yml" to the list of templates, and add a volume for the ~/smf2 directory. Afterwards it should look something like this:
templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"
  - "templates/web.template.yml"
  - "templates/web.ratelimited.template.yml"
  - "templates/web.ssl.template.yml"
  - "templates/web.letsencrypt.ssl.template.yml"
 # Un-comment the line below to enable the MySQL library in the Discourse container
  - "templates/import/mysql-dep.template.yml"

  ...

## The Docker container is stateless; all data is stored in /shared
volumes:
  - volume:
      host: /var/discourse/shared/standalone
      guest: /shared
  - volume:
      host: /var/discourse/shared/standalone/log/var-log
      guest: /var/log
  - volume:
      host: /root/smf2 # Here is where we have copied, on the host, all attachments
      guest: /shared/smf2 # Here is the mounting point of the volume in the Discourse import container
  1. Stop the app container and rebuild the import container. Wait patiently.
/var/discourse/launcher stop app
/var/discourse/launcher rebuild import
  1. When the import container is ready, it’s time for a few tweaks to the smf2.rb importer script. This script ships with the Discourse installation, and it needs a couple of changes to avoid the MySQL timeout problem and to fix the import of the [quote] and [/quote] BBCode tags, which in Discourse must each be on their own line. To do this we copy the smf2.rb file out of the container and edit it. Copying it out is not strictly necessary, but it is convenient in case the import container has to be rebuilt.
docker cp import:/var/www/discourse/script/import_scripts/smf2.rb ~/smf2/
vi ~/smf2/smf2.rb
# Find
def create_db_connection
  Mysql2::Client.new(host: options.host, username: options.username,
                     password: options.password, database: options.database)
# Replace with
def create_db_connection
  Mysql2::Client.new(host: options.host, username: options.username,
                     password: options.password, database: options.database,
                     read_timeout: 3600, write_timeout: 3600, connect_timeout: 3600,
                     reconnect: true)

# Find
quote = "[quote=\"#{params['author']}"
# Replace with
quote = "\n[quote=\"#{params['author']}"

# Find
quote << "\"]#{convert_quotes(inner)}[/quote]"
# Replace with
quote << "\"]\n#{convert_quotes(inner)}\n[/quote]\n"
  1. Copy back the smf2.rb file into the import container
docker cp ~/smf2/smf2.rb import:/var/www/discourse/script/import_scripts/
  1. Edit the Settings.php file from SMF2 with the correct connection details. To be completely clear, this step is not really necessary, as the DB connection parameters can also be passed to the smf2.rb import script as parameters, but I found this solution to be faster and more flexible.
########## Database Info ##########
$db_type = 'mysql';
$db_server = '172.17.0.X'; # This is the IP address of the mysql container - use yours!
$db_name = 'db';
$db_user = 'user';
$db_passwd = 'pass';
$ssi_db_user = '';
$ssi_db_passwd = '';
$db_prefix = 'smf_';
$db_persist = 1;
$db_error_send = 0;

Import!

  1. Enter the import container
/var/discourse/launcher enter import
  1. Start the import!
su discourse -c "bundle exec ruby script/import_scripts/smf2.rb /shared/smf2 -t Europe/Rome"
  1. Grab a beer and wait…

Further import sessions

Following the first, massive import, we are doing “delta” imports every night until we are ready to move to Discourse 100%. To do this we re-run the rsync to copy over whatever new attachments were created each day, repeat the MySQL dump and its import into the mysql container, and finally re-launch the import script.

Everything that has already been imported is simply skipped, so the importer runs considerably faster. If you really want to minimise the import time, you could alter the smf2.rb code by adding a WHERE id_msg clause to the query that prepares the data for importing posts.
For our deltas we have changed it like this:

create_posts(query(<<-SQL), total: total) do |message|
  SELECT m.id_msg, m.id_topic, m.id_member, m.poster_time, m.body,
         m.subject, t.id_board, t.id_first_msg, COUNT(a.id_attach) AS attachment_count
  FROM {prefix}messages AS m
  LEFT JOIN {prefix}topics AS t ON t.id_topic = m.id_topic
  LEFT JOIN {prefix}attachments AS a ON a.id_msg = m.id_msg AND a.attachment_type = 0
  WHERE m.id_msg > 304000
  GROUP BY m.id_msg
  ORDER BY m.id_topic ASC, m.id_msg ASC
SQL
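
The nightly delta routine described above could be wrapped in a small script. The following is only a sketch under stated assumptions: the remote host and paths are the placeholders used earlier in this guide, the function names run and delta_sync and the DRY_RUN switch are hypothetical, and the final importer run is left as the manual step from the “Import!” section. With DRY_RUN set, the commands are printed instead of executed, which makes it easy to review them first.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is set (hypothetical switch).
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Hypothetical nightly delta: refresh the attachments and the dump on the
# Discourse host, then reload the dump inside the mysql container.
delta_sync() {
  src=$1   # e.g. user@smf2server:/path_to_smf2_root
  dump=$2  # e.g. user@smf2server:/path_to_db/smf2.sql
  run rsync -aruviP "$src/attachments/" "$HOME/smf2/attachments/" &&
  run rsync -aruviP "$dump" "$HOME/smf2/" &&
  # The redirection happens inside the container, where /backup is ~/smf2.
  run docker exec mysql sh -c \
    'mysql -uroot -p"$MYSQL_ROOT_PASSWORD" "$MYSQL_DATABASE" < /backup/smf2.sql'
}

# Usage (prints the three commands without running them):
#   DRY_RUN=1 delta_sync user@smf2server:/path_to_smf2_root user@smf2server:/path_to_db/smf2.sql
```

This assumes a fresh dump has already been written on the SMF2 server before the script runs.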

Bonus track

As mentioned above, we have developed a PHP script which helps us sanitise some of the unsupported BBCode, and which also does some extra work on our embedded images/links and unsupported emoji.
I paste the main function of this script here, in case all or part of it can be useful for your migration efforts.

Please note that this script is meant to run on the SMF2 server, and that it makes potentially harmful changes to your SMF2 DB! Some of the preg_replace calls have been commented out because we have installed the BBCode plugin, which supports those tags.

function cleanup()
{
	global $exportDbConnection; // This is a PDO connection object, which connects to the SMF2 DB.

	// Unsupported or custom emoji translation
	$emoF = array(
		0 => '/:tease:/',
		1 => '/\[emoji1\]/',
		2 => '/:agree:/',
		3 => '/:happy:/',
		4 => '/\[emoji28\]/',
		5 => '/:surprise:/',
		6 => '/:embarrassed:/',
		7 => '/:evil:/',
		8 => '/:sad:/',
		9 => '/:undecided:/',
		10 => '/:death:/',
		11 => '/:help:/',
		12 => '/:hurt:/',
		13 => '/:sick:/',
		14 => '/:spam:/',
		15 => '/:surprise:/',
		16 => '/:vomit:/',
		17 => '/:wounded:/',
		18 => '/:yes:/',
		19 => '/:badmood:/',
		21 => '/:stica:/',
		22 => '/:spank:/',
		23 => '/:shock:/',
		24 => '/:censored:/',
		25 => '/:rtfm:/',
		26 => '/:police:/',
		27 => '/:blindfold:/',
		28 => '/:canadian:/',
		29 => '/:clown:/',
		30 => '/:crazy:/',
		31 => '/:educated:/',
		32 => '/:gum:/',
		33 => '/:hungry:/',
		34 => '/:snore:/',
		35 => '/:suspious:/',
		36 => '/:tired:/',
		37 => '/:ugly:/',
		38 => '/:whatever:/',
		39 => '/:whistle:/',
		40 => '/:ninja:/',
		41 => '/:pirate:/',
		42 => '/:\[emoji16\]:/'
	);

	$emoT = array(
		0 => ':tongue:',
		1 => ':smiley:',
		2 => ':ok_hand:',
		3 => ':smile:',
		4 => ':sweat_smile:',
		5 => ':astonished:',
		6 => ':flushed:',
		7 => ':japanese_ogre:',
		8 => ':disappointed:',
		9 => ':thinking:',
		10 => ':skull:',
		11 => ':ambulance:',
		12 => ':face_with_head_bandage:',
		13 => ':face_with_thermometer:',
		14 => ':wastebasket:',
		15 => ':astonished:',
		16 => ':face_vomiting:',
		17 => ':face_with_head_bandage:',
		18 => ':ok_hand:',
		19 => ':angry:',
		21 => ':rocket:',
		22 => ':facepunch:',
		23 => ':dizzy_face:',
		24 => ':face_with_symbols_over_mouth:',
		25 => ':bookmark_tabs:',
		26 => ':policeman:',
		27 => ':see_no_evil:',
		28 => ':man_dancing:',
		29 => ':clown_face:',
		30 => ':crazy_face:',
		31 => ':notebook:',
		32 => ':smiley:',
		33 => ':spaghetti:',
		34 => ':confuse:',
		35 => ':thinking:',
		36 => ':weary:',
		37 => ':thinking:',
		38 => ':expressionless:',
		39 => ':kissing_smiling_eyes:',
		40 => ':martial_arts_uniform:',
		41 => ':skull_and_crossbones:',
		42 => ':grinning:'
	);

	$sta = 0;
	$step = 30000;

	do
	{
		$end = $sta + $step;

		$query = "SELECT id_msg, subject, body FROM smf_messages WHERE id_member != 19754 AND id_msg >= " . $sta . " AND id_msg <= " . $end . ";";
		echo $query.PHP_EOL;

		$sta += $step;

		try {
			$stmt = $exportDbConnection->query($query);
		} catch(PDOException $ex) {
			echo "An error occurred!";
			echo $ex->getMessage();
			continue; // $stmt is unusable after a failed query, so skip this batch
		}

		$results = $stmt->fetchAll(PDO::FETCH_ASSOC);
		foreach ($results as $k => $line) {

			$bbcode = $line['body'] ;

			// echo $line["id_msg"]." - ".$line["subject"].PHP_EOL;
			// echo $bbcode.PHP_EOL;
			// echo PHP_EOL." => ".PHP_EOL;

			// HTML line breaks to \n
			$bbcode = preg_replace('/(<br\s?\/?>)/is', "\n", $bbcode);

			//$bbcode = html_entity_decode ($bbcode,ENT_COMPAT | ENT_HTML401,"UTF-8");
			$bbcode = preg_replace('/\[hr\]/i', "\n---\n", $bbcode);

			/*
			$bbcode = preg_replace('/\[b\]/i', " **", $bbcode);
			$bbcode = preg_replace('/\[\/b\]/i', "** ", $bbcode);
			$bbcode = preg_replace('/\[u\]/i', "", $bbcode);
			$bbcode = preg_replace('/\[\/u\]/i', "", $bbcode);
			$bbcode = preg_replace('/\[i\]/i', " *", $bbcode);
			$bbcode = preg_replace('/\[\/i\]/i', "* ", $bbcode);
			$bbcode = preg_replace('/\[(ul|list|list type=decimal)\]/is', "", $bbcode);
			$bbcode = preg_replace('/\[\/(ul|list|li)\]/is', "", $bbcode);
			$bbcode = preg_replace('/\[li\]/is', " * ", $bbcode);
			$bbcode = preg_replace('/(\[(ol|ul|list|list type=decimal)\])\[/is', "$1\n", $bbcode);
			*/

			// We get rid of the [img] bbcode and we just keep the image url
			$bbcode = preg_replace('/(\[img]|\[img width(=|\d|")+\])(.+?)\[\/img]/i', "\n$3\n", $bbcode);

			// Fix double URLs like [url=http://www.website.it/xyz]http://www.website.it/xyz[/url]
			$regexp = '/\[url=(https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*))\]https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&\/\/=]*)\[\/url\]/i';
			$bbcode = preg_replace($regexp,'$1',$bbcode);

			// Images in [center] do not work in Discourse, removing the BBCode
			$regexp = '/\[center\](\r\n|\r|\n)?(\S+\.(png|jpe?g|gif))(\r\n|\r|\n)?\[\/center\]/i';
			$bbcode = preg_replace($regexp, "$2", $bbcode);

			// LaTeX
			$bbcode = preg_replace('/\[tex\](.+?)\[\/tex\]/i', ' $ $1 $', $bbcode);

			// Each bbcode goes to newline,
			$bbcode = preg_replace('/\]\[/is', "]\n[", $bbcode);

			/*
			$bbcode = preg_replace('/\[center\](.+?)\[\/center\]/i', '### $1', $bbcode);
			$bbcode = preg_replace('/\[color.+?\](.+?)\[\/color\]/i', '$1', $bbcode);
			$bbcode = preg_replace('/\[size=?\d+pt\](.+?)\[\/size\]/i', '$1', $bbcode);
			$bbcode = preg_replace('/\[font.+?\](.+?)\[\/font\]/i', '$1', $bbcode);
			*/

			$bbcode = preg_replace('/\[sub\](.+?)\[\/sub\]/i', '$1', $bbcode);

			// Collapse runs of 3 or more line breaks into a single blank line.
			$bbcode = preg_replace('/[\r\n]{3,}/s', "\n\n", $bbcode);
			$bbcode = preg_replace('/[\n]{3,}/s', "\n\n", $bbcode);

			// Handle some special case here...
			$bbcode = preg_replace('/(&amp;#039;|&#039;)+/',"'", $bbcode);
			$bbcode = preg_replace('/&nbsp;/'," ", $bbcode);

			$bbcode = preg_replace($emoF,$emoT,$bbcode);

			// Decode HTML entities in the subject read from this row
			$subject = html_entity_decode($line['subject'], ENT_COMPAT | ENT_HTML401, "UTF-8");

			$upd = $exportDbConnection->prepare("UPDATE smf_messages SET subject=?, body=? WHERE id_msg=?");
			$upd->execute(array($subject,$bbcode, $line['id_msg']));
			$affected_rows = $upd->rowCount();
			echo $affected_rows.PHP_EOL;
			echo '<hr>'.PHP_EOL;
		}

		ob_flush();
		sleep (5);
	} while ($end <= 360000);
}

#2

Great guide! It really helped me migrate a large SMF2 forum to Discourse and I’m loving it.

Just two notes:

  • The host’s smf2 path might be different. In my case it was /home/ubuntu/smf2 (EC2 instance running Ubuntu 16.04 LTS image).
  • After the import is finished, the user needs to destroy the import container because if not, that’s the one that will run on server boot (instead of the app container).
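
For the second note, a minimal sketch of the switch back to the regular container, using the launcher's destroy and start commands. It assumes the containers are named import and app as in the guide; the LAUNCHER variable and the function name switch_back_to_app are hypothetical and only there so the commands can be previewed without Docker.

```shell
#!/bin/sh
# Path to the Discourse launcher; set LAUNCHER=echo to preview the commands.
LAUNCHER="${LAUNCHER:-/var/discourse/launcher}"

# Hypothetical helper: remove the import container, then bring the regular
# app container back up.
switch_back_to_app() {
  "$LAUNCHER" destroy import && "$LAUNCHER" start app
}

# Usage on the Discourse host, once the import has finished:
#   switch_back_to_app
```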

Tips:

  • Run the import using a large EC2 instance (c5.2xlarge) and then downgrade to a smaller type (t2.medium) - for me it took about 1.5 hours to import a forum with 28k messages and 10GB of attachments!
  • If you’re planning on using S3 for uploads - then set it up before running the import!