Easier forum migration to Discourse

Whilst a series of scripts is provided for forum migration, and recommended in the FAQ, performing a migration still requires considerable expertise: connecting to the remote database, setting up a local development version of Discourse, or even installing MySQL. I propose a simpler process for the end user:

  • User uploads their other forum's DB backup via the Discourse interface (so no remote DB connection is needed)
  • User starts the import script, which automatically backs up Discourse and then runs the import

Even if the import scripts are not run via the Discourse UI, simply giving them the option to read from a local import directory would already be a great improvement in ease of use.
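
As a rough illustration of the local-directory idea, here is a minimal Python sketch of how an import script could pick up a backup dropped into a local import directory instead of connecting to a remote database. The function name and the list of accepted file endings are hypothetical; the existing Discourse import scripts do not define anything like this.

```python
from pathlib import Path
from typing import Optional

# Hypothetical: file name endings treated as forum backups (compared in lower case).
BACKUP_SUFFIXES = (".sql", ".sql.gz", ".zip", ".tar.gz")


def find_latest_backup(import_dir: str) -> Optional[Path]:
    """Return the most recently modified backup file directly inside import_dir, or None."""
    root = Path(import_dir)
    if not root.is_dir():
        return None
    candidates = [
        p for p in root.iterdir()
        if p.is_file() and p.name.lower().endswith(BACKUP_SUFFIXES)
    ]
    if not candidates:
        return None
    # Newest by modification time, so a fresh upload wins over older ones.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

An import script would then call something like `find_latest_backup("import")` and stop with a clear message when it returns `None`, rather than asking for database credentials.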

It would be great if we had a better interface for migration. The challenge is that the migration scripts are far from complete, so building an interface on top of them wouldn't help much. While running the scripts, you often have to write more code to convert posts, import additional fields, and even read error messages when a script aborts in the middle of the process. That's why they're meant for developers for now.

There are a lot of developers contributing to the scripts, and Discourse has many import scripts now. Still, it would take quite an effort to make them complete enough to be used with a good UI. Anyway, paying a developer to migrate a site and asking them to submit a PR definitely moves things toward the proposed feature. I hope that time comes soon.

One idea would be to create a common file format to import from.

Current import scripts could be refactored to become focused on exporting data from different forums, and then converting that data to the common import format.

A validator for the file format could help authors of export scripts check that their data is transformed properly. The validator could also be used by the import feature to catch errors early when attempting an import.

The importer feature could then focus on the UI, such as choosing what to import (and what not to).
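
To make the validator idea concrete, here is a minimal Python sketch of what such a check might look like. Everything in it is an assumption for illustration: the record types, field names and the flat list-of-dicts layout are invented here and are not Discourse's base importer interface.

```python
from typing import Any

# Hypothetical common format: each record is a dict with a "type" plus these fields.
REQUIRED_FIELDS: dict[str, set[str]] = {
    "user": {"id", "username", "email"},
    "category": {"id", "name"},
    "topic": {"id", "category_id", "user_id", "title"},
    "post": {"id", "topic_id", "user_id", "raw"},
}

# Hypothetical references: (record type, field) -> record type the value must point at.
REFERENCES: dict[tuple[str, str], str] = {
    ("topic", "category_id"): "category",
    ("topic", "user_id"): "user",
    ("post", "topic_id"): "topic",
    ("post", "user_id"): "user",
}


def validate(records: list[dict[str, Any]]) -> list[str]:
    """Return human-readable problems; an empty list means the export looks valid."""
    errors: list[str] = []
    seen: dict[str, set[Any]] = {kind: set() for kind in REQUIRED_FIELDS}

    # Pass 1: known type, required fields present, no duplicate ids.
    for i, rec in enumerate(records):
        kind = rec.get("type")
        if kind not in REQUIRED_FIELDS:
            errors.append(f"record {i}: unknown type {kind!r}")
            continue
        missing = REQUIRED_FIELDS[kind] - set(rec)
        if missing:
            errors.append(f"record {i}: {kind} is missing {sorted(missing)}")
            continue
        if rec["id"] in seen[kind]:
            errors.append(f"record {i}: duplicate {kind} id {rec['id']!r}")
        seen[kind].add(rec["id"])

    # Pass 2: every reference must point at a record that passed pass 1.
    for i, rec in enumerate(records):
        for (src, field), target in REFERENCES.items():
            if rec.get("type") == src and field in rec and rec[field] not in seen[target]:
                errors.append(f"record {i}: {field}={rec[field]!r} has no matching {target}")
    return errors
```

Running the same function from an export script and from the import feature is what would let errors such as an orphaned post surface before any data reaches Discourse.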

That's just shifting the problem and adding extra complexity with the 'common file format'.
You still have to extract the data, and the 'common file format' has to be continuously updated to support all those special cases.

We kinda already have that in the base importer. But each script needs to do the hard work of handling the DB format (and DBs from the same platform don't always look the same), attachments (again, these always seem to be in a different format every time), and so on. Having a UI where you just upload a zip file and it just works is a dream we've talked about, but… it's hard.

Yeah, I agree with @michaeld, this is not a good idea. Plus, it already exists in a sense: some formats can be converted externally by third-party converters.

I would have to get hands-on with the code for days or more to know whether I really think it's a good idea, and I don't have much reason to do that.

So, I'm not pushing this idea. Just throwing it out there…

Unfortunately, that's never going to change. I've been part of more forum migrations than I care to remember, and I've never not needed the assistance of a programmer. Every forum migration is a beautiful snowflake. Different communities value different features, so you can't just say “we'll strictly support features X, Y, Z” and call it a day.

Believe me, if there were an easier way to do imports, we'd be doing it already. Easier imports mean faster customer onboarding :money_mouth:

That's not to say there isn't room for improvement. First and foremost, I'd love to see our migration guides get more love, especially the ones that don't exist yet! We'd be happy to pay some technical writers for quality migration tutorials. I'll put up a new topic for it.

Love it
:philosoraptor: :smile:

What goes through my head is:

A Discourse import feature, built into Discourse itself, that runs like this:

What is the URL of your existing forum?

Checking website…
I can see you are running phpBB 3.xxx

Just upload these two files to the document root of your website:

  • discourse-import-bridge.php
  • discourse-import-bridge-key.php

When ready, click I've Uploaded the Files

Obviously this is for PHP-based forum sites, which most are? (Anyway, another bridge could easily be implemented in any other language.)

This would then run you through a few dialogs:

What categories would you like to import?

  • General
  • Classifieds

Next

After confirming whatever details are needed:

This will import XXXX topics, XXXXX posts and XXX users.

The import will run in the background and continue to synchronise from your existing forum into your new Discourse install.

Start Import

After choosing to start:

Import 32% complete

Importing at 30 posts per minute.
Estimated time until completion 4 hours.

Server health check: Good
Server last communicated with 30 seconds ago.

After all the data has been imported:

Import 100% complete - continuing to synchronise new topics, posts and users.

Server health check: Good
Server last communicated with 2 minutes ago.
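
The progress screen in this mock-up implies a simple calculation: at 30 posts per minute, "4 hours" corresponds to roughly 30 × 60 × 4 = 7,200 posts still to import. A minimal Python sketch, assuming the rate stays constant (the function name is hypothetical):

```python
def import_progress(total: int, done: int, posts_per_minute: float) -> tuple[float, float]:
    """Return (percent complete, hours remaining), assuming the current rate holds."""
    if total <= 0 or posts_per_minute <= 0:
        raise ValueError("total and posts_per_minute must be positive")
    percent = 100.0 * done / total
    remaining = max(total - done, 0)
    hours = remaining / posts_per_minute / 60.0
    return percent, hours
```

In practice the rate would vary with server and network load, so a real importer would probably measure it over a recent window and recompute the estimate regularly.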


Where:

  • discourse-import-bridge-key.php contains a unique key specific to the Discourse instance that is allowed to access the server and extract data for import. It would probably be downloadable from the Discourse instance.

  • discourse-import-bridge.php would provide an API for:
      • decrypting “Discourse Import Bridge API” HTTP requests using the “key”.
      • executing a basic health check on the server config (PHP version, max request size, etc.) and the version of the “Discourse Import Bridge API”.
      • getting a list of files across the entire directory structure of the site.
      • acting as a database proxy: receiving batched database queries, compressing the responses and returning them in chunks to avoid web server limitations.
      • downloading arbitrary files from the file system, in compressed chunks.

  • the importer running on the Discourse server could use the “Discourse Import Bridge API” to:
      • request the URL of the site to import from the user
      • make encrypted “Discourse Import Bridge API” requests
      • run a “health check” to ensure communication works and the server is “good”
      • make a file system request for the list of files
      • detect the type of forum installation (allowing the user to select it if needed)
      • detect the location of the config file
      • read the config file and identify the DB location
      • read the DB as needed
      • display prompts for any options specific to that importer (e.g. only import specific categories)
      • run continuous background synchronisation of forum data so the switch-over is less of an issue
      • notify the user of synchronisation progress and prompt them as needed (sending emails if needed)
      • use exponential back-off for all requests to discourse-import-bridge.php to avoid issues with hosting providers.
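
The back-off point could look roughly like this minimal Python sketch. It uses capped exponential back-off with "full jitter" (a random delay up to a doubling ceiling), which is one common variant; the helper name and default limits are assumptions, not part of any existing importer.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    request: Callable[[], T],
    max_attempts: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call request(), retrying on OSError with capped exponential back-off and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        # urllib's URLError, ConnectionError and TimeoutError are all OSError subclasses.
        except OSError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay ceiling doubles each attempt (1s, 2s, 4s, ...) up to max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The randomised delay spreads retries out, which matters on shared hosting where a burst of identical retries could trip rate limits; `sleep` is injectable so the helper can be tested without waiting.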

This Discourse importer would really just need an API layer for making file system and DB requests, and this layer could be switched between:

  • one version where “I have the DB and file system mounted locally”
  • another version where “I'm using the Discourse Import Bridge”.
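
That switchable layer can be sketched as one interface with two implementations, so the importer logic never needs to know which one it has. This is a minimal Python sketch under several assumptions: `ForumSource`, `LocalSource`, `BridgeSource` and the request dictionaries are hypothetical names, `sqlite3` stands in for MySQL so the example runs anywhere, and `count_posts` assumes phpBB's default `phpbb_` table prefix.

```python
import base64
import sqlite3
from pathlib import Path
from typing import Any, Callable, Protocol


class ForumSource(Protocol):
    """Everything the importer needs from the old forum, however it is reached."""

    def list_files(self) -> list[str]: ...

    def read_file(self, path: str) -> bytes: ...

    def query(self, sql: str, params: tuple = ()) -> list[tuple]: ...


class LocalSource:
    """'I have the DB and file system mounted locally' (sqlite3 stands in for MySQL)."""

    def __init__(self, root: str, db: sqlite3.Connection) -> None:
        self.root = Path(root)
        self.db = db

    def list_files(self) -> list[str]:
        return sorted(p.relative_to(self.root).as_posix()
                      for p in self.root.rglob("*") if p.is_file())

    def read_file(self, path: str) -> bytes:
        return (self.root / path).read_bytes()

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        return self.db.execute(sql, params).fetchall()


class BridgeSource:
    """'I'm using the Discourse Import Bridge'. `send` is a hypothetical transport that
    posts a request dict to discourse-import-bridge.php and returns the decoded JSON reply."""

    def __init__(self, send: Callable[[dict[str, Any]], Any]) -> None:
        self.send = send

    def list_files(self) -> list[str]:
        return list(self.send({"op": "list_files"}))

    def read_file(self, path: str) -> bytes:
        # Hypothetical wire format: file contents arrive base64-encoded inside JSON.
        return base64.b64decode(self.send({"op": "read_file", "path": path}))

    def query(self, sql: str, params: tuple = ()) -> list[tuple]:
        rows = self.send({"op": "query", "sql": sql, "params": list(params)})
        return [tuple(row) for row in rows]


def count_posts(source: ForumSource) -> int:
    # Importer logic is written once against ForumSource; it never checks which backend it has.
    # Assumes phpBB's default "phpbb_" table prefix.
    return source.query("SELECT COUNT(*) FROM phpbb_posts")[0][0]
```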

The reason for the ā€œencryptionā€ of the requests is that most sites will not have HTTPS/SSL/TLS requests will contain information that should be kept secret, e.g. DB password.

It should be made clear to the user that using the “Discourse Import Bridge” could take much longer to complete.
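
Part of what keeps the bridge workable despite that slowness is the "compressed chunks" idea from the API list. A minimal Python sketch of both halves follows (in the proposal the bridge itself would be PHP). The 512 KiB chunk size is an assumption meant to stay under typical request and response limits; a real bridge would pick it from the limits its health check reports.

```python
import zlib
from typing import Iterable, Iterator

CHUNK_SIZE = 512 * 1024  # hypothetical; tune to the host's reported limits


def compress_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Split data into fixed-size pieces and zlib-compress each piece independently."""
    for start in range(0, len(data), chunk_size):
        yield zlib.compress(data[start:start + chunk_size])


def reassemble(chunks: Iterable[bytes]) -> bytes:
    """Decompress each chunk and join them back into the original bytes."""
    return b"".join(zlib.decompress(c) for c in chunks)
```

Compressing each chunk separately means a failed transfer only needs that one chunk re-requested (for example through a back-off retry), rather than the whole file.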

That's quite the ambitious project, but it sounds like it would work: slower due to the network, but a much faster and easier setup.

Agreed, I think the tooling could be improved, but that would be a ginormous project by Discourse standards. Some of those single bullet points are very complex once you sit down to implement them.

That's a big project there… I'd love it if it became a reality, but I think it's a big undertaking and would still be far from covering forums with custom functionality (which, as mentioned above, is a huge percentage, and each one is unique).

I believe the “unique snowflake” :snowflake: problem should be dealt with via “feature requests” for each source being imported (phpBB, vBulletin, bbPress, etc.).

Basically it comes down to either a missing “feature” :candy: or a “bug” :bug:.

The best way to surface these “feature” or “bug” reports / requests is to have an interface which:

  • sets expectations
  • allows the user to make informed decisions
  • informs the user if a “known” unsupported feature is currently in use:
      • that it won't be imported
      • where to go to get more info
  • prevents the user from continuing until they have “agreed”.
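
The "don't continue until agreed" behaviour amounts to a small gate in front of the Start Import step. Here is a minimal Python sketch. The class names and fields are hypothetical, and how findings get detected is left to each platform's importer.

```python
from dataclasses import dataclass, field


@dataclass
class UnsupportedFeature:
    key: str       # hypothetical id, e.g. "category_permissions"
    message: str   # what will not be imported, in plain language
    info_url: str  # where to read more or request the feature


@dataclass
class PreflightReport:
    findings: list[UnsupportedFeature] = field(default_factory=list)
    acknowledged: set[str] = field(default_factory=set)

    def acknowledge(self, key: str) -> None:
        """Record that the user has read and agreed to a finding."""
        self.acknowledged.add(key)

    def pending(self) -> list[UnsupportedFeature]:
        """Findings the user has not agreed to yet."""
        return [f for f in self.findings if f.key not in self.acknowledged]

    def can_start_import(self) -> bool:
        # The "Start Import" step stays blocked until every finding is acknowledged.
        return not self.pending()
```

Each finding carries its own `info_url`, which is where the link to the relevant feature request for that platform would go.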

For example, one of the things the phpBB importer doesn't do is set security permissions :guardsman: on categories.

This would be a critical warning: you wouldn't want a site importing “private” data :underage: and displaying it publicly on the Discourse install.

In this case, one answer would be to make the install private :eyes: until the group / category permissions could be set up manually.

But then again, if this were a common element, the phpBB importer could be improved to either:

  • add the functionality for creating these group / category permissions
  • OR: detect whether any categories actually use “special permissions” and, if none do, not bother displaying the warning, improving the process flow for that importer.
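
The second option could be as simple as the following Python sketch. It assumes `read_access` (category id mapped to the names of groups allowed to read it) has already been extracted. In phpBB that means resolving its ACL tables and permission roles, which is the genuinely hard part and is not shown here. `GUESTS` is used as phpBB's guest group name, which is also an assumption to check against the actual install.

```python
from typing import Optional


def restricted_categories(read_access: dict[int, set[str]], public_group: str = "GUESTS") -> list[int]:
    """Ids of categories whose readers do not include the public group, sorted."""
    return sorted(cid for cid, groups in read_access.items() if public_group not in groups)


def permission_warning(read_access: dict[int, set[str]], public_group: str = "GUESTS") -> Optional[str]:
    """Return a warning to show before importing, or None when every category is public."""
    restricted = restricted_categories(read_access, public_group)
    if not restricted:
        return None  # nothing private, so skip the warning and keep the flow short
    ids = ", ".join(str(cid) for cid in restricted)
    return (f"{len(restricted)} of {len(read_access)} categories are not publicly readable "
            f"(ids: {ids}). Their permissions will not be imported, so keep the site "
            "private until group/category permissions are set up.")
```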

Again, thinking specifically of phpBB: understanding how much of each install is a “unique snowflake” :snowflake: is complex, but there are small things that can be done:

  • check which plugins are actually listed as installed
  • warn the user that the functionality provided by these won't be imported or is an unknown factor.
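
For the plugin check, phpBB 3.1 and later record installed extensions in an `ext` table (`phpbb_ext` with the default prefix, with columns such as `ext_name` and `ext_active`), so the rows could come from a query like `SELECT ext_name, ext_active FROM phpbb_ext`. Treat those table and column names as something to verify against the target install. A minimal Python sketch of the warning step, with `handled` being a hypothetical set of extensions the importer is known to cope with:

```python
def unsupported_extensions(rows: list[tuple[str, int]], handled: set[str]) -> list[str]:
    """rows are (ext_name, ext_active) pairs; return active extensions the importer doesn't handle."""
    return sorted(name for name, active in rows if active and name not in handled)


def extension_warnings(rows: list[tuple[str, int]], handled: set[str]) -> list[str]:
    """One user-facing warning per active, unhandled extension."""
    return [
        f"Extension '{name}' is active; its functionality will not be imported "
        "(or its effect is unknown)."
        for name in unsupported_extensions(rows, handled)
    ]
```

Older phpBB 3.0-style MODs patch source files directly and leave no such record, so this check would not catch them.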

(As a side note, consider doing a “complete source code comparison” for phpBB installs; there are so many “mod” :no_pedestrians: installs, they will make you cry :cry:.)


Yes, the above adds more of those bullet points, each with many man-hours behind it.

Currently these decisions are made mostly by “someone in the know”, because that person has to:

  • run an import from a command line shell
  • make coding changes to get it to start
  • most likely install additional software on Linux (MySQL for phpBB)
  • and more…

Because the level of entry for this process is quite high, I believe the level of early abandonment will be too.


The vision I have outlined is just that: a “vision” of what could be.

It also specifically allows:

  • users with zero command line skills to complete an import.
  • users to be kept informed of progress.
  • expectations to be clearly set.
  • clear messaging on where to request a missing importer feature for a specific platform.
  • clear separation of old and new:
      • no need to set up a new MySQL install
      • no need to open up ports
      • no need to upload backup data via some SSH method users probably haven't heard of.
  • Discourse server requirements to be kept minimal during import (i.e. the same as the final server requirements):
      • installing an additional service (MySQL) locally would push up the server requirements for the import.

Feel free to pick whatever you want from this when deciding on the “minimum requirements” for improving the existing solution.

Sometimes itā€™s good to have a bigger picture in mind.

Just sharing, hope it helps.