Easier forum migration to Discourse


(Dave Higgins) #1

Whilst a series of scripts is provided for forum migration, and recommended in the FAQ, performing a migration still requires considerable expertise: connecting to the remote database, and/or setting up a local development version of Discourse, or even installing MySQL. I propose a simpler process for the end user:

  • User uploads their other-forum DB backup via the Discourse interface (so no remote DB connection)
  • User initiates the import script; the script automatically backs up the Discourse instance, and then runs the import

If the import scripts provided an option to just look in a local import directory, that would already be a great improvement for ease of use, even if the import scripts are not run via the Discourse UI.
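As a rough illustration of the "local import directory" idea, here is a minimal Python sketch that picks the newest dump file from a fixed folder. The directory path and the accepted file extensions are assumptions made up for this example; the existing Discourse import scripts do not define them.

```python
from pathlib import Path
from typing import Optional

# Hypothetical location and extensions, chosen only for this sketch.
IMPORT_DIR = Path("/shared/import")
DUMP_SUFFIXES = (".sql", ".sql.gz", ".zip")


def find_latest_dump(import_dir: Path = IMPORT_DIR) -> Optional[Path]:
    """Return the most recently modified dump file in import_dir, or None."""
    if not import_dir.is_dir():
        return None
    candidates = [
        p for p in import_dir.iterdir()
        if p.is_file() and p.name.lower().endswith(DUMP_SUFFIXES)
    ]
    # max() with default=None handles an empty directory without raising.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)
```

An import script could call `find_latest_dump()` at startup and fall back to its current connection settings when nothing is found.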


(Erick Guan) #2

It would be great if we had a better interface for migration. But the challenge is that the migration scripts are far from complete, so building an interface on top of them isn’t helpful yet. While running a script, you have to write more code to convert posts, import more fields, and even read error messages when it aborts in the middle of the process. That’s why it’s meant for developers for now.

There are a lot of developers contributing to the scripts. Discourse has many import scripts now. I’d say that represents quite an effort, even if they are still far from complete enough to be used with a good UI. Anyway, paying a developer to migrate sites and asking them to submit a PR definitely helps pave the way for the proposed feature. I hope that time comes soon.


(Dave McClure) #3

One idea would be to create a common file format to import from.

Current import scripts could be refactored to become focused on exporting data from different forums, and then converting that data to the common import format.

A validator for the file format could help authors of export scripts check that their data is transformed properly. The validator could also be used by the import feature to help catch errors early when attempting to import.

The importer feature could then focus on the features for the UI, choosing what to import (and what not to).
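To make the validator idea concrete, here is a minimal Python sketch. The record types and field names are entirely hypothetical; no such common format exists in Discourse, and a real one would need many more fields (attachments, permissions, and so on).

```python
from typing import Any, Dict, List

# Hypothetical schema for a "common import format": one dict per record.
REQUIRED_FIELDS = {
    "user": {"id": int, "username": str, "email": str},
    "topic": {"id": int, "title": str, "user_id": int},
    "post": {"id": int, "topic_id": int, "user_id": int, "raw": str},
}


def validate_record(record: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    kind = record.get("type")
    if kind not in REQUIRED_FIELDS:
        return [f"unknown record type: {kind!r}"]
    errors = []
    for field, expected in REQUIRED_FIELDS[kind].items():
        if field not in record:
            errors.append(f"{kind}: missing field '{field}'")
        elif not isinstance(record[field], expected):
            errors.append(f"{kind}: field '{field}' should be {expected.__name__}")
    return errors
```

Both an export script author and the import feature could run the same function: the former to check output before shipping it, the latter to reject a bad file before touching the database.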


(Michael - DiscourseHosting.com) #4

That’s just shifting the problem and adding extra complexity in adding the ‘common file format’.
You still have to extract data, and the ‘common file format’ has to be continuously updated in order to support all those special cases.


(Neil Lalonde) #5

We kinda already have that in the base importer. But each script needs to do the hard work of handling the db format (and all db’s from the same platform don’t always look the same), attachments (again, these always seem to be in a different format every time), and so on. Having a UI where you just enter a zip file and it just works is a dream we’ve talked about, but… it’s hard.


(Jeff Atwood) #6

Yeah, I agree with @michaeld: this is not a good idea. Plus it already exists, as some formats can be converted externally by third-party converters.


(Dave McClure) #7

I would have to get hands on with the code for days or more to know whether I think it’s really a good idea or not, and I don’t have much reason to do that.

So, I’m not pushing this idea. Just throwing it out there…


(Erlend Sogge Heggen) #8

Unfortunately, that’s never going to change. I’ve been part of more forum migrations than I care to remember, and I’ve never not needed the assistance of a programmer. Every forum migration is a beautiful snowflake. Different communities value different features, so you can’t just go “we’ll strictly support feature X, Y, Z” and call it a day.

Believe me, if there was an easier way to do imports, we’d be doing it already. Easier imports means faster customer onboarding :money_mouth:

That’s not to say there isn’t room for improvement. First and foremost, I’d love to see our migration guides get more love, especially the ones that don’t exist yet! We’d be happy to pay some technical writers for quality migration tutorials. I’ll put up a new topic for it.


Poll: Which forum importer would you most like to see a tutorial for?
(Dave Higgins) #9

Love it
:philosoraptor: :smile:


(Dean Taylor) #10

What goes through my head is:

A Discourse Import feature actually in Discourse that runs like this:

What is the URL of your existing forum?

Checking website…
I can see you are running phpBB 3.xxx

Just upload these two files to document root of your website:

  • discourse-import-bridge.php
  • discourse-import-bridge-key.php

When ready click I’ve Uploaded the Files

Obviously this is for PHP-based forum sites, which most are? (Anyway, another bridge could easily be implemented in any other language.)

This would then run you through a few dialogs

What categories would you like to import?

  • General
  • Classifieds

Next

After confirming whatever details needed

This will import XXXX topics, XXXXX posts and XXX users.

The import will run in the background and continue to synchronise from your existing forum into your new Discourse install.

Start Import

After selecting to start

Import 32% complete

Importing at 30 posts per minute.
Estimated time until completion 4 hours.

Server health check: Good
Server last communicated with 30 seconds ago.

After all the data has been imported:

Import 100% complete - continuing to synchronise new topics, posts and users.

Server health check: Good
Server last communicated with 2 minutes ago.


Where:

  • discourse-import-bridge-key.php contains some unique key specific to the Discourse instance allowed to access the server for extracting data for import. Probably downloadable from the Discourse instance.

  • discourse-import-bridge.php would provide an API for:

      • decrypting “Discourse Import Bridge API” HTTP API requests using the “key”.

      • executing a basic health check on the server config, PHP version, max request size, etc., and the version of the “Discourse Import Bridge API”.

      • getting a list of files across the entire directory structure of the site.

      • database proxy: receiving batched database queries, compressing responses and returning them in a chunked manner to avoid web server limitations.

      • downloading arbitrary files from the file system, in compressed chunks.

  • the importer running on the Discourse server could use the “Discourse Import Bridge API” to:

      • request the URL of the site to import from the user

      • make encrypted “Discourse Import Bridge API” requests

      • run a “health check” to ensure communication and server are “good”.

      • make a file system request for the list of files

      • detect what type of forum installation it is (allowing the user to select if needed)

      • detect the location of the config file

      • read the config file and identify the DB location

      • read the DB as needed.

      • display prompts for any options specific to that importer (e.g. only import specific categories)

      • execute continuous background synchronisation of forum data so switch-over is less of an issue.

      • notify and prompt the user about synchronisation progress (sending emails if needed)

      • use exponential back-off for all requests to discourse-import-bridge.php to avoid issues with hosting providers.
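The exponential back-off point could look roughly like the Python sketch below. The function names, default delays and the choice to retry on `OSError` are assumptions for illustration; the `request` callable stands in for whatever HTTP call the importer would make to the hypothetical discourse-import-bridge.php.

```python
import time
from typing import Callable, Iterator, TypeVar

T = TypeVar("T")


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, retries: int = 5) -> Iterator[float]:
    """Yield the wait before each retry: base, base*factor, ... capped at cap."""
    delay = base
    for _ in range(retries):
        yield min(delay, cap)
        delay *= factor


def call_with_backoff(request: Callable[[], T],
                      sleep: Callable[[float], None] = time.sleep,
                      **backoff_options: float) -> T:
    """Call request(); on OSError, wait and retry with growing delays."""
    delays = backoff_delays(**backoff_options)
    while True:
        try:
            return request()
        except OSError as exc:  # e.g. ConnectionError, urllib.error.URLError
            delay = next(delays, None)
            if delay is None:
                raise RuntimeError("bridge did not respond after retries") from exc
            sleep(delay)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting; a production version would probably also add random jitter so that many importers do not retry in lockstep.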

This Discourse importer would really just have an API layer for making file system and DB requests; this layer would be switched between:

  • one version where “I have the DB and file system mounted locally”
  • another version where “I’m using the Discourse Import Bridge”.
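A minimal Python sketch of such a switchable layer follows. All class and function names are hypothetical, the `transport` callable stands in for the encrypted bridge requests, and database access is left out to keep the example short. The `config.php` default is only an example file name.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Callable, List, Optional


class ForumSource(ABC):
    """Hypothetical layer the importer talks to, whatever the backend."""

    @abstractmethod
    def list_files(self) -> List[str]:
        """Return site-relative paths of all files."""

    @abstractmethod
    def read_file(self, relative_path: str) -> bytes:
        """Return the raw contents of one file."""


class LocalSource(ForumSource):
    """'I have the DB and file system mounted locally.'"""

    def __init__(self, root: Path):
        self.root = Path(root)

    def list_files(self) -> List[str]:
        return sorted(p.relative_to(self.root).as_posix()
                      for p in self.root.rglob("*") if p.is_file())

    def read_file(self, relative_path: str) -> bytes:
        return (self.root / relative_path).read_bytes()


class BridgeSource(ForumSource):
    """'I'm using the Discourse Import Bridge.'"""

    def __init__(self, transport: Callable):
        # transport(action, **params) is an assumed callable that performs
        # the remote request and returns the decoded result.
        self.transport = transport

    def list_files(self) -> List[str]:
        return self.transport("list_files")

    def read_file(self, relative_path: str) -> bytes:
        return self.transport("read_file", path=relative_path)


def find_config(source: ForumSource, name: str = "config.php") -> Optional[str]:
    """Importer-side logic that works unchanged with either backend."""
    for path in source.list_files():
        if path.rsplit("/", 1)[-1] == name:
            return path
    return None
```

The point of the design is that steps such as detecting the config file are written once against `ForumSource`, and only the backend changes between a local import and a bridged one.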

The reason for the “encryption” of the requests is that most sites will not have HTTPS/SSL/TLS, and requests will contain information that should be kept secret, e.g. the DB password.

It would / should be noted somewhere to the user that using the “Discourse Import Bridge” could take much longer to complete.


(Kane York) #11

That’s quite the ambitious project, but it sounds like it would work. Slower work speed due to the network, but a way faster and easier setup.


(Robin Ward) #12

Agreed, I think the tooling could be improved, but that would be a ginormous project by Discourse standards. Some of those single bullet points are very complex when you sit down to implement them.


(moshe) #13

That’s a big project there… I’d love if it became a reality, but I think it’s a big undertaking and still far from covering forums with custom functionality (which as mentioned above is a huge % and each one unique)


(Dean Taylor) #14

I believe the “unique snowflake” :snowflake: problem should be dealt with via “feature requests” for each source (phpBB, vBulletin, bbPress etc.) being imported.

Basically it comes down to a missing “feature” :candy: or a “bug” :bug: .

The best way to surface these “feature” or “bug” reports / requests is to have an interface which:

  • sets expectations
  • allows the user to make informed decisions
  • informs the user if a “known” unsupported feature is currently in use:
      • that it won’t be imported
      • where to go to get more info
      • preventing the user from continuing until they have “agreed”.

So for example one of the things the phpBB importer doesn’t do is set security permissions :guardsman: on categories.

This would be a critical warning: you wouldn’t want a site importing “private” data :underage: and displaying it publicly on the Discourse install.

In this case one answer would be to make the install private :eyes: until the group / category permissions could be manually set up by a user.

But then again, if this was a common element – then the phpBB importer could be improved to either:

  • add the functionality for creating these group / category permissions
  • OR: detect whether any categories actually use “special permissions” and, if not, don’t bother displaying the warning, improving the process flow for that importer.

Again, thinking specifically of phpBB: how much of a “unique snowflake” :snowflake: an install is can be complex to understand, but there are small things that can be done:

  • check which plugins are actually listed
  • warn the user that the functionality provided by these won’t be imported or is an unknown factor.
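The warning flow described in this post (flag restricted categories, flag listed plugins, block until the user agrees) could be sketched in Python as below. The data shapes, the `restricted` flag and the warning codes are assumptions for illustration and do not reflect how phpBB or the phpBB importer actually store this information.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class PreflightWarning:
    code: str
    message: str
    critical: bool


def preflight(categories: List[Dict], plugins: List[str]) -> List[PreflightWarning]:
    """Build warnings only for unsupported features that are actually in use."""
    warnings = []
    restricted = [c["name"] for c in categories if c.get("restricted")]
    if restricted:  # no restricted categories means no permissions warning
        warnings.append(PreflightWarning(
            "category_permissions",
            "Permissions will not be imported for: " + ", ".join(restricted),
            critical=True,
        ))
    for plugin in plugins:
        warnings.append(PreflightWarning(
            f"plugin:{plugin}",
            f"Functionality from '{plugin}' will not be imported",
            critical=False,
        ))
    return warnings


def may_continue(warnings: List[PreflightWarning], acknowledged: Set[str]) -> bool:
    """The import may start only once every warning has been agreed to."""
    return all(w.code in acknowledged for w in warnings)
```

A UI would render each warning with a link to more information and a checkbox, then pass the ticked codes to `may_continue` before enabling the Start Import button.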

(But as a side note, consider a “complete source code comparison” for phpBB installs - so many “mod” :no_pedestrians: installs - they will make you cry :cry: ).


Yes, the above includes more of those bullet points with many man-hours behind each.

Currently these decisions are made mostly by “someone in the know” because that person is having to:

  • run an import from a command line shell
  • make coding changes to get it to start
  • most likely install additional software on Linux (MySQL for phpBB)
  • and more…

Because the barrier to entry for this process is quite high, I believe the level of early abandonment will be too.


The vision I have outlined is just that: a “vision” of what could be.

It also specifically allows:

  • for users with zero command line skills to complete an import.
  • for users to be kept informed of progress.
  • expectations to be clearly set.
  • clear messaging on where to perhaps request a missing importer feature for a specific platform.
  • clear separation of old and new:
      • no need to set up a new MySQL install
      • no need to open up ports
      • no need to upload backup data via some SSH method users probably haven’t heard of.
  • Discourse server requirements to be kept minimal during import (i.e. the same as the final server requirements):
      • installing an additional service (MySQL) would push the server requirements up for the import locally.

Feel free to pick from this what you will to consider “minimum requirements” for improvements on the existing solution.

Sometimes it’s good to have a bigger picture in mind.

Just sharing, hope it helps.