Make discourse_merger be part of the admin backup/restore process?

So I spent several hrs yesterday working with discourse_merger on the most recent beta code v20180828065005 (docker dev, bare dev, modified production to docker dev, a million different environment settings, etc.) only to realize something is broken with it (or with me. Why not both?).

During this experience though I realized something: this ability could become a critical part of the official workflow if it’s simply added to the admin front end next to the “RESTORE” button.

(with a nifty merge icon, not the play icon)

Obviously this would be something you manually turn on in settings (just like restore), and there would need to be a screen that pops up asking for the original URL of the backup that you are merging into the current forum, etc.

But what this would do is formalize the process for migrating existing historical data into working Discourse forums. I think a LOT of people have old mailman listservs, multiple VB4s, etc., that could all stand to be ported to Discourse. The problem is that many (most) of the old and new forums share users who can only be identified by a unique email address, and none (hardly any) of the forum importers know to match on email first before falling back to UID matching. discourse_merger solves this but seems impossible to get working correctly. It’s still too beta IMO.

So doing a MERGE in the admin area would allow the following: a forum import (let’s say BBPRESS to DISCOURSE) would happen offline as per usual. The result is verified and tweaked (uploads, user data, etc.), uploaded to the admin area of a backup production Discourse, merged, verified again, and then done live. This would also allow import services to happen elsewhere, with a Discourse backup created and sent to the admin of the forum for them to restore or merge at will . . .

tldr: can discourse_merger be stabilized and institutionalized as part of the main feature set of the Discourse backup and restore process, so historical data with duplicate users can be easily migrated from ANY previous forum to existing Discourse forums w/out too much developer experience?

best,
Walker

We haven’t implemented a UI for any of our importers, although it’s been on our wish list for a while. This merger tool, like other importers, uses a lot of resources so running it on a production server is currently not recommended. There’s a lot of work involved…


As you’ve outlined, merging forums is difficult and there are many things that can go wrong.

If you have multiple sources of data that you want to merge, then you’ll need to do something like you’ve outlined (e.g., modify the import script to look for a matching email address and use that user instead of creating a new one). And if you want to do it with a community that’s already live, you’ll need to run the import script on the live server, which is perilous.
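
For what it’s worth, the email-first matching described above could be sketched roughly like this. It is plain Python, not anything taken from Discourse’s actual import scripts, and `UserStore`, `find_by_email`, `normalize_email`, and `import_users` are all hypothetical names made up for illustration:

```python
from dataclasses import dataclass, field


@dataclass
class User:
    id: int
    email: str
    username: str


def normalize_email(email):
    # Assumption: emails are compared case-insensitively, ignoring stray whitespace.
    return email.strip().lower()


@dataclass
class UserStore:
    """Hypothetical in-memory stand-in for the target forum's user table."""
    users: list = field(default_factory=list)

    def find_by_email(self, email):
        wanted = normalize_email(email)
        for user in self.users:
            if normalize_email(user.email) == wanted:
                return user
        return None

    def create(self, email, username):
        next_id = max((u.id for u in self.users), default=0) + 1
        user = User(next_id, email, username)
        self.users.append(user)
        return user


def import_users(source_users, store):
    """Return a map from source user id to target user id.

    An existing target user with the same email is reused;
    a new user is created only when no email match is found.
    """
    id_map = {}
    for src in source_users:
        existing = store.find_by_email(src.email)
        target = existing or store.create(src.email, src.username)
        id_map[src.id] = target.id
    return id_map
```

The returned map is the part that matters: an import would then use it to attach each imported post to the matched user rather than relying on the source forum’s UIDs.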

Having done a couple dozen imports, the thing that is most striking is how different they all are. You’d think that after you’d done an import from WhateverGreatForum a couple times then it’d be a simple matter to do it again, but that’s almost never the case.


Yeah. That is why I was thinking of discourse_merger as a really nice interstitial step that sits between the first “WhateverGreatForum -> Discourse” and “MyGreatDiscourse Forum is Good to Go Live!”. It would just need to be able to pull from files uploaded to the backups folder.

discourse_merger already solves the email matching issue so if it’s part of the chain this doesn’t have to be solved or implemented in any of the existing importers.

I bet. I could see the Docker instance bogging pretty hard if merging 300k posts, etc . . .

As it is, I’m interested in whether anyone has specifically gotten discourse → discourse working on the current beta using a dev environment.

best,
-Walker