How to merge two Discourse sites

import

(Neil Lalonde) #1

If you have two Discourse sites that you wish were one, this guide is for you.

There’s a tool called discourse_merger that can take one Discourse site and merge it into another.

Prereqs

This is not an easy task, and should be treated like any other migration to Discourse. You will not be running discourse_merger on a live production site. You will perform the merge in another environment where you can review the output before moving the result to production.

Copy vs Merge

Almost everything will be copied from one site to the other, but Categories and Users can be merged, which will avoid duplication.

  • Users will be merged if a user on both sites has the same email address.
  • Categories will be merged if they have the same name.
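
To preview which accounts would be merged rather than copied, one option is to compare lists of email addresses exported from each site. A minimal sketch, assuming two plain-text files with one address per line. The file names and the `shared_emails` helper are hypothetical, and this guide does not say how the merger treats letter case, so the comparison here is an exact match:

```shell
# Hypothetical helper, not part of Discourse: print the addresses that
# appear in both files (one address per line, exact match).
shared_emails() {
  dest_tmp=$(mktemp)
  src_tmp=$(mktemp)
  sort -u "$1" > "$dest_tmp"
  sort -u "$2" > "$src_tmp"
  # comm -12 keeps only the lines common to both sorted inputs.
  comm -12 "$dest_tmp" "$src_tmp"
  rm -f "$dest_tmp" "$src_tmp"
}

# Usage (file names are placeholders):
#   shared_emails destination_emails.txt other_emails.txt
```

The same approach works for category names, since categories are matched by name.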

If you want to do any reorganization of your data, do it before merging.

Choose the destination site

Choose which site will be the destination for the data. This is the one that will retain all its styling and settings. The other site will have its users, categories, topics, posts, uploads, etc. copied/merged into the destination site.

How to do it

Take backups of both sites including files and copy them to the environment where you’ll perform the merge. It’s possible that they’re from different versions of Discourse, so we need them to be at the same version. I would choose to use the most recent version of Discourse while performing the merge.

Restore the destination site.

bundle exec ruby script/discourse restore destination-2018-08-02-134227-v2018xxx.tar.gz

Next we’ll extract the other site.

cd /path/to/data
tar xvzf other-2018-08-02-134227-v2018xxx.tar.gz

The output will include the database dump and the upload files.
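
Before going further, it can help to confirm that the extraction produced what the later steps expect. A small sketch, assuming the archive unpacks to a `dump.sql.gz` file and an `uploads/default` directory directly under the extraction directory (a layout inferred from the paths used elsewhere in this topic). The `check_extracted` helper is hypothetical:

```shell
# Hypothetical helper, not part of Discourse: verify the extracted
# backup contains the database dump and the uploads directory.
check_extracted() {
  dir="$1"
  status=0
  if [ ! -f "$dir/dump.sql.gz" ]; then
    echo "missing database dump: $dir/dump.sql.gz" >&2
    status=1
  fi
  if [ ! -d "$dir/uploads/default" ]; then
    echo "missing uploads directory: $dir/uploads/default" >&2
    status=1
  fi
  return "$status"
}

# Usage:
#   check_extracted /path/to/data && echo "backup layout looks complete"
```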

Create a database with the data:

psql
CREATE DATABASE "copyme" ENCODING = 'utf8';
\q
gunzip < /path/to/data/dump.sql.gz | psql -d copyme
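
The same steps can also be run non-interactively. A minimal sketch, assuming `psql` connects as a user who is allowed to create databases (as in the interactive session above). The `load_dump` helper and its arguments are hypothetical:

```shell
# Hypothetical helper, not part of Discourse: create a database and
# load a gzip-compressed SQL dump into it.
load_dump() {
  dump_file="$1"
  db_name="$2"
  # Stop early if the file is missing or is not valid gzip data.
  if ! gzip -t "$dump_file" 2>/dev/null; then
    echo "not a readable gzip file: $dump_file" >&2
    return 1
  fi
  psql -c "CREATE DATABASE \"$db_name\" ENCODING = 'utf8';" || return 1
  gunzip < "$dump_file" | psql -d "$db_name"
}

# Usage (run as a user who can create databases, e.g. postgres):
#   load_dump /path/to/data/dump.sql.gz copyme
```

Note that the gzip check only confirms the file is compressed data; it does not confirm that the contents are SQL.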

Now it’s time to run the script. Some env variables you’ll set:

DB_NAME: name of database being merged into the destination site.
DB_HOST: (optional) hostname of database being merged. leave blank if it’s local.
UPLOADS_PATH: absolute path of the directory containing “original” and “optimized” dirs. e.g. /path/to/data/uploads/default
SOURCE_BASE_URL: base url of the site being merged. e.g. https://meta.discourse.org
SOURCE_CDN: (optional) base url of the CDN of the site being merged.

IMPORT=1 DB_NAME=copyme SOURCE_BASE_URL=http://copy.othersite.com UPLOADS_PATH=/shared/import/data/uploads/default bundle exec ruby script/bulk_import/discourse_merger.rb
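
A preflight check can catch a missing variable or a wrong uploads path before the script starts. A minimal sketch, assuming DB_NAME, UPLOADS_PATH and SOURCE_BASE_URL are the required ones (the list above marks only DB_HOST and SOURCE_CDN as optional). The `check_merge_env` helper is hypothetical:

```shell
# Hypothetical helper, not part of Discourse: check the variables the
# merge command above relies on.
check_merge_env() {
  status=0
  for name in DB_NAME UPLOADS_PATH SOURCE_BASE_URL; do
    # Indirect lookup of the variable whose name is in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "$name is not set" >&2
      status=1
    fi
  done
  # UPLOADS_PATH should point at the directory holding "original".
  # Only "original" is checked here; "optimized" is not.
  if [ -n "${UPLOADS_PATH:-}" ] && [ ! -d "$UPLOADS_PATH/original" ]; then
    echo "no \"original\" directory under $UPLOADS_PATH" >&2
    status=1
  fi
  return "$status"
}

# Usage: export the variables, then
#   check_merge_env && echo "environment looks ready"
```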

When it’s done, review the output in a web browser.

You can use the remap tool to update links from the old forum.

bundle exec ruby script/discourse remap 'copy.othersite.com' 'hot.newsite.com'
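
For a rough idea of how many references the remap needs to update, the other site's extracted dump can be searched for the old hostname. A sketch, assuming the dump is the gzip-compressed SQL file from the backup. The `count_host_lines` helper is hypothetical and counts matching lines, not individual links:

```shell
# Hypothetical helper, not part of Discourse: count the lines in a
# gzip-compressed dump that mention a given hostname.
count_host_lines() {
  dump_file="$1"
  host="$2"
  # -F treats the hostname as a literal string, so dots are not wildcards.
  # grep -c exits non-zero when the count is 0, hence the "|| true".
  gunzip -c "$dump_file" | grep -cF -- "$host" || true
}

# Usage:
#   count_host_lines /path/to/data/dump.sql.gz copy.othersite.com
```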

Also rebake all posts with uploads:

rake posts:rebake_match["upload:"]

If everything looks good, take a backup of the result and restore it to your production server.

bundle exec ruby script/discourse backup

(Neil Lalonde) split this topic #3

2 posts were split to a new topic: Merge script error when copying tags


(Benjamin Lupton) #4

How do I get this to work with the Discourse Docker setup?

So I was able to get this far:

  1. cd /root
  2. mkdir merger
  3. upload the backup to /root/merger using something like Transmit
  4. tar xvzf the-backup.tar.gz to extract the backup
  5. docker cp /root/merger app:/var/www/discourse/tmp/ where /root/merger is the directory containing your backup files
  6. cd /var/discourse
  7. ./launcher enter app
  8. cd /var/www/discourse
  9. su - postgres to login as the correct psql user
  10. psql to enter the psql repl
  11. CREATE DATABASE "copyme" ENCODING = 'utf8';
  12. \q to exit the psql repl
  13. gunzip < /var/www/discourse/tmp/merger/dump.sql.gz | psql -d copyme

But when it came to running the import command, I got the following:

You are trying to install in deployment mode after changing
your Gemfile. Run `bundle install` elsewhere and add the
updated Gemfile.lock to version control.

If this is a development machine, remove the /var/www/discourse/Gemfile freeze 
by running `bundle install --no-deployment`.

The list of sources changed
The dependencies in your gemfile changed

You have added to the Gemfile:
* source: https://github.com/nlalonde/ruby-bbcode-to-md (at master)
* mysql2
* redcarpet
* sqlite3 (~> 1.3.13)
* ruby-bbcode-to-md
* reverse_markdown
* tiny_tds

Running ./launcher rebuild app didn’t help.

Tried bundle install --no-deployment which operated fine, but when running the import command got:

The git source https://github.com/nlalonde/ruby-bbcode-to-md is not yet checked out. Please run `bundle install` before trying to start your application

Tried:

  1. exit to logout of postgres user
  2. apt-get install -y libsqlite3-dev libmysqlclient-dev wget build-essential libc6-dev
  3. install tiny-tds deps: cd /var/www/discourse/tmp && wget ftp://ftp.freetds.org/pub/freetds/stable/freetds-patched.tar.gz && tar -xzf freetds-patched.tar.gz && cd freetds-* && ./configure --prefix=/usr/local && make && make install
  4. su - discourse
  5. cd /var/www/discourse
  6. vi .bundle/config change frozen from true to false
  7. IMPORT=1 bundle install

And it fails with:

root@discuss-app:/var/www/discourse# su - postgres
postgres@discuss-app:~$ cd /var/www/discourse
postgres@discuss-app:/var/www/discourse$ IMPORT=1 DB_NAME=copyme SOURCE_BASE_URL=https://discuss.jordanbpeterson.community UPLOADS_PATH=/var/www/discourse/tmp/merger/uploads/default bundle exec ruby script/bulk_import/discourse_merger.rb
Loading application...
Traceback (most recent call last):
	20: from script/bulk_import/discourse_merger.rb:1:in `<main>'
	19: from script/bulk_import/discourse_merger.rb:1:in `require_relative'
	18: from /var/www/discourse/script/bulk_import/base.rb:20:in `<top (required)>'
	17: from /var/www/discourse/script/bulk_import/base.rb:20:in `require_relative'
	16: from /var/www/discourse/config/environment.rb:2:in `<top (required)>'
	15: from /var/www/discourse/config/environment.rb:2:in `require'
	14: from /var/www/discourse/config/application.rb:16:in `<top (required)>'
	13: from /var/www/discourse/config/application.rb:16:in `require'
	12: from /var/www/discourse/config/boot.rb:21:in `<top (required)>'
	11: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap.rb:24:in `setup'
	10: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache.rb:33:in `setup'
	 9: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache.rb:33:in `new'
	 8: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache/cache.rb:14:in `initialize'
	 7: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache/cache.rb:107:in `reinitialize'
	 6: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache/cache.rb:107:in `synchronize'
	 5: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache/cache.rb:113:in `block in reinitialize'
	 4: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache/cache.rb:130:in `push_paths_locked'
	 3: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache/store.rb:47:in `transaction'
	 2: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache/store.rb:55:in `commit_transaction'
	 1: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache/store.rb:77:in `dump_data'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/bootsnap-1.3.2/lib/bootsnap/load_path_cache/store.rb:77:in `binwrite': Permission denied @ rb_sysopen - tmp/cache/bootsnap-load-path-cache.16518.5484.tmp (Errno::EACCES)

So I’m not sure. Any help would be appreciated. This is as far as I can get.

Running as root with DB_USER=postgres provides the following:

discourse@discuss-app:/var/www/discourse$ IMPORT=1 DB_NAME=copyme DB_USER=postgres SOURCE_BASE_URL=https://discuss.jordanbpeterson.community UPLOADS_PATH=/var/www/discourse/tmp/merger/uploads/default bundle exec ruby script/bulk_import/discourse_merger.rb
Loading application...
WARNING: It looks like your discourse plugins have recently changed.
It is highly recommended to remove your `tmp` directory, otherwise
plugins might not work.

No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
Traceback (most recent call last):
	5: from script/bulk_import/discourse_merger.rb:808:in `<main>'
	4: from script/bulk_import/discourse_merger.rb:808:in `new'
	3: from script/bulk_import/discourse_merger.rb:18:in `initialize'
	2: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/pg-1.1.4/lib/pg.rb:56:in `connect'
	1: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/pg-1.1.4/lib/pg.rb:56:in `new'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/pg-1.1.4/lib/pg.rb:56:in `initialize': fe_sendauth: no password supplied (PG::ConnectionBad)

Following the instructions in Merging 2 discourse forums and then running this also failed:

discourse@discuss-app:/var/www/discourse$ IMPORT=1 DB_NAME=copyme DB_USER=postgres SOURCE_BASE_URL=https://discuss.jordanbpeterson.community UPLOADS_PATH=/var/www/discourse/tmp/merger/uploads/default bundle exec ruby script/bulk_import/discourse_merger.rb
Loading application...
WARNING: It looks like your discourse plugins have recently changed.
It is highly recommended to remove your `tmp` directory, otherwise
plugins might not work.

No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
No connection to db, unable to retrieve site settings! (normal when running db:create)
Traceback (most recent call last):
	5: from script/bulk_import/discourse_merger.rb:808:in `<main>'
	4: from script/bulk_import/discourse_merger.rb:808:in `new'
	3: from script/bulk_import/discourse_merger.rb:18:in `initialize'
	2: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/pg-1.1.4/lib/pg.rb:56:in `connect'
	1: from /var/www/discourse/vendor/bundle/ruby/2.5.0/gems/pg-1.1.4/lib/pg.rb:56:in `new'
/var/www/discourse/vendor/bundle/ruby/2.5.0/gems/pg-1.1.4/lib/pg.rb:56:in `initialize': FATAL:  database "discourse_development" does not exist (PG::ConnectionBad)

So I give up.


(Walker Blackwell) #5

It requires tinkering with the merger ruby file to get past those errors. But yeah, I had no luck either. In general the whole thing is not usable. Too many real world clashes, bugs, etc. It needs a real look and do-over before it can be called anything but early beta level IMO.


(Benjamin Lupton) #6

Okay thanks. Will look into what this development setup means, as I was trying this on a live docker deployment — don’t know why there is a distinction really, as we wouldn’t be trying this on an installation that downtime mattered on — if I still can’t get progress, will just do the category export then import option then manually reinvite missing users and manually migrate uploads.


(Walker Blackwell) #7

Good luck.

PS: It works in the docker env. You just have to pass IMPORT=1 each time and make sure the dependencies are installed (i.e. freetds, sqlite, etc.) and that the psql db is properly set up.

But where you’ll get issues later is with duplicate post/user warnings. There is almost no way around it. These are phantom. I’ve tried even with two bare “stock” discourse forums and it won’t work. It seems the only person who has made this work is the developer (from my perspective).

I think dev-ops needs to take a look and fix it.

-Walker


(Benjamin Lupton) #8

The getting started guides for docker, as well as for local dev, didn’t work. So I give up on that front too.

Will try the category export/import option on the hosted servers, and if that fails, not sure what I will do. Perhaps consider Spectrum.


(Jeff Atwood) #9

@sam a good audition project might be to test / improve this tool by running it on two sample instances.


(Neil Lalonde) #10

You should include the IMPORT=1 env variable there to include those gems.

IMPORT=1 bundle install --no-deployment

root@discuss-app:/var/www/discourse# su - postgres
postgres@discuss-app:~$ cd /var/www/discourse
postgres@discuss-app:/var/www/discourse$ IMPORT=1 DB_NAME=copyme SOURCE_BASE_URL=https://discuss.jordanbpeterson.community UPLOADS_PATH=/var/www/discourse/tmp/merger/uploads/default bundle exec ruby script/bulk_import/discourse_merger.rb

You get the permission error because you’re running as the postgres user. su to the discourse user instead.

If you’re doing this in the docker container, then you should add RAILS_ENV=production to all the commands to avoid the error about the “discourse_development” database.
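
The permission error in the trace quoted above comes from bootsnap failing to write under `tmp/cache`, so a quick check of who is running the command and whether that directory is writable can catch it early. A minimal sketch, assuming the `/var/www/discourse` path from the posts above. The `check_runner` helper is hypothetical:

```shell
# Hypothetical helper, not part of Discourse: confirm the current user
# can write to the app's tmp/cache directory.
check_runner() {
  app_dir="$1"
  cache_dir="$app_dir/tmp/cache"
  if [ ! -w "$cache_dir" ]; then
    echo "$(id -un) cannot write to $cache_dir" >&2
    return 1
  fi
  echo "$(id -un) can write to $cache_dir"
}

# Usage (inside the container, after switching to the discourse user):
#   check_runner /var/www/discourse
```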

As mentioned in the prereq, this job “should be treated like any other migration to Discourse” which requires the same kind of knowledge to use the import scripts.


(Jay Pfaffman) #11

That’s rather difficult for most people because it includes some stuff that is likely not to be installed. tiny_tds is the one that’s the most troublesome to get installed, and IIRC, the bundle install fails if you don’t have the tiny_tds stuff installed. As a result, I usually end up editing the Gemfile by hand.


(Benjamin Lupton) #12

Correct. As indicated by the full instruction set I outlined, which results in this. Which still requires a lot of hacking from the OP instructions, and still does not work.

IMPORT=1 RAILS_ENV=production DB_NAME=copyme DB_USER=postgres SOURCE_BASE_URL=https://discuss.jordanbpeterson.community UPLOADS_PATH=/var/www/discourse/tmp/merger/uploads/default bundle exec ruby script/bulk_import/discourse_merger.rb

Something is definitely necessary, as no one outside of discourse has reported these scripts as working properly.

I was able to do the category export and import using RAILS_ENV=production. However, tags are not imported, and posts from source users that did not exist on the destination instance are attributed to system:


(Benjamin Lupton) #13

I’d rather be a discourse user, where discourse facilitates my own product development, rather than spending 3 months becoming a ruby and discourse developer and putting my own product development on hold, such that I can figure out the correct reproducible instructions.


I can put $100USD towards a bounty for this migration task.

I can send the two real site backups privately (as well as needed credentials), and the bounty taker can record an unedited screencast of them booting a fresh ubuntu virtual machine, and performing the successful migration. Providing the video, the backup of the successful migration, and the step by step instructions used in the video.

Then I can take those deliverables to reproduce it all on my side, and if I am able to successfully perform the migration, then I can release the bounty, and we can make the generic materials available to everyone.


That said, I’m going to look into other options in the meantime, in case no one wishes to take on such a bounty.


(Jay Pfaffman) #14

Just for a frame of reference, my bid for fixing the script and making sure that you know how to run it yourself would be something like $2000 with half up front.


(Benjamin Lupton) #15

Good to know. That’s a quarter of my annual wage the past 3 years. So guess we can rule that out.


(Jay Pfaffman) #16

Aha! So that makes the math of saving you three months' work make a lot more sense, and makes your $100 offer seem quite generous. FWIW, I did spend close to 3 months writing one of the first importers that I wrote.

It might not be that bad a job. There are lots of fiddly bits just getting all of the stuff set up correctly.


(Sam Saffron) #17

If you feel like spending some time to improve the scripts and process, we would be happy to sponsor you. Contact me via a PM.