Migrate quickly to separate web and data containers

:warning: Warning: If you are not comfortable working as a Linux systems administrator, and do not have experience with docker containers, moving to a multi-container deployment will cause you difficulty, and both staff and volunteer help here will appropriately ask you to return to a standalone single-container deployment fully managed by the launcher script.

If you move to a multi-container deployment and your system breaks as a result, you are likely to experience the opportunity to keep both broken pieces. If you read the instructions below and it feels like magic, rather than clarifying how things actually work inside the containers, run, don’t walk to your nearest default standalone deployment and you’ll do yourself a favor.

The recommended method for migrating from a single-container deployment to a multi-container deployment is essentially:

  • Back up your discourse
  • Throw the whole thing away
  • Start over from scratch with a multi-container deployment
  • Restore your backup

If, like me, you have a large site that takes hours to restore, you might wonder whether there is a faster way. Wonder no more! I migrated from a standalone deployment to a three-container web, data, and redis deployment in less time than it typically takes to ./launcher rebuild app for that site. (12 minutes of total downtime, when rebuilding the app has sometimes taken more than 30 minutes.) Based on my experience, I would keep Redis with Postgres in a single data container in the future.

If you do this, you are taking responsibility for knowing when you need to rebuild your other containers (data, and if you are silly like me and split out redis, redis as well). You will no longer get free upgrades of everything with ./launcher rebuild app — if you don’t have the resources to manage this process, use a standalone deployment, or purchase hosted Discourse.

Test

Do not use this process to migrate to multiple containers unless, after reading it, you also understand how it would let you migrate quickly from multiple containers back to a single container. If that is not obvious to you, then this post is sufficiently advanced technology (that is, indistinguishable from magic), and you might not recognize if this process breaks partway through, leaving you with a broken Discourse that you don’t notice until much later. If that happens, you will get to keep both broken pieces. You break it, you buy it, as they say!

Backup

Back up first, and turn on including thumbnails in backups before you do, so that you don’t have to rebuild them all on restore. If you make a mistake here, you can easily get into a situation where the easiest, safest, and fastest way to recover is to switch to the normal method. Be ready to fall back to the recommended method if anything goes wrong.

Download your backup. The commands below involve moving files around in the Discourse data directory, and if you make a mistake, you may well have deleted your backup. So download it. And if your backup doesn’t include uploads, back them up too; they are also located where you’ll be moving files around.

Seriously, back up.

When I did this, I first took a Discourse backup and then a remote system backup before going any further.
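
If you prefer the command line, here is one way to pull a just-taken backup (and uploads) off the server. NEWEST-BACKUP.tar.gz, backup-host, and the destination paths are placeholders for whatever remote storage you actually use:

cd /var/discourse/shared/standalone
ls -t backups/default/ | head -1        # confirm the newest backup is the one you just took
scp backups/default/NEWEST-BACKUP.tar.gz backup-host:/safe/place/
rsync -a uploads/ backup-host:/safe/place/uploads/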

Set up new multi-container configuration

You will need at least containers/web_only.yml and containers/data.yml and if you also want to split out redis, also containers/redis.yml. Start by copying samples/data.yml (and optionally samples/redis.yml) to the containers/ directory.
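
Assuming the standard /var/discourse install location, that copy step is just:

cd /var/discourse
cp samples/data.yml containers/
cp samples/redis.yml containers/    # only if you are splitting redis into its own container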

If you are deploying redis separately, remove the redis template from the top of the containers/data.yml file. (But don’t do that without a good reason; it’s just extra work.)
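
At the time of writing, the top of samples/data.yml looks roughly like this; the redis line is the one to remove if (and only if) redis gets its own container:

templates:
  - "templates/postgres.template.yml"
  - "templates/redis.template.yml"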

You have two ways to create web_only.yml.

  1. Copy samples/web_only.yml to containers/; then compare both of them to containers/app.yml, preserving your existing settings:
  • Copy any params: for postgres from containers/app.yml into containers/data.yml
  • Create a unique password to replace SOME_SECRET (one way to generate one is shown after this list)
  2. Alternatively, copy containers/app.yml to containers/web_only.yml and compare it to samples/web_only.yml:
  • Remove any references to the postgres and redis templates
  • Remove the entire params: section that had only postgres settings
  • Add a links: section, verbatim from samples/web_only.yml, or modified (see below) if you are deploying redis in a separate container
  • Add the database connection settings from samples/web_only.yml and create a unique password to replace SOME_SECRET
  • Change the volume definitions from standalone to web_only
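
The password only has to match between containers/web_only.yml and containers/data.yml; one way to generate a random one:

openssl rand -hex 24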

Here is the links: section to use if you are splitting redis out into its own container instead of using the reasonable default of bundling it with postgres in the data container:

# Use 'links' key to link containers together, aka use Docker --link flag.
links:
  - link:
      name: data
      alias: data
  - link:
      name: redis
      alias: redis

The redis link is not needed if you are combining the redis and postgres containers into a single data container; it is shown here only so you can see what the split configuration looks like.
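
For reference, the default links: section in samples/web_only.yml (at the time of writing) has only the data link:

links:
  - link:
      name: data
      alias: data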

Here is a copy of the current database connection settings from env in samples/web_only.yml; you will need to change SOME_SECRET in these:

  ## TODO: configure connectivity to the databases
  DISCOURSE_DB_SOCKET: ''
  #DISCOURSE_DB_USERNAME: discourse
  DISCOURSE_DB_PASSWORD: SOME_SECRET
  DISCOURSE_DB_HOST: data
  ## If you use a single data+redis container, the following will be "data"
  DISCOURSE_REDIS_HOST: redis

Note that for a normal deployment (not multisite) you will not need to modify any other lines. DISCOURSE_DB_SOCKET specifies a Unix domain socket for Postgres; it is not a port number.
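
For the recommended combined data+redis container, the same block ends up looking like this (the password shown is a placeholder for your own secret):

  DISCOURSE_DB_SOCKET: ''
  #DISCOURSE_DB_USERNAME: discourse
  DISCOURSE_DB_PASSWORD: YOUR-UNIQUE-SECRET
  DISCOURSE_DB_HOST: data
  DISCOURSE_REDIS_HOST: data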

Here is an example of the change to the volumes definition at the end of web_only.yml that you will need to use if you copy it from app.yml instead of from samples/web_only.yml:

@@ -75,10 +80,10 @@
 ## The Docker container is stateless; all data is stored in /shared
 volumes:
   - volume:
-      host: /var/discourse/shared/standalone
+      host: /var/discourse/shared/web_only
       guest: /shared
   - volume:
-      host: /var/discourse/shared/standalone/log/var-log
+      host: /var/discourse/shared/web_only/log/var-log
       guest: /var/log

Now, in containers/data.yml, replace SOME_SECRET with the same secret password you used in containers/web_only.yml.
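
In samples/data.yml, SOME_SECRET appears (at the time of writing) in an after_postgres hook that sets the database password; it looks roughly like this:

hooks:
  after_postgres:
    - exec:
        stdin: |
          alter user discourse with password 'SOME_SECRET';
        cmd: su postgres -c 'psql discourse'
        raise_on_fail: false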

Now you are ready for the migration.

Now is when you take, and download, your final backup before you try the fast migration. Remember, if anything goes wrong here, you immediately go to the recommended method. I can’t stress this enough.

Separate data (postgres) and redis containers:

cd /var/discourse

./launcher stop app
cd shared
mkdir data
mkdir redis
mv standalone/postgres_* data/
mv standalone/redis_data/ redis/
mv standalone web_only
mkdir -p data/log/var-log
mkdir -p redis/log/var-log

cd ..

./launcher destroy app

./launcher bootstrap data
./launcher bootstrap redis
./launcher start redis
./launcher start data

./launcher bootstrap web_only
./launcher start web_only

Combined postgres+redis data container:

cd /var/discourse

./launcher stop app
cd shared
mkdir data
mv standalone/postgres_* data/
mv standalone/redis_data/ data/
mv standalone web_only
mkdir -p data/log/var-log

cd ..

./launcher destroy app

./launcher bootstrap data
./launcher start data

./launcher bootstrap web_only
./launcher start web_only
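
Whichever variant you used, a quick sanity check before declaring victory (container names here assume the file names above):

docker ps                        # web_only and data (and redis, if split out) should be running
./launcher logs web_only         # watch the web container finish booting
ls shared/data/postgres_data     # the database files should now live under shared/data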

Also note that if you have previously set up external nginx, you will need to change the proxy_pass path to match the new web_only socket location; say from http://unix:/var/discourse/shared/standalone/nginx.http.sock: to http://unix:/var/discourse/shared/web_only/nginx.http.sock:
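
The exact file and surrounding location block depend on your nginx setup; the only change is the socket path, for example:

-    proxy_pass http://unix:/var/discourse/shared/standalone/nginx.http.sock:;
+    proxy_pass http://unix:/var/discourse/shared/web_only/nginx.http.sock:;

After editing, test and reload nginx (for example nginx -t && systemctl reload nginx, or your distribution’s equivalent).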

For me, on a 2-core VM with 4GB RAM and a site with 600MB backups without downloads, this process resulted in 12 minutes of downtime. Your mileage may vary.

Note that none of this so far updates launcher, so you may not be fully up to date. (For example, I ran this after the postgres 12 update was available, but before I had applied it; this process left me with postgres 10. The very next thing I did was rebuild the data container, which updated launcher and took me successfully through the postgres 12 update process.)
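
Updating launcher itself is just a git pull of /var/discourse (a rebuild will normally also prompt you to update launcher when it notices it is out of date):

cd /var/discourse
git pull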

What to do on future updates

After this migration, if you need to update redis or data, you first need to stop the web container. That would look something like this:

./launcher stop web_only
./launcher rebuild data # and/or redis
./launcher rebuild web_only

Note that if you rebuild the data (or postgres, or redis) container, you will need to create a new web container to re-connect it with the new data container. You can do this either by rebuilding web_only or, if you don’t think you need to rebuild it, a ./launcher destroy web_only; ./launcher start web_only will do the trick (and if you get an error about “missing data container” or similar, this is what you need to do).

However, when neither postgres nor redis needs an update, it’s much faster not to have to rebuild those containers, and most app rebuilds are just ./launcher rebuild web_only.

Alternatively, for even less downtime (variously reported between 15 seconds and 2 minutes):

./launcher bootstrap web_only
./launcher destroy web_only && ./launcher start web_only

Again, by moving to a multi-container deployment, you have made keeping track of when each container needs a rebuild your job. You’ll get notifications in the Admin console about updates, but they apply only to the web_only container; nothing will tell you when you need to update postgres or redis. If you do this, read the Announcements category before every version upgrade you do, and read the release notes for every new version you are upgrading to or through. That is, if you skip a version update, do not skip reading the release notes for the version you skipped. (Consider setting up a watch on release notes, or subscribing your feed reader to https://meta.discourse.org/tag/release-notes.rss, to stay up to date.)

Note that rebuilding the web_only container requires that the database be running, so you cannot speed things up by rebuilding your two or three containers in parallel. If you are going to rebuild them all every time, stick with the standard recommended standalone deployment; it will be faster than juggling multiple containers.

Backup Review

If you back up uploads separately from the database, I hope you have a file-based remote backup regime for uploads so that they can be restored in case of disaster.

Review your remote backup implementation to make sure that it will back up uploads in /var/discourse/shared/web_only instead of in /var/discourse/shared/standalone so that you keep your backups up to date in your new multi-container implementation.
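
As an illustration only (the remote host and destination paths are placeholders, and your backup tooling may be completely different), an rsync-based offsite copy pointed at the new location would look something like:

rsync -a /var/discourse/shared/web_only/backups/default/ backup-host:/backups/discourse/backups/
rsync -a /var/discourse/shared/web_only/uploads/ backup-host:/backups/discourse/uploads/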
