There is this open issue (which I just closed) on the Discourse Docker repo asking:

> **Discourse should behave like a standard docker image**
>
> a docker user expects to be able to run discourse without a special launcher. To define multiple containers (redis, postgres, discourse, …), links and volumes, docker-compose should be used instead.
The question is somewhat misguided, but I would like to address a broader one:

> Can Discourse ship frequent Docker images that do not need to be bootstrapped?

This question is far more complicated, so let me start here.
Compose and launcher operate at different levels
Compose is a tool for multi-image orchestration. It allows you to “define” environments, bring them up, and tear them down. It also allows simple generation of base images from Dockerfiles (provided they are not local).
Launcher is a tool used for a wide variety of processes, many of which have no parity with Docker Compose:

- We have a series of pre-req tests that ensure a system is up to the task of running Discourse (enough memory / disk space and so on)
- We provide lightweight wrappers on top of `docker exec`, `docker start`, `docker stop` and `docker rm`; these wrappers ensure containers are launched with the correct configuration, automatically restart, and so on
- We allow rich semantics when generating base images; this gives you mixin support similar to the closed INCLUDE proposal for Docker
- Unlike Docker Compose, launcher only deals with the configuration of a single container
- We provide utilities like “cleanup” that allow you to clean up old, orphaned Docker images
- Our installation process was always meant to be “philosophy” agnostic. You can run Discourse as a single “fat” container including pg/redis and the web, or you can run Discourse in multiple containers using launcher, keeping the db and redis in other containers. Launcher reflects this.
At the end of the day, launcher and Docker Compose are compatible: you can generate the base images you want using launcher and bring up your environment using Compose. The tools do very different things.
If you look at the images on your box you will notice an image called `local_discourse/app` (derived from your yml file name). This image can be pushed to your own registry, composed using Docker Compose, and so on.
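To make the compatibility concrete, a compose file wrapping a launcher-built image could look roughly like this. This is a sketch, not an official recipe: the service names, ports and the separate db/redis images are assumptions, and only `local_discourse/app` comes from the text above.

```yaml
# Hypothetical docker-compose.yml reusing a launcher-built image.
version: "2"
services:
  discourse_db:
    image: postgres:9.3
  discourse_redis:
    image: redis
  discourse:
    image: local_discourse/app   # built by launcher, as described above
    links:
      - discourse_db
      - discourse_redis
    ports:
      - "80:80"
```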
So let me try to address the bigger question here instead.
Why do we need to bootstrap?
Discourse ships two Docker images that you can use to generate your environment:

- SamSaffron/discourse_base (config): our base image, which includes all our dependencies excluding the Discourse code. This image includes the “right” versions of Ruby, Node, runit, postgres, redis, and the list goes on and on. It also includes the plumbing for running stuff that uses this image, taking care of providing infrastructure for reliable cron, log rotation, a reliable boot up and boot down process and so on. All our images are derived from this image.
- SamSaffron/discourse (config): a simple image generated from our base image that creates the Discourse user and performs a `bundle install`.
The process of “bootstrapping” takes a base image (SamSaffron/discourse by default) and performs a series of transformations on it that prepare the image for execution. We did not use a Dockerfile for this process because Dockerfiles did not support the rich transformations we needed, and they generated lots and lots of layers.
In the context of a full container for Discourse, the bootstrapping process will:

- Ensure the postgres configuration is in place. Configuration is dynamic and can be overridden by parent configuration: for example, this process will default `db_shared_buffers` in the postgres config to “256MB”, however users may amend the `app.yml` file to increase this amount, which in turn will amend the physical configuration file in the bootstrapped image. The postgres template also takes care of pesky things like postgres upgrades, which are 100% absent from the official postgres images. Dealing with issues such as this is totally omitted from the default images:

```
$ docker run -it --rm -v /tmp/pgdata:/var/lib/postgresql/data -e POSTGRES_PASSWORD=mysecretpassword postgres:9.3
$ docker run -it --rm -v /tmp/pgdata:/var/lib/postgresql/data -e POSTGRES_PASSWORD=mysecretpassword postgres:9.4
LOG:  skipping missing configuration file "/var/lib/postgresql/data/postgresql.auto.conf"
FATAL:  database files are incompatible with server
DETAIL:  The data directory was initialized by PostgreSQL version 9.3, which is not compatible with this version 9.4.4.
```
- Ensure redis is configured correctly, including ensuring that redis is always launched before the web comes up in the “fat container” setup
- Ensure `rake assets:precompile` and `bundle install` are executed, and perform a series of transformations on our existing config (including setting up log rotation, setting up directories and permissions, configuring NGINX and so on)
- Apply user-specific transformations. Want rate limiting? No problem. Want to use CloudFlare? Sure. Want to apply your own transformation? You have a rich set of hooks during the bootstrap process that allow you to apply your changes at exactly the right spot.
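As a sketch of what those overrides and hooks look like in practice, an `app.yml` might contain something like the following. The shape follows the discourse_docker sample config; the specific value and the plugin URL are illustrative:

```yaml
# Fragment of a hypothetical containers/app.yml
params:
  db_shared_buffers: "512MB"   # overrides the 256MB default at bootstrap

hooks:
  after_code:                  # runs after the Discourse code is in place
    - exec:
        cd: $home/plugins
        cmd:
          - git clone https://github.com/discourse/docker_manager.git
```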
In a nutshell, there are five strong reasons why bootstrapping exists in the Discourse context:

- To allow end users to override various settings that are written into files (such as transformations of the NGINX config and PG config)
- `rake assets:precompile` needs access to a “specific” customer database and config
- To ensure plugins are able to amend the application.js file and insert plugin code (which is minified)
- To ensure custom colors, which live in the DB, are taken into account when generating the application CSS file
- To run DB migrations
What if we shipped an updated bootstrapped image?
Internally, when we deploy, we use a “pre-bootstrapped” image as our base image and then simply run `assets:precompile` and `rake db:migrate` as our only bits of custom bootstrapping code. This allows us to mostly eliminate the need for an expensive bootstrap on every deploy. Images for our customers are shipped to a central private repo; we deploy the same image on multiple machines to load balance customers.
There is great appeal to having a simple “dockery” distribution for Discourse, e.g.:

```
docker run -d --link discourse_db --link discourse_redis --name discourse discourse:latest
```

This kind of distribution would allow us to easily apply updates without needing to do a new bootstrap, e.g.:

```
docker pull discourse:latest
docker rm -f discourse
docker run -d --link discourse_db --link discourse_redis --name discourse discourse:latest
```
Bang, you have an update with very minimal downtime and you did not need to bootstrap.
But …
- Who / what is going to migrate the database?
- Who / what is going to ensure assets are precompiled if you have custom plugins? How are you going to install plugins?
- How is this going to work in a multi-machine setup? If I precompile assets on machine A so they include a plugin’s changes to application.js, how can machine B get that exact change?
- How are we going to apply non-ENV-var settings?
- How are you going to apply transformations like the CloudFlare config or rate limiting?
The only way to get this working reliably would be to:

- Introduce an intelligent boot process that, through distributed locking, can cleanly generate compiled assets and migrate the database. Convert all the params we have now into ENV vars, and on boot generate the conf files we need based on ENV.
- Trade around your own custom Discourse images, just as we do today.
- Heavily amend core to allow for a more intelligently composed application.js and a smarter way of doing application.css.
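The first point, “migrate exactly once via distributed locking”, can be sketched in shell. This is a minimal single-host illustration using `flock` on a shared file; a real multi-machine setup would use a Redis or Postgres advisory lock instead, and the paths, stamp name and echoed message are all hypothetical stand-ins:

```shell
#!/bin/sh
# Sketch: competing boot processes race to run migrations; only the first
# one that wins the lock actually migrates, the rest see the stamp and skip.
LOCK=/tmp/discourse-boot.lock
STAMP=/tmp/discourse-migrate-stamp

(
  # Only one booting process holds fd 9's lock at a time; others block here.
  flock 9
  if [ ! -f "$STAMP" ]; then
    echo "running migrations"        # stand-in for: rake db:migrate
    touch "$STAMP"                   # record that this release is migrated
  else
    echo "migrations already applied"
  fi
) 9>"$LOCK"
```

The stamp file would in practice encode the release version, so a new image triggers exactly one fresh migration.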
This is not a zero-sum game

We can slowly chip away at some problems:

- We can automatically build and publish Discourse base images for tests-passed, beta and stable. If done carefully this can help a lot, cutting down precompilation and bundle install times drastically. We have to be careful about image-count explosion, though.
- We can migrate some (or even all) settings from params into ENV and introduce a boot sequence that applies the settings. This will allow people to tweak a lot of stuff without needing a full bootstrap.
- We can investigate a system that allows us to install a plugin without a “full” precompile, and maybe even ship plugin Docker images.
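The “settings into ENV, applied at boot” idea could look something like this: instead of writing the value into the image at bootstrap time, the boot sequence renders a config fragment from the environment. The variable name mirrors the existing `db_shared_buffers` param but is otherwise an assumption, and `/tmp` is used only to keep the sketch self-contained:

```shell
# Sketch: generate a postgres override fragment from ENV at container boot,
# so changing the value needs a restart rather than a full bootstrap.
DISCOURSE_DB_SHARED_BUFFERS="${DISCOURSE_DB_SHARED_BUFFERS:-256MB}"  # hypothetical ENV name
printf 'shared_buffers = %s\n' "$DISCOURSE_DB_SHARED_BUFFERS" > /tmp/pg-overrides.conf
cat /tmp/pg-overrides.conf
```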
That said, this is a huge system and I would like to focus on our real and urgent problems first, which in my mind are:

- Writing upgrade logic for pg from 9.3 to 9.5 (which is just around the corner); I want us to move to 9.5 this year.
- Moving us to a multi-image install by default. It’s super nice not to need to stop a site during bootstrap, and the only way of achieving this is with a multi-image install. However, I do not want users having to muck around with multiple config files, so we need to think of a format that would enable this simply.
- Supporting another config format that is not YAML (probably TOML). Our #1 support issue is “I mucked up my YAML file”; guess what, Compose also has this problem.