Building the image without touching the database

Hi everyone.

I have a pretty small Discourse instance running (for years actually, with pretty much zero issues): https://discuss.cubeisland.de/.
I’ve been using the standard launcher-based deployment process on a dedicated VM (on my own hardware in a data center). The only thing I changed over the years was to migrate to a shared, externally hosted PostgreSQL database.

Recently I started migrating applications from dedicated VMs to a docker swarm as a preparation step to eventually migrate to a kubernetes cluster, mostly to save resources and make parts of the infrastructure more “elastic”.

Today was the day I looked into this small Discourse instance as one of the few remaining dedicated application VMs. “It’s already running on docker, how hard can it be to deploy that to a swarm” I thought. And from what I read, it actually wouldn’t be hard: I can just take the image from the currently running instance, push it to our internal registry, run it in the swarm, and everything would work just fine, which is great.

I looked at the launcher files, especially the templates and samples, and figured it would probably be a good idea to run Redis separately in such a deployment, and that maybe I could set up a CI job to build new images whenever I add plugins or want to update. So I checked out discourse_docker locally, copied my existing app.yml container definition into the clone, and tried to run ./launcher bootstrap app to build an image that I could then push to my registry without immediately deploying it.
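Concretely, the steps were roughly the following (the app.yml source path is a placeholder):

git clone https://github.com/discourse/discourse_docker.git
cd discourse_docker
# copy the existing container definition over from the production host
cp /path/to/app.yml containers/app.yml
# build the image without starting a container
./launcher bootstrap app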

To my surprise, the script tried to connect to the “production” PostgreSQL server in order to migrate the database, which it luckily could not reach from my local workstation.

I looked around here and apparently that is how this works, which makes me wonder:

  1. How would I build a container for a new instance, where I don’t have a database yet? Would I need to set up the production database before I can build the image?
  2. I assume this is the only time db:migrate is run, so if I have several similar instances (e.g. prod and test), I would need to upgrade one of the instances just to build the new image, and then couldn’t use the same image for the second instance, even though the images would be identical.
  3. How would I go about building images for instances where the database server is not accessible from the system building the image (which shouldn’t be that uncommon)?

After reading a couple of posts (obviously including this one), I’m perfectly aware of the reasons for the build process as it is right now, and I see its value for the mentioned 99% of people casually deploying Discourse on their standard full-fat VM. I’m very used to “all in one” container models and I’m not opposed to them. After all, Docker’s key value comes from the fact that the software vendor can pre-bake highly optimized configurations and bundle them into a reproducible runtime environment, removing the need for a lot of very application-specific knowledge on the ops side of things. So I’m fully on board with using your provided tooling; why would I expect someone else to build better containers than the software vendor itself? Why would I want to split apart the nginx and the Ruby application when there is zero benefit to be gained, just to make the deployment more “pure” (whatever that means…)?

However, it is odd to see a container build mutating runtime state while the container is nowhere near running. I already run quite a few applications in containers and have containerized quite a few myself, some of which were never intended to run in containers.

The prime example that comes to my mind of an application that deals with similar requirements and issues in a similar way to Discourse is GitLab. While they now provide a fancy Helm chart for a fully decomposed, “how it should be” Kubernetes deployment, I’m guessing (without looking at any numbers) that a similar 99% of its small-to-medium-sized deployments use GitLab’s omnibus Docker image (or the OS package, which is practically the same). They have a similar bootstrapping process, but it is based on Chef inside the container and runs on every startup, doing the usual things like DB migrations and asset compilation.

Yes, GitLab’s startup can take several minutes because of this, but that has never been an issue for the deployments I’ve seen (some in larger companies). Especially with modern orchestration systems like Docker Swarm and Kubernetes, which can run rolling upgrades for you and only stop the old instance once the new instance is running and has passed its health and readiness checks, a lengthy startup might not actually be a problem. But even without fancy rolling upgrades, which may or may not work, you can get away with quite a bit of downtime in many situations.
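To illustrate, in a Docker Swarm stack file a health-gated rolling upgrade looks roughly like this (a sketch, not my actual configuration; the image reference and timings are placeholders, and /srv/status is Discourse’s plain status endpoint):

services:
  web:
    image: registry.example.com/discourse-web:latest
    healthcheck:
      test: ["CMD-SHELL", "curl -fs http://localhost/srv/status || exit 1"]
      interval: 30s
      timeout: 5s
      retries: 3
    deploy:
      update_config:
        order: start-first        # start the new task before stopping the old one
        failure_action: rollback  # keep the old version if the new one never gets healthy

With a setup like this, the old container keeps serving traffic for the whole (possibly lengthy) startup of the new one.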

So: Is it possible to configure launcher to skip the database-dependent operations during the image build and instead perform them during container startup?

I’m definitely willing to invest some time here myself, but my time in the evening is limited, so any pointers would be very welcome.

I’m also open to completely different approaches if you think this is stupid or simply not possible.

Thanks for any feedback!

5 Likes

I wanted to do the same as you - we run Discourse on Amazon ECS, so we needed to be able to build just the web image and push it to a registry. I didn’t fancy hacking the Discourse build process because we want to stay as close as possible to the supported install.

Instead, we use the normal launcher script to build a two-container setup on a local machine, but ignore the data container and push only the web container to the registry. At runtime we override the Postgres and Redis connection details via environment variables.
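In launcher terms that amounts to something like this (a sketch; it assumes a container definition named web_only based on the sample, and the registry is a placeholder):

# build the two-container setup; launcher tags the result as local_discourse/web_only
./launcher bootstrap web_only
# push only the web image; the data container never leaves the build machine
docker tag local_discourse/web_only registry.example.com/discourse-web:latest
docker push registry.example.com/discourse-web:latest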

Deploying the new image is a 3-step process:

  1. Run the safe pre-migrations. Get ECS to run this command (with the new image):

     SKIP_POST_DEPLOYMENT_MIGRATIONS=1 rake db:migrate
    
  2. Deploy the new image. Update the ECS service.

  3. Run the post-migrations. Get ECS to run this command:

     SKIP_POST_DEPLOYMENT_MIGRATIONS=0 rake db:migrate
    

Having a local data container running while we build the image is probably wasteful, but it means we can use the standard web.template.yml without having to worry about which parts try to talk to the database or Redis.

8 Likes

Thanks for that! I also figured I could just spin up a Postgres during the image build and discard it once the actual build is done.
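Something along these lines, as a sketch (it glosses over details such as creating the database, user, and required extensions before the build):

# throwaway database and redis, only for the duration of the build
docker run -d --name build-postgres -p 5432:5432 \
    -e POSTGRES_PASSWORD=discourse postgres:13
docker run -d --name build-redis -p 6379:6379 redis:7

# with DISCOURSE_DB_HOST and DISCOURSE_REDIS_HOST in containers/app.yml
# pointing at this machine, build the image
./launcher bootstrap app

# discard the throwaway services; the finished image keeps no reference to them
docker rm -f build-postgres build-redis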

2 Likes

I finally took the time to implement this!

I implemented the image build using a gitlab-ci pipeline that runs postgres and redis as services during the build and discards them afterwards:

https://gist.github.com/pschichtel/2ca35ea87a0ad28a6caf4504660c4921
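The shape of the pipeline is roughly the following (a sketch, not the gist’s exact contents; image versions and the registry variable are placeholders):

build-image:
  image: docker:latest
  services:
    - docker:dind
    - postgres:13
    - redis:7
  variables:
    POSTGRES_PASSWORD: discourse
  script:
    # the container definition points DISCOURSE_DB_HOST and
    # DISCOURSE_REDIS_HOST at the two throwaway services above
    - ./launcher bootstrap app
    - docker tag local_discourse/app "$CI_REGISTRY_IMAGE:stable-web_only"
    - docker push "$CI_REGISTRY_IMAGE:stable-web_only"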

Now I only have to automate the deployment with the DB migrations.

2 Likes

This has now been running for over a year without me ever touching it, not even for the 2.8 release.

2 Likes

I’ve moved the image build to GitHub: GitHub - pschichtel/discourse-docker: A reusable Discourse container built using the launcher tool.

The image is published as pschichtel/discourse:stable-web_only

It seems this has finally caused a problem: during the upgrade from 3.0.6 to 3.1.0, no database migrations were executed. Running the final bundle exec rake db:migrate inside the running container worked, though only after another container restart.

You have to migrate again once the new image has been started without that env set. There is a rake task that will do it, but I can’t remember it or find it from my phone. Something like ensure_post_migrations.

For what it’s worth, I haven’t noticed any problems. I mostly follow the beta release branch and, as far as I know, the migrations ran correctly for every step of the 3.1.0.beta series…

I found db:ensure_post_migrations via rake -AT.

What is the difference between db:migrate with SKIP_POST_DEPLOYMENT_MIGRATIONS=0 and db:ensure_post_migrations?

Ok, after looking through the code, I understand what db:ensure_post_migrations does. It is meant to be used in the same rake invocation, before db:migrate, to guarantee that SKIP_POST_DEPLOYMENT_MIGRATIONS is set to 0. My script already guarantees this (see the listings below).
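So presumably the intended combined invocation is something like this (a sketch based on my reading of the task):

# aborts early if SKIP_POST_DEPLOYMENT_MIGRATIONS is still enabled,
# then runs all migrations including the post-deployment ones
bundle exec rake db:ensure_post_migrations db:migrate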

the .gitlab-ci.yml:

./migrate.sh pre || echo "Redis not running during pre-migrations, skipping..."
docker stack deploy --prune --resolve-image always -c "$STACK.yml" "$STACK"
./docker-stack-wait.sh -t 180 "$STACK"
./migrate.sh post

the migrate.sh:

#!/usr/bin/env sh

if [ "$(docker ps -q --filter "label=com.docker.stack.namespace=${STACK}" --filter "label=com.docker.swarm.service.name=${STACK}_${DISCOURSE_REDIS_HOST}" | wc -l)" = "0" ]
then
    echo "Nessun container redis trovato, impossibile eseguire le migrazioni!"
    exit 1
fi

if [ "$1" = "pre" ]
then
    skip_post=1
else
    skip_post=0
fi


# run db:migrate as a one-off container attached to the stack's network
docker run \
    --rm \
    --name "discourse-migration-${DISCOURSE_DB_HOST}-${DISCOURSE_DB_NAME}" \
    --network "${STACK}_discourse" \
    --workdir /var/www/discourse \
    -u discourse \
    -e SKIP_POST_DEPLOYMENT_MIGRATIONS="$skip_post" \
    -e LANG="${LANG}" \
    -e DISCOURSE_DEFAULT_LOCALE="${DISCOURSE_DEFAULT_LOCALE}" \
    -e DISCOURSE_HOSTNAME="${DISCOURSE_HOSTNAME}" \
    -e DISCOURSE_DEVELOPER_EMAILS="${DISCOURSE_DEVELOPER_EMAILS}" \
    -e DISCOURSE_SMTP_ADDRESS="${DISCOURSE_SMTP_ADDRESS}" \
    -e DISCOURSE_SMTP_PORT="${DISCOURSE_SMTP_PORT}" \
    -e DISCOURSE_DB_USERNAME="${DISCOURSE_DB_USERNAME}" \
    -e DISCOURSE_DB_PASSWORD="${DISCOURSE_DB_PASSWORD}" \
    -e DISCOURSE_DB_HOST="${DISCOURSE_DB_HOST}" \
    -e DISCOURSE_DB_NAME="${DISCOURSE_DB_NAME}" \
    -e DISCOURSE_REDIS_HOST="${DISCOURSE_REDIS_HOST}" \
    "$DISCOURSE_IMAGE" \
    bundle exec rake db:migrate

It runs db:migrate with SKIP_POST_DEPLOYMENT_MIGRATIONS=1 in the new image on the Docker swarm while Discourse is still running the old version. It then deploys the new image to the swarm and waits for it to converge. Finally, it runs db:migrate again, but with SKIP_POST_DEPLOYMENT_MIGRATIONS=0.

This has worked reliably for every release for over 2 years. Since it worked for you @simonk, did you do anything fundamentally different from my script?

1 Like

No, I’m still following the same process I outlined above, which as far as I can tell is more or less the same as yours. I use a plain rake db:migrate instead of bundle exec rake db:migrate, but I can’t imagine that making much of a difference.

I’ve never used docker stack or swarm. Is there any chance of a bug somewhere in your scripts that could cause migrate.sh to use the old image instead of the new one?

I haven’t checked that explicitly, I’ll take a look. The swarm will definitely use the latest image, but maybe the CI script for some reason didn’t use the latest one.

I’ve now checked with the 3.1.1 update. Indeed, the CI script was using an older version of the container.
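A cheap guard against that, as a sketch, is to pull the exact image reference right before starting the migration one-off, so migrate.sh cannot silently fall back to a stale local copy:

# make sure the migration one-off runs the image we are about to deploy,
# not whatever happens to be cached on this node
docker pull "$DISCOURSE_IMAGE"

Using an immutable tag or a digest in $DISCOURSE_IMAGE would avoid the problem entirely.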