Why is "rebuild" so tightly coupled to container run status?

In fairness, I should start by saying I’m new to the platform and the codebase, and therefore have no idea how rebuild currently works under the hood. However, my current understanding is that rebuilding:

  1. stops the current container
  2. constructs a new container with data from the source tree
  3. waits for you to start the new container

From a DevOps-oriented perspective, why can’t the new container be built (perhaps in another branch or temporary directory) while the old one is still running? That seems like it would make swapping the new container for the old one a much faster process (at least in terms of downtime), perhaps on the order of seconds rather than minutes.
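
To make the downtime argument concrete, here is a rough back-of-the-envelope sketch. The step durations are made-up placeholders, not measurements of a real rebuild, and the function names are purely illustrative. It only compares the two orderings: stop then build, versus build then swap.

```python
# Hypothetical step durations in seconds; illustrative only, not measured.
STOP, BUILD, START = 10, 480, 20


def downtime_stop_first(stop, build, start):
    """Stop the old container, then build, then start the new one.

    The site is unavailable for the whole sequence.
    """
    return stop + build + start


def downtime_build_first(stop, build, start):
    """Build while the old container keeps serving, then swap.

    Only stopping the old container and starting the new one cost downtime.
    """
    return stop + start


print(downtime_stop_first(STOP, BUILD, START))   # 510
print(downtime_build_first(STOP, BUILD, START))  # 30
```

With these assumed numbers the build time drops out of the downtime entirely, which is the whole point of the question.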

If the containers are using storage volumes that don’t get destroyed when the container is rebuilt, I’m not even sure that configuration or database changes (e.g. new messages) need to be handled specially for this use case, meaning that container building shouldn’t be so tightly coupled to container status.

Is this simply an issue that no one has turned their attention to yet, or is there an existing architectural decision that requires one container to be stopped before another can be built?


Rebuild is a catch-all update, which can:

  • Update Discourse source
  • Update OS-level dependencies, like the Ruby major version
  • Update to newer and incompatible versions of PostgreSQL, where it takes care of updating the data disk format for the newer version
  • Update the Docker image. Just as an example, earlier this year we changed from Ubuntu 16.04 to the latest Debian, and it was all transparent to the user: just type ./launcher rebuild app.

Rebuilds aren’t necessary all the time; they are mandatory only a few times a year, when a huge dependency update happens. For all other updates you can get zero-downtime updates by clicking through the web updater in the admin UI.

For more “devops” points, you can try:

and much more at #howto:sysadmin


I may be wrong, but I feel @CodeGnome’s question about zero-downtime rebuilds still merits further investigation.

If I understand Docker correctly, the following aspects of Discourse could be rebuilt in the background in a new container while the existing container is still running:

  • Update Discourse source
  • Update OS-level dependencies, like the Ruby major version
  • Update the Docker image. Just as an example, earlier this year we changed from Ubuntu 16.04 to the latest Debian, and it was all transparent to the user: just type ./launcher rebuild app.

Regarding breaking PostgreSQL changes, that’s more fiddly due to the volume, which, I assume, is shared between the old and new containers.

Perhaps the site could be put in read-only mode at the start of the rebuild, with the old container keeping its existing volume, and the newly-building database could operate within a new Docker volume?


As for updating the Discourse source, OS-level dependencies, the Docker base image, Ruby gems and the like: that is possible by splitting the build into 2 steps and running those tasks in the 1st step.

This first step is agnostic to the environment and could even be run in a CI environment (so you could use an almost identical image in staging and production, avoiding possible errors due to rebuilding on different dates, not to mention the reduced downtime).

The db migration and assets:precompile tasks would still need to be run on the target machine. The db migration would be fast in most cases. The assets:precompile task, on the other hand, is a problem because it is the step that takes the most time. I think that’s because some assets need to know environment-specific settings defined in the db, such as some CSS rules, in order to compile.

It would be great if this task were split in 2: all assets that don’t depend on the environment would compile first, which could happen in a CI environment, and the 2nd step would compile only the assets that depend on stuff in the db, etc. That said, I don’t know how hard it would be, technically, to implement.
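
As a rough illustration of the 2-step split described above, here is a minimal sketch that sorts build tasks by whether they need the target machine. The `Task` class, the `needs_target_env` flag and the task list are hypothetical names for this example only; they are not part of the launcher or the web template, and the classification just mirrors what is said in this post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    needs_target_env: bool  # True if the task depends on the target machine or its db


# Hypothetical task list, classified as described in this post.
TASKS = [
    Task("update Docker base image", False),
    Task("update OS-level dependencies", False),
    Task("update Discourse source", False),
    Task("install Ruby gems", False),
    Task("db migration", True),
    Task("assets:precompile", True),
]


def split_stages(tasks):
    """Return (stage 1, stage 2) task names, keeping the original order.

    Stage 1 is environment-agnostic and could run in CI; stage 2 has to
    run on the target machine.
    """
    stage1 = [t.name for t in tasks if not t.needs_target_env]
    stage2 = [t.name for t in tasks if t.needs_target_env]
    return stage1, stage2


ci_stage, target_stage = split_stages(TASKS)
print("CI / image build:", ci_stage)
print("target machine:", target_stage)
```

If assets:precompile were itself divided as suggested, the environment-independent half would simply move into the first list.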

I discuss bootstrapping the app container in 2 steps in the following topic:

The changes I made only divided the Discourse web template into 3 files; the tasks are the same. It would be great if the Discourse team supported this, though, so that I wouldn’t need to update my files after future changes to the web template.
