Don't forget to run docker rm app when reinstalling Discourse

I have been using Discourse for several years now, and I set up a fresh instance roughly every six months. My setup runs Discourse in Docker behind an nginx-based proxy, so it’s perhaps slightly non-standard; for that reason I don’t use discourse-setup.

Every six months, when I repeat this process, I reclone a fresh copy of Discourse from its git repository and run ./launcher bootstrap app, but the container then fails to start when I run ./launcher start app. The log shows:

anacron: Can't chdir to /var/spool/anacron: No such file or directory
run-parts: /etc/runit/1.d/anacron exited with return code 1
run-parts: executing /etc/runit/1.d/00-ensure-links
run-parts: executing /etc/runit/1.d/00-fix-var-logs
run-parts: executing /etc/runit/1.d/01-cleanup-web-pids
run-parts: executing /etc/runit/1.d/anacron
anacron: Can't chdir to /var/spool/anacron: No such file or directory

ad infinitum.

I then usually go through a series of steps (rebootstrapping, restarting, removing plugins and re-adding them, and so on) until it finally works, without ever learning which of the many steps ultimately fixed it. Six months later, the same thing happens again.

This time, though, I believe I finally found the issue: apparently, ./launcher start app restarts the old container instance named app, even after Discourse has been recloned and rebootstrapped.
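
You can see the stale instance for yourself. A quick check (a sketch, assuming the default single-container setup where both the config and the container are called app):

docker ps -a --filter name=app                                                                 # the old, stopped instance named "app" is still around
docker ps -a --filter name=app --format '{{.Names}}  {{.Image}}  {{.CreatedAt}}  {{.Status}}'  # compare its creation time with when you recloned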

The missing step is docker rm app. In summary:

./launcher stop app
docker rm app
... now recloning, rebootstrapping, and ./launcher start app work as expected

My mistake was to expect that, after running ./launcher bootstrap app, the next ./launcher start app would start a container from the newly bootstrapped image, but that doesn’t appear to be the case. Naturally, things go haywire in the old container, since the /var/discourse/shared path has been reinitialized.
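
One way to confirm the diagnosis (a sketch; it assumes the stock naming, where the bootstrapped image is local_discourse/app and the container instance is app):

docker inspect --format '{{.Image}}' app                                          # image ID the existing "app" instance was created from
docker images --no-trunc --format '{{.ID}}  {{.CreatedAt}}' local_discourse/app   # ID and build time of the freshly bootstrapped image

If the two IDs differ, ./launcher start app is restarting the stale instance rather than creating one from the fresh image.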

I’m leaving this here in case others search for the same log error messages.

As a possible improvement, it would be nice if the container detected that its /var/discourse/shared directory has changed underneath it.


If you want to run bootstrap, the “discourse way” is

./launcher bootstrap app
./launcher destroy app
./launcher start app

But if you have just one container, there’s no reason not to just

./launcher rebuild app

like pretty much every example says. That stops the running container, bootstraps a new image, and starts a fresh container from it. If the bootstrap fails for some reason, you can (usually) restart the old one with ./launcher start app (as you’ve described).
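
For what it’s worth, my mental model of rebuild is roughly the following sequence (a sketch pieced together from this thread and the help text, not the literal launcher script):

./launcher stop app        # stop the running instance
./launcher bootstrap app   # build a fresh image from the app.yml config
docker rm app              # remove the old, now-stopped instance (only once the bootstrap has succeeded)
./launcher start app       # start a new instance from the fresh image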

I think I see the problem, and it’s related to the usual conflation of “container instance” and “container image.”

If you look at 10. Post-Install Maintenance, for instance, it says:

Usage: launcher COMMAND CONFIG [--skip-prereqs] [--docker-args STRING]
Commands:
    start:      Start/initialize a container
    stop:       Stop a running container
    restart:    Restart a container
    destroy:    Stop and remove a container
    enter:      Use nsenter to get a shell into a container
    logs:       View the Docker logs for a container
    bootstrap:  Bootstrap a container for the config based on a template
    rebuild:    Rebuild a container (destroy old, bootstrap, start new)
    cleanup:    Remove all containers that have stopped for > 24 hours

In most uses of the word “container” in this help output, it refers to a container instance. The exception is bootstrap, where it refers to an image (./launcher bootstrap uses docker commit to create a new image from which subsequent container instances can be launched). I found this unexpected, and naively assumed it would affect the current app instance as well.

In rebuild, the word refers to both, since the command involves operations that affect the image as well as the instance.

And it’s not clear what it refers to in cleanup: will only stopped instances be removed, or the bootstrapped image as well?
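
One way to find out empirically (a sketch; only the cleanup call itself changes anything, and local_discourse is the stock image naming, which may differ on a customized setup):

docker ps -a --filter status=exited   # stopped container instances before cleanup
docker images                         # images, including the bootstrapped local_discourse ones
./launcher cleanup app                # invoked per the Usage line above
docker ps -a --filter status=exited   # which instances are gone?
docker images                         # and are any images gone?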
