NOTE: Original post updated 11/25/21 PM EST with new information
Notified of critical security updates to my Discourse installation, I attempted to update it using the Web UI (/admin/upgrade), as I have done in the past. Two pieces of software needed to be upgraded: Docker Manager and Discourse.
The Docker Manager had to be upgraded first (the Discourse upgrade button was disabled). I started the Docker Manager upgrade using the Web UI and it completed successfully. I then started the Discourse upgrade but it failed midway through. When I refreshed the Web UI I saw the following message:
So, following the onscreen instructions, I SSH’d into the server, did a `git pull`, and then ran `sudo ./launcher rebuild app` from the command line. The process ran to the end but failed with a FAILED TO BOOTSTRAP error message.
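A rough sketch of how a rebuild can be run so that its output is kept for later inspection. The helper names `run_logged` and `bootstrap_failed`, the log path, and the `/var/discourse` location in the usage comments are my own assumptions, not part of the launcher.

```shell
# Run a command, show its output live, and keep a copy in a log file.
# $1 is the log file; the remaining arguments are the command to run.
run_logged() {
    log_file="$1"
    shift
    "$@" 2>&1 | tee "$log_file"
}

# Succeeds (exit 0) when the log contains the launcher's failure banner.
bootstrap_failed() {
    grep -q 'FAILED TO BOOTSTRAP' "$1"
}

# Usage on the server (paths are assumptions, adjust to your install):
#   cd /var/discourse
#   git pull
#   run_logged "$HOME/launcher-rebuild-app-output.txt" sudo ./launcher rebuild app
#   bootstrap_failed "$HOME/launcher-rebuild-app-output.txt" && echo "bootstrap failed"
```

Checking the log for the banner avoids relying on the exit status of the pipeline, which would be that of `tee` rather than of the launcher.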
Here is the output of two runs of `sudo ./launcher rebuild app` at different times:
- launcher-rebuild-app-output-0.txt - Lines 88-95
- launcher-rebuild-app-output-1.txt - Lines 100-107
The line numbers after each file name show where the only ERRORs appear. Both appear to be database and role related. (The two ranges differ because the second run attempted a `git pull` from the `discourse/base` repository.)
2021-11-25 21:21:38.451 UTC [64] postgres@postgres ERROR: database "discourse" already exists
2021-11-25 21:21:38.451 UTC [64] postgres@postgres STATEMENT: CREATE DATABASE discourse;
createdb: error: database creation failed: ERROR: database "discourse" already exists
I, [2021-11-25T21:21:38.454429 #1] INFO -- :
I, [2021-11-25T21:21:38.454908 #1] INFO -- : > su postgres -c 'psql discourse -c "create user discourse;"' || true
2021-11-25 21:21:38.531 UTC [68] postgres@discourse ERROR: role "discourse" already exists
2021-11-25 21:21:38.531 UTC [68] postgres@discourse STATEMENT: create user discourse;
ERROR: role "discourse" already exists
This appears to dovetail with the FAILED error message displayed at the bottom of each Launcher Rebuild attempt.
FAILED
--------------------
Pups::ExecError: cd /var/www/discourse && su discourse -c 'bundle exec rake db:migrate' failed with return #<Process::Status: pid 436 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params {"cd"=>"$home", "hook"=>"db_migrate", "cmd"=>["su discourse -c 'bundle exec rake db:migrate'"]}
13bbdd52e0835ba9dfddc5c367d63b6087a16553c3a77d27ca307734d6e16907
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one.
./discourse-doctor may help diagnose the problem.
Note: These ERRORS are not the root problem. See “Solution” below.
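For anyone who wants to locate the ERROR lines in their own saved rebuild log, a minimal sketch follows. The function name `show_errors` is hypothetical, and it assumes the launcher output was saved to a text file such as launcher-rebuild-app-output-0.txt.

```shell
# Print every line of a saved log that contains ERROR,
# prefixed with its line number (grep -n).
show_errors() {
    grep -n 'ERROR' "$1"
}

# Usage:
#   show_errors launcher-rebuild-app-output-0.txt
```

As the note above says, an ERROR line is not necessarily the root problem, so this only narrows down where to start reading.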
Some people below have said that there is an issue with `redis` that is preventing a successful rebuild.
I ran `sudo ./discourse-doctor` at various times during the day. Here is the output from two of the runs:
- discourse-doctor-output-0.txt - could not find ‘app’ container running; attempted rebuild but failed to restart the container
- discourse-doctor-output-1.txt - ‘app’ started manually with `sudo /usr/bin/docker start app` before running `discourse-doctor`
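The "is the container running, and if not, start it" step from the two runs above can be sketched roughly as follows. The helper `container_listed` is a hypothetical name. It only inspects a list of names, so the Docker call itself appears in the usage comment.

```shell
# Reads container names (one per line) on stdin and succeeds
# only if "$1" matches one of them exactly.
container_listed() {
    grep -qxF -- "$1"
}

# Usage on the server (needs Docker and sudo):
#   if sudo docker ps --format '{{.Names}}' | container_listed app; then
#       echo "app is running"
#   else
#       sudo docker start app
#   fi
```

The exact match (`-x`) keeps a container called, say, `myapp` from being mistaken for `app`.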
I verified that my Docker installation was running correctly by running `sudo docker run -it --rm hello-world`:
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
2db29710123e: Pull complete
Digest: sha256:cc15c5b292d8525effc0f89cb299f1804f3a725c8d05e158653a563f15e4f685
Status: Downloaded newer image for hello-world:latest
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
(amd64)
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
I ran `sudo ./launcher cleanup` to make sure I had enough disk space.
WARNING! This will remove all images without at least one container associated to them.
Are you sure you want to continue? [y/N] y
Deleted Images:
<DETAILS REMOVED>
Total reclaimed space: 3.836GB
$ df -hT /dev/xvda1
Filesystem Type Size Used Avail Use% Mounted on
/dev/xvda1 ext4 30G 9.1G 20G 32% /
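The same disk check can be scripted. This is only a sketch: the function names and the 90 percent threshold in the usage comment are my own choices, and it parses the portable `df -P` layout, where the fifth column is the use percentage (unlike the `df -hT` output above, which has an extra Type column).

```shell
# Reads `df -P <path>` output on stdin and prints the use
# percentage of the first filesystem as a bare number.
use_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeeds when the usage read from stdin is below the threshold in $1.
below_threshold() {
    [ "$(use_percent)" -lt "$1" ]
}

# Usage:
#   df -P / | below_threshold 90 || echo "disk is getting full"
```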
And I even checked my memory settings.
$ free -h
total used free shared buff/cache available
Mem: 1.9G 304M 633M 20M 1.0G 1.5G
Swap: 2.0G 0B 2.0G
A reboot of the server did not solve the issue but I did notice something interesting after rebooting the server.
The Docker `app` container is running after a reboot.
$ sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6449ec0061a0 local_discourse/app "/sbin/boot" 7 weeks ago Up 25 seconds 0.0.0.0:80->80/tcp, :::80->80/tcp, 0.0.0.0:443->443/tcp, :::443->443/tcp app
But when I go to the site I get a “502 Bad Gateway” error.
When I stop the `app` container and go to the site I get an “Unable To Connect” error (which seems right since the container isn’t running).
But this puzzles me since I don’t have `Nginx` installed on this server.
I can see in the Rebuild output where the process copies Nginx files from one location to another, but I cannot find the corresponding directories or files, specifically `nginx.conf`, anywhere on my server. Ubuntu, Docker, and Discourse are not my primary skills, but I am assuming that these files are being copied “within” the Docker `app` container.
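Since `docker ps` shows ports 80 and 443 published from the `app` container, whatever answers on those ports is running inside the container. The two symptoms above can be told apart from the command line with something like the sketch below. The function name and the wording of the messages are my own, and the readings are rough rules of thumb, not a diagnosis.

```shell
# Map an HTTP status code (as printed by curl's %{http_code})
# to a rough reading of where the failure sits.
explain_status() {
    case "$1" in
        000) echo "no connection: nothing is listening on the port" ;;
        502) echo "a proxy answered but could not reach the application behind it" ;;
        2??|3??) echo "the site answered normally" ;;
        *) echo "unexpected status $1" ;;
    esac
}

# Usage on the server:
#   code=$(curl -s -o /dev/null -w '%{http_code}' http://localhost/)
#   explain_status "$code"
```

curl prints `000` when it cannot connect at all, which corresponds to the “Unable To Connect” case with the container stopped.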
Thanks in advance; appreciate any additional help or direction with this issue, which seems to surface from time to time during Discourse upgrades.
UPDATE: It turns out my assumption that the Docker `app` container has its own internal filesystem is correct. You can create a snapshot of the container filesystem and explore it using bash.
# create image (snapshot) from container filesystem
$ sudo docker commit <container_id> mysnapshot
$ sudo docker run -t -i mysnapshot /bin/bash
In the `app` filesystem there is an `nginx` directory that contains a Discourse configuration file.
root@f91826d986eb:/etc/nginx/conf.d# ls -l
total 12
-rw-r--r-- 1 root root 10568 Oct 3 21:33 discourse.conf
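When the container is running, the same directory can also be listed without creating a snapshot, using `docker exec`. A minimal sketch follows. The function names and the `DOCKER` variable are my own additions, there so the commands can be previewed as a dry run before touching anything.

```shell
# Command used to reach Docker. Override for a dry run,
# e.g. DOCKER="echo docker" prints the command instead of running it.
DOCKER="${DOCKER:-sudo docker}"

# List the nginx configuration directory inside a running container.
list_nginx_conf() {
    $DOCKER exec "$1" ls -l /etc/nginx/conf.d
}

# Remove a throwaway snapshot image once finished exploring.
# Docker refuses this while a container created from the image still exists.
remove_snapshot() {
    $DOCKER rmi "$1"
}

# Usage:
#   list_nginx_conf app
#   remove_snapshot mysnapshot
```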