A Discourse server I’m trying to upgrade is stuck restarting, and because of the constant restarts I can’t get a shell on it. This is what I have from ./launcher logs app:
[Fri 29 Mar 12:51:47 UTC 2019] Run reload cmd: sv reload nginx
warning: nginx: unable to open supervise/ok: file does not exist
[Fri 29 Mar 12:51:47 UTC 2019] Reload error for :
nginx: [error] open() "/run/nginx.pid" failed (2: No such file or directory)
run-parts: /etc/runit/1.d/letsencrypt exited with return code 1
Does anyone have any suggestions regarding what I could try next?
The rebuild seems to go OK, but then the container keeps restarting and I can’t get a shell on it because of this. I want to see if /run exists, since the error logs above appear to be saying that this directory doesn’t exist…
After a rebuild with all the plugins disabled it looks OK to start with:
docker ps
CONTAINER ID   IMAGE                 COMMAND        CREATED          STATUS                  PORTS                                      NAMES
5185f5987314   local_discourse/app   "/sbin/boot"   11 seconds ago   Up Less than a second   0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   app
But then it gets stuck in a loop that looks like this:
docker ps
CONTAINER ID   IMAGE                 COMMAND        CREATED              STATUS                          PORTS   NAMES
5185f5987314   local_discourse/app   "/sbin/boot"   About a minute ago   Restarting (100) 39 seconds ago         app
I tried another rebuild with this added to the end of containers/app.yml:
## Any custom commands to run after building
run:
- exec: echo "Beginning of custom commands"
- exec: ls -lah /run
- exec: df -h
- exec: file /run/nginx.pid
The results of this were:
I, [2019-03-29T13:29:39.736562 #14] INFO -- : total 36K
drwxr-xr-x 1 root root 4.0K Mar 29 13:23 .
drwxr-xr-x 1 root root 4.0K Mar 29 13:23 ..
lrwxrwxrwx 1 root root 25 Feb 22 10:06 initctl -> /run/systemd/initctl/fifo
-rw-r--r-- 1 root root 0 Mar 21 00:43 init.upgraded
drwxrwxrwt 3 root root 4.0K Feb 22 10:06 lock
drwxr-xr-x 2 root root 4.0K Feb 22 10:06 log
drwxr-xr-x 2 root root 4.0K Feb 22 10:05 mount
lrwxrwxrwx 1 postgres postgres 20 Mar 29 13:23 postgresql -> /shared/postgres_run
drwxr-xr-x 2 root root 4.0K Feb 22 10:06 sendsigs.omit.d
lrwxrwxrwx 1 root root 8 Feb 22 10:06 shm -> /dev/shm
drwx--x--x 3 root root 4.0K Mar 21 00:42 sudo
drwxr-xr-x 1 root root 4.0K Mar 21 00:42 systemd
drwxr-xr-x 2 root root 4.0K Feb 22 10:06 user
-rw-rw-r-- 1 root utmp 0 Feb 22 10:05 utmp
I, [2019-03-29T13:29:39.737355 #14] INFO -- : > df -h
I, [2019-03-29T13:29:39.742286 #14] INFO -- : Filesystem Size Used Avail Use% Mounted on
overlay 64G 45G 16G 74% /
tmpfs 64M 0 64M 0% /dev
tmpfs 1.5G 0 1.5G 0% /sys/fs/cgroup
/dev/vda2 64G 45G 16G 74% /shared
shm 512M 8.0K 512M 1% /dev/shm
tmpfs 1.5G 0 1.5G 0% /proc/acpi
tmpfs 1.5G 0 1.5G 0% /sys/firmware
I, [2019-03-29T13:29:39.743254 #14] INFO -- : > file /run/nginx.pid
I, [2019-03-29T13:29:39.755033 #14] INFO -- : /run/nginx.pid: cannot open `/run/nginx.pid' (No such file or directory)
So the issue does appear to be that /run/nginx.pid doesn’t exist, and I expect that the cause of this is a problem with the Nginx config? So perhaps I’ll check that now…
And I’m currently trying to work out if this commit broke the Nginx config:
I can’t see anything in the app.yml file that could affect the Nginx config, and the YAML appears to be valid: I just tested it with yamllint containers/app.yml and the only issues were with whitespace and line length.
To try to shed some light on the problem:
## Any custom commands to run after building
run:
- exec: echo "Beginning of custom commands"
- exec: ls -lah /etc/nginx/
- exec: /usr/sbin/nginx -t -c /etc/nginx/nginx.conf
- exec: /usr/sbin/nginx -t -c /etc/nginx/letsencrypt.conf
- exec: /usr/sbin/nginx -t -c /etc/nginx/fastcgi.conf
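One thing to keep in mind with a list like this is that the build stops at the first exec that returns non-zero, so a later check may never run. Below is a rough sketch of a wrapper that checks every file and reports all the results. The check_all function and the CHECK variable are names I made up for this sketch, not part of Discourse or the launcher.

```shell
#!/bin/sh
# Run a check command against each file and report every result,
# instead of stopping at the first failure.
# CHECK is a hypothetical knob for this sketch; it defaults to the
# nginx config test used in the exec lines above.
CHECK="${CHECK:-/usr/sbin/nginx -t -c}"

check_all() {
  failed=0
  for f in "$@"; do
    # $CHECK is deliberately unquoted so "nginx -t -c" splits into words.
    if $CHECK "$f"; then
      echo "OK   $f"
    else
      echo "FAIL $f"
      failed=$((failed + 1))
    fi
  done
  # The last command is an echo, so the function itself always returns 0.
  echo "$failed file(s) failed"
}

check_all /etc/nginx/nginx.conf /etc/nginx/letsencrypt.conf /etc/nginx/fastcgi.conf
```

Because the function always returns 0, it would only be useful for diagnosis, not for making the build fail on a bad config. Also, fastcgi.conf is normally a fragment that gets included from other files rather than a standalone config, so I would expect it to fail this kind of test regardless.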
And this resulted in:
I, [2019-03-29T14:28:53.182369 #14] INFO -- : > /usr/sbin/nginx -t -c /etc/nginx/nginx.conf
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
Which looks good, but then:
I, [2019-03-29T14:28:53.209517 #14] INFO -- : > /usr/sbin/nginx -t -c /etc/nginx/letsencrypt.conf
nginx: the configuration file /etc/nginx/letsencrypt.conf syntax is ok
nginx: [emerg] socket() [::]:80 failed (97: Address family not supported by protocol)
nginx: configuration file /etc/nginx/letsencrypt.conf test failed
I, [2019-03-29T14:28:53.218977 #14] INFO -- :
...
FAILED
--------------------
Pups::ExecError: /usr/sbin/nginx -t -c /etc/nginx/letsencrypt.conf failed with return #<Process::Status: pid 2355 exit 1>
Location of failure: /pups/lib/pups/exec_command.rb:112:in `spawn'
exec failed with the params "/usr/sbin/nginx -t -c /etc/nginx/letsencrypt.conf"
f42e20c75dda3fa8b80905d1ad3159a30ca16aea69bc2a7f392cf566f33e02da
** FAILED TO BOOTSTRAP ** please scroll up and look for earlier error messages, there may be more than one
So the issue is with the IPv6 address? The VM isn’t set up for IPv6, and the server does have the following in /etc/sysctl.conf:
So perhaps I need to disable IPv6 for Nginx and/or Docker?
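To compare what the kernel reports about IPv6 on the host and inside the container, something like the following could be run in both places. It is only a sketch: ipv6_status is a name I made up, and it assumes that /proc/net/if_inet6 is absent when the kernel has no IPv6 support loaded. It does not tell you for certain what Nginx will do with a [::] listener.

```shell
#!/bin/sh
# Rough IPv6 status report.
# The proc root is a parameter only so the function can be tried against
# a scratch directory; it defaults to the real /proc.
# Assumption: /proc/net/if_inet6 is absent when IPv6 support is not loaded.
ipv6_status() {
  root="${1:-/proc}"
  if [ ! -e "$root/net/if_inet6" ]; then
    echo "unavailable"
  elif [ "$(cat "$root/sys/net/ipv6/conf/all/disable_ipv6" 2>/dev/null)" = "1" ]; then
    # This is the value that net.ipv6.conf.all.disable_ipv6 = 1 sets.
    echo "disabled"
  else
    echo "enabled"
  fi
}

echo "IPv6 status here: $(ipv6_status)"
```

If the host and the container print different values, that would at least narrow down whether the setting is coming from the VM or from Docker.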
Or perhaps sed can save the site?
## Any custom commands to run after building
run:
- exec: echo "Beginning of custom commands"
- exec: sed -e 's/listen \[::\]:80;/#listen [::]:80;/' -i /etc/nginx/letsencrypt.conf
- exec: /usr/sbin/nginx -t -c /etc/nginx/letsencrypt.conf
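Before waiting for a full rebuild, the sed expression can be tried on a scratch file. This is a minimal sketch: the sample server block is made up for illustration and the real /etc/nginx/letsencrypt.conf in the container may look different. It assumes GNU sed, which the -i form above already relies on.

```shell
#!/bin/sh
# Try the sed expression on a scratch copy before putting it in app.yml.
# The sample contents are hypothetical, not the real letsencrypt.conf.
conf="$(mktemp)"
trap 'rm -f "$conf"' EXIT

cat > "$conf" <<'EOF'
server {
  listen 80;
  listen [::]:80;
}
EOF

# Same expression as in the app.yml snippet: comment out the IPv6 listener.
sed -e 's/listen \[::\]:80;/#listen [::]:80;/' -i "$conf"

# Show the result: the IPv4 listen line should be left untouched.
cat "$conf"
```

Note that the pattern also matches a line that is already commented out, so running it twice on the same file would just add a second #, which is harmless.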
But this virtual server doesn’t have IPv6 configured, and it appears that as a result the IPv6 line in /etc/nginx/letsencrypt.conf:
listen [::]:80;
is the thing that is stopping Nginx from starting. I’m not sure what solution you are suggesting; my solution appears to work:
I, [2019-03-29T14:57:26.021690 #14] INFO -- : Beginning of custom commands
I, [2019-03-29T14:57:26.023075 #14] INFO -- : > sed -e 's/listen \[::\]:80;/#listen [::]:80;/' -i /etc/nginx/letsencrypt.conf
I, [2019-03-29T14:57:26.030390 #14] INFO -- :
I, [2019-03-29T14:57:26.030529 #14] INFO -- : > /usr/sbin/nginx -t -c /etc/nginx/letsencrypt.conf
nginx: the configuration file /etc/nginx/letsencrypt.conf syntax is ok
nginx: configuration file /etc/nginx/letsencrypt.conf test is successful
I, [2019-03-29T14:57:26.045517 #14] INFO -- :
I agree that ideally the VM server and the DNS would be configured for IPv6, but this isn’t the case with this server, and I might not be the only person still running a server using only IPv4. So it appears fairly reasonable to me to disable IPv6 in the Nginx config in the Docker container. It is a simple one-line solution in containers/app.yml and it does the job for now:
## Any custom commands to run after building
run:
- exec: echo "Beginning of custom commands"
- exec: sed -e 's/listen \[::\]:80;/#listen [::]:80;/' -i /etc/nginx/letsencrypt.conf
As far as I’m concerned this “solution” is good enough for now…
Oops! No. I failed to note that! I think that the take-home message is that if you don’t want to deal with IPv6 (I’ve not learned much about it myself), you can leave out anything to do with IPv6 in /etc/sysctl.conf and you’ll be fine.