How to track down a failure of the Discourse upstream inside the container resulting in 502 Bad Gateway

(Yes, I did search first)

After recently using the admin upgrade interface, my Discourse instance stopped working and now responds with 502 Bad Gateway.

I’ve entered the container and it appears to be running an nginx that is expecting a server at localhost:3000, which is not running.
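One way to confirm from inside the container that nothing is answering on localhost:3000 is to attempt a TCP connection. This is only a sketch: `port_open` is a hypothetical helper name, not something shipped with Discourse, and it relies on bash's `/dev/tcp` redirection, so it needs bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return 0 if something accepts TCP connections on
# host:port, non-zero otherwise. The subshell keeps a failed redirection
# from terminating the calling shell.
port_open() {
  local host="$1" port="$2"
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# localhost:3000 is the upstream address nginx expects, per the post above.
if port_open localhost 3000; then
  echo "upstream on localhost:3000 is accepting connections"
else
  echo "nothing is listening on localhost:3000"
fi
```

If the second message is printed, nginx itself is fine and the problem is that the application server behind it never came up.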

(54) Waiting for new unicorn workers under 3802725 to start up...
(54) Waiting for new unicorn workers under 3802725 to start up...
(54) Old pid is: 3800363 New pid is: 3802725
config/unicorn_launcher: line 71: kill: (3802725) - No such process
config/unicorn_launcher: line 15: kill: (3802725) - No such process
(54) exiting
ok: run: redis: (pid 62) 3418739s
ok: run: postgres: (pid 53) 3418739s
supervisor pid: 3803896 unicorn pid: 3803900
config/unicorn_launcher: line 71: kill: (3803900) - No such process
config/unicorn_launcher: line 15: kill: (3803900) - No such process
(3803896) exiting

This is followed repeatedly by:

ok: run: redis: (pid 64) 4905s
ok: run: postgres: (pid 65) 4905s
supervisor pid: 18571 unicorn pid: 18575
config/unicorn_launcher: line 71: kill: (18575) - No such process
config/unicorn_launcher: line 15: kill: (18575) - No such process
(18571) exiting

I’d like to start this thread for help in debugging this. What’s the next step here: which command is Discourse trying to run? (I know I could find this out by reading or reverse-engineering the code, but it may be useful to have a thread on this on the forum.)

I’d be grateful for any pointers.
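A possible next step, since the launcher output above only says the unicorn process disappeared, is to read the application's own logs for the error that killed it. The sketch below assumes a standard Docker install where the logs live under /var/www/discourse/log inside the container; those paths and the `show_log_tails` helper name are assumptions, so adjust them if your layout differs.

```shell
# Hypothetical helper: print the last N lines of each readable log file
# and note the ones that are missing.
show_log_tails() {
  local lines="$1"; shift
  local f
  for f in "$@"; do
    if [ -r "$f" ]; then
      echo "== $f =="
      tail -n "$lines" "$f"
    else
      echo "== $f (not found) =="
    fi
  done
}

# Assumed log locations inside the container (run after ./launcher enter app).
show_log_tails 50 \
  /var/www/discourse/log/unicorn.stderr.log \
  /var/www/discourse/log/production.log
```

A Ruby exception or a failing plugin usually shows up near the end of the unicorn stderr log, which narrows down what to fix or disable.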


Start with search :wink:

This looks similar?

Are you using a totally vanilla standard install?

Given the timing, this is most likely related to a data-explorer change which caused some issues. We’ve now reverted it, so if you try the rebuild again it should work better.


Yes, I’m using data explorer. I didn’t do a git pull before restarting.
When I do a git pull and then ./launcher restart app, it’s not fixed.

The install is standard, except that I’m running it behind an nginx on the host.
(And I have a few plugins, such as data explorer.)

I’m now trying ./launcher rebuild app. I hope that rebuilding the app will preserve my forum’s database and that I won’t end up with my forum reset.
Doing ./launcher rebuild app does not address the issue.

This post describes an issue with privileged vs unprivileged containers, but doesn’t provide more information. It’s also from 2 years ago so may not be related to a recent update.

Sure, the database is on the mounted shared folder, so it persists.

Restarting the container after a git pull is unlikely to be enough.
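For reference, the update sequence discussed here (pull the launcher repository, then rebuild rather than restart) can be sketched as below. The /var/discourse directory is the usual install location but is an assumption here, and `run` and `update_discourse` are hypothetical helper names. With DRY_RUN=1 the commands are only printed, which makes it safe to preview.

```shell
# Hypothetical helper: run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Pull the launcher repository and rebuild the container.
# The default directory is an assumption; pass your own as the first argument.
update_discourse() {
  local dir="${1:-/var/discourse}"
  run cd "$dir" &&
  run git pull &&
  run ./launcher rebuild app
}

# Preview the steps without executing them.
DRY_RUN=1 update_discourse
```

Dropping DRY_RUN=1 runs the real commands. The rebuild takes the site offline while it runs.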

Got it. I also ran ./launcher rebuild app. Would that not pull updates to the plugins?

Yep, that will update the plugins too (so long as they are cloned within app.yml).
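To check which plugins a rebuild will actually re-clone, you can list the `git clone` lines in the container definition. This is a rough sketch: the path /var/discourse/containers/app.yml is an assumption about a standard install, and `list_cloned_plugins` is a hypothetical helper, not a launcher command.

```shell
# Hypothetical helper: print the repository URL from every active
# "- git clone <url>" line in a container definition file.
# Lines commented out with '#' are skipped because they do not start with '-'.
list_cloned_plugins() {
  local yml="$1"
  [ -r "$yml" ] || { echo "cannot read $yml" >&2; return 1; }
  grep -E '^[[:space:]]*-[[:space:]]*git clone' "$yml" |
    awk '{ for (i = 1; i < NF; i++) if ($i == "clone") { print $(i + 1); break } }'
}

# Assumed location of the container definition on the host.
list_cloned_plugins /var/discourse/containers/app.yml ||
  echo "adjust the path to your container definition"
```

Any plugin that was installed some other way and does not appear in this list would not be updated by a rebuild.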

In case this is still being investigated: I had a 502 Bad Gateway error, though not directly after the update routine, which failed midway with a Ruby versioning error. Since I had not updated the server in about six weeks, I ran apt update/upgrade and rebooted. That’s when the 502 error occurred and I could not reach the forum website. Rebuilding the app fixed things and also updated Discourse fully.

For the record I have these plugins installed and enabled:

discourse-bbcode
discourse-data-explorer
discourse-docs
docker_manager
styleguide

and these installed but disabled:

discourse-topic-list-previews
discpage
