Is there a step-by-step diagnostic for a Discourse site that is returning 502 Bad Gateway?

I came here hoping to find a step-by-step diagnostic for a Discourse site that is returning 502 Bad Gateway. It seems the only suggestions are along these lines:

  1. A Discourse update might have failed; run ./launcher rebuild app.
  2. Update and reboot the server.

These are the kinds of responses we’d get from a tier-1 support tech, or an email bot.

What else can we do to look at logs and see exactly why the environment died? With that info we might learn how to prevent the issue in the future.

For example, would it be appropriate to script a cron process to occasionally ping Discourse, and if the response is a 502 or similar return code, auto rebuild?
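A minimal sketch of such a check, assuming curl is available on the host. The function names, the URL, and the log path are placeholders, not anything Discourse ships. It only records the failure with a timestamp, so there is something to correlate with the logs, and leaves the decision to restart or rebuild to the admin:

```shell
#!/usr/bin/env bash
# Hypothetical health check for a Discourse site. All names here are
# illustrative assumptions, not part of Discourse or its launcher.

# Succeeds (exit 0) when the HTTP status suggests the backend is not answering.
is_gateway_error() {
  case "$1" in
    502|503|504) return 0 ;;
    *) return 1 ;;
  esac
}

# Print only the HTTP status code for a URL.
# curl prints 000 when the connection itself fails.
fetch_status() {
  curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$1"
}

# Append a timestamped line to a log file when the site returns a gateway error.
# Returns 1 in that case so a caller can decide what to do next.
check_site() {
  local url="$1" log="$2" code
  code="$(fetch_status "$url")"
  if is_gateway_error "$code"; then
    printf '%s %s returned %s\n' "$(date -u +%FT%TZ)" "$url" "$code" >> "$log"
    return 1
  fi
  return 0
}
```

Run from cron every few minutes as, say, check_site https://forum.example.com /var/log/discourse-502.log, this would at least give exact failure times to compare against the Rails and nginx logs.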

Rebuilding seems to be a rather brutal way to solve a problem too. It’s not a diagnostic.

I’m really hoping someone can point us to a popular “Diagnosing Discourse Issues” document that dummies like me have missed. :slight_smile:

Thanks!

From reading a lot of posts here, forum admins typically aren’t the cause of 502s; it’s usually a plugin or core error, so there isn’t much you could do to avoid those issues.

Console logs always help; they can often pinpoint the problematic plugin.

I can open the console on this VPS but the text window is limited.
Are there specific logs that can be checked in the container or in the OS?
Is there already some form of ping process in the host OS or the container that detects when processes are down?
Might a simple server restart within the container be a good way to approach this rather than a full rebuild?

BTW, I am running the latest beta/dev, so it’s entirely possible that a recent update took the server down, as we’ve seen in the past. I don’t recall at the moment if there are any non-default plugins installed.

I have the freedom to help with the diagnostics of this without our community getting upset, though within some number of months we’ll need to move to more stable versions just to keep our users comfortable. So if this is something in the build, I’m happy to help find it.

Thanks!

I meant browser logs, from dev tools or the equivalent on your browser.

I don’t believe so, but you could always try.

Is the disk full?

Does this happen frequently?

Look at /var/discourse/shared/standalone/log/rails/production.log

Sorry it’s taken so long to get back here…

Disk is <50% in use.
RAM usage tends to stay in the 80-90% range, swap below 40%. I’m guessing this is where the issue comes from.
Logs are in /var/discourse/shared/standalone/log/rails.
production.log and related gzipped files have a lot of transaction detail. What might I look for?
There are no production_error.log entries at all.
“Frequently”? No. But often enough to be mildly annoying and prompt a post here. :slight_smile:
I went through syslog and didn’t see anything. I’m not sure there would be anything there if the issue is restricted to the container, as it should be.
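One way to double-check the memory theory: when the Linux kernel kills a process for lack of memory, it logs that on the host even if the process lived inside the container. A rough sketch, assuming a typical Linux host; the function name and the match patterns are illustrative guesses at common kernel messages, not an exhaustive list:

```shell
#!/usr/bin/env bash
# Keep only kernel log lines that look like OOM-killer activity.
# The patterns are assumptions based on common Linux kernel messages.
filter_oom() {
  grep -Ei 'out of memory|oom-kill|invoked oom-killer'
}

# Example usage on the host (not in the container; may need root):
#   dmesg -T | filter_oom
#   journalctl -k --since "2 days ago" | filter_oom
```

No output only means none of these patterns matched in the period the kernel log covers; it does not rule out memory pressure.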

I’m a Docker noob, so I’m sorry that I have no info from the container, but will be happy to poke as directed.

Thanks!

Browser logs won’t help. The back end is the issue here: the request isn’t getting as far as a response from the server (hence “bad gateway”).

It’s the backend Rails logs you need to look at.

Try these:

  • On the host, in /var/discourse/shared/standalone/log/rails, run tail -n 200 production.log to see if there are obvious startup errors.

  • In the container (first run ./launcher enter app), run curl 0.0.0.0:3000 to see if the Rails server is responding.

Other than that, remove all plugins, rebuild, and then add them back one at a time.
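The log check above can be narrowed so the interesting lines stand out from the transaction detail. A small sketch; the function name is made up and the patterns are only guesses at what a failed boot tends to log, so adjust them to what the file actually contains:

```shell
#!/usr/bin/env bash
# Show likely error lines from the tail of a Rails log.
# Line numbers are relative to the tailed excerpt, not the whole file.
# The patterns are assumptions, not an official list of Discourse errors.
scan_rails_log() {
  local file="$1" lines="${2:-200}"
  tail -n "$lines" "$file" | grep -nE 'FATAL|ERROR|Error|Exception' || true
}

# Example usage on the host:
#   scan_rails_log /var/discourse/shared/standalone/log/rails/production.log
#   scan_rails_log /var/discourse/shared/standalone/log/rails/production.log 1000
```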

A 502 happens when Rails doesn’t return a response to nginx, usually because the system is still booting up or something is misconfigured.

You might look in the nginx logs.
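As a rough sketch of what to look for there: when nginx cannot reach the backend it normally logs an error mentioning the upstream, for example a refused connection. The function name is hypothetical, and the log path in the usage comment assumes nginx's default location inside the container, which may differ on your install:

```shell
#!/usr/bin/env bash
# Print nginx error-log lines that mention the upstream (the Rails backend).
upstream_errors() {
  grep -i 'upstream' "$1" || true
}

# Example usage inside the container (after ./launcher enter app); path is assumed:
#   upstream_errors /var/log/nginx/error.log | tail -n 20
```

The timestamps on those lines show when the backend stopped answering, which is the window to inspect in production.log.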

I think almost all of the threads here about 502 errors are when Discourse has been upgraded and it hasn’t come back to life. The upgrade failed, or the admin didn’t wait long enough for the service to come up.

Are you saying that you have a working Discourse, take no admin action, but it starts returning 502 spontaneously?

And when it does that, does it always return 502 until restarted, or does it intermittently start working again?