Help with troubleshooting after upgrade to 2.3.0

Hello,

I have been having some trouble with my Discourse setup since last night. I would appreciate it if someone could help me troubleshoot.

Timeline:

  1. Users report that they can’t access the site, and I can reproduce it 100% of the time. When logging into my DO droplet (Ubuntu 16.04 LTS x64), I see the OS is asking for a reboot (this has never happened before). I rebooted, and the site was back to regular service (at least from my PC).
  2. As I was already offline, I took the opportunity to upgrade Discourse. I rebuilt to the latest version (2.3.0 beta2) and everything seemed to work again (Safari on Mac).
  3. I noticed that docker-engine was deprecated, so I uninstalled it and installed docker-ce. Everything was working fine.
  4. Hours later, users report issues that I can’t reproduce until I try certain combinations:
  • Works on FF + Win
  • Does not work on Chrome + Win
  • Works on iOS + Safari + Wi-Fi
  • Does not work on iOS + Safari + 4G
    All very weird, as you can see.
  5. I see that all the logos are gone, which I only notice after seeing errors in the log file:

There are some other quite strange errors in the log as well.

I can see that this is a known issue, so I re-upload the logos and everything seems to go back to normal, nearly.

Now Chrome + Win works, but none of the other combinations do. IE returns a 504, which some users see as well. In the combinations that work, the site loads as quickly as ever.

Another weird problem I’ve noticed is Firefox complaining about the certificate (Let’s Encrypt) while Chrome is fine.

EDIT: The certificate seems to be fine. For some reason FF was reporting that I had added an exception, which I have no memory of doing. Once I removed it, the green padlock was back…

I know this is loose and open-ended, but where would you advise me to start? I would say the 504 problem is the most concerning, as I suspect it explains the access problems.

Many thanks indeed.

Arturo

If you created your droplet before they reduced prices, you can resize to a 2 GB droplet for the same $10/month you’re paying now.

How long since the last upgrade? Recently there was a change that required images to be reprocessed. That might be related to the 504 errors.

How are RAM and disk space? CPU load?
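For anyone following along, these three questions can be answered quickly from the droplet's shell. The script below is a minimal sketch, assuming a Linux host with `/proc` mounted (as on the Ubuntu 16.04 droplet in this thread) and GNU coreutils; the function names are hypothetical helpers, not part of Discourse or its tooling.

```shell
#!/usr/bin/env bash
# Sketch: quick RAM / disk / CPU-load snapshot of a Linux host.
# Reads /proc directly so it does not depend on procps (free, uptime) being installed.
set -euo pipefail

# Available memory in MiB. Uses the kernel's MemAvailable estimate, falling back
# to MemFree on older kernels that do not report MemAvailable.
mem_available_mib() {
  awk '/^MemAvailable:/ { a = $2 }
       /^MemFree:/      { f = $2 }
       END { printf "%d\n", (a ? a : f) / 1024 }' /proc/meminfo
}

# Percentage of the root filesystem in use, as a bare integer (no % sign).
# -P forces POSIX output so long device names never wrap onto a second line.
root_disk_used_pct() {
  df -P / | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# 1-minute load average, the first field of /proc/loadavg.
load_1min() {
  cut -d ' ' -f 1 /proc/loadavg
}

echo "Available memory: $(mem_available_mib) MiB"
echo "Root disk used:   $(root_disk_used_pct)%"
echo "Load (1 min):     $(load_1min) on $(nproc) CPU(s)"
```

A load average that stays well above the CPU count, or a root disk close to 100%, would be worth ruling out before looking at the network side.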


Hello,

Thanks for the swift reply.

I was already on 2 GB RAM.

The last upgrade (to Discourse) was approximately 6 hours ago. I resized the droplet months ago with no issues.

CPU and disk I/O look fine to me.

Unfortunately, I don’t have access to the console right now to check disk space and RAM, but I don’t think that is the problem, as it presents consistently. For a given combination of user and environment, when the site works, it works all the time; when it doesn’t, it doesn’t work at all. For some reason it seems to be related to the connection (once I re-uploaded the logos, that is).

You can try

 ./discourse-doctor

Will do. Do you reckon there is any chance of a problem with nginx? How should I go about checking that? Might it be caching indefinitely, so that users who tried to access the site while it was down keep seeing it down?

Many thanks again.

If you did a standard install, it’s unlikely to be a problem with nginx.


OK, thanks. I have filed an issue with DigitalOcean as well, just in case the two are related. As a matter of fact, they made some network changes yesterday. I would be surprised if they broke something and it went unnoticed for this long, but who knows. As I said, the part that makes me so suspicious is that the problem seems to be related to the user’s connection.

So, as it turns out, our domain registration had expired. Embarrassing, I know. Fortunately we have been able to rescue the situation. Apologies for wasting your time, and thank you for your help.
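Since an expired registration ended up behind such confusing, connection-dependent symptoms, it may be worth checking the expiry date periodically. The script below is a minimal sketch, assuming the `whois` client and GNU `date` are installed and that the registry prints a `Registry Expiry Date:` line in ISO 8601 form (as the .com/.net registry does; other TLDs format this differently). The helper names and the 30-day threshold are hypothetical choices for illustration.

```shell
#!/usr/bin/env bash
# Sketch: warn if a domain's registration expires soon.
# Assumptions: `whois` and GNU `date` are installed, and the registry prints
# "Registry Expiry Date: YYYY-MM-DDThh:mm:ssZ" (.com/.net style).
set -euo pipefail

# Read whois output on stdin and print the first registry expiry date found.
# awk keeps reading to the end so the upstream whois never gets SIGPIPE.
expiry_date() {
  awk '!found && tolower($0) ~ /registry expiry date:/ {
         sub(/\r$/, "")          # whois output may use CRLF line endings
         sub(/^[^:]*: */, "")    # strip the label, leaving just the date
         print
         found = 1
       }'
}

# Whole days from now until the given date, truncated toward zero.
days_until() {
  local target now
  target=$(date -u -d "$1" +%s)
  now=$(date -u +%s)
  echo $(( (target - now) / 86400 ))
}

# Only query the network when a domain is given, e.g. ./check-expiry.sh example.com
if [ "$#" -gt 0 ]; then
  exp=$(whois "$1" | expiry_date)
  if [ -z "$exp" ]; then
    echo "No 'Registry Expiry Date' line found for $1" >&2
    exit 2
  fi
  left=$(days_until "$exp")
  echo "$1 expires $exp ($left days left)"
  if [ "$left" -lt 30 ]; then   # hypothetical 30-day warning threshold
    echo "WARNING: renew $1 soon" >&2
    exit 1
  fi
fi
```

Run from cron, the non-zero exit status gives a simple hook for an alert well before the domain lapses.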


No worries, and thanks for the detailed troubleshooting. I wish more meta support posts were this thorough. Glad you were able to work it out.
