I skipped upgrades for several months because I needed (and had) a stable system.
So today I upgraded from something like v1.9.0.beta5 +42 (from memory; I had put off upgrading precisely to avoid ending up in upgrade hell like this) to v1.9.0.beta15 +70.
A Docker upgrade was required, and I followed the instructions here, basically running:
apt-get update
apt-get dist-upgrade
wget -qO- https://get.docker.com/ | sh
cd /var/discourse
git pull
./launcher rebuild app
Now I get 500 server errors and 502 nginx Bad Gateway errors, sometimes after just a few minutes of uptime and sometimes after half an hour. I'm guessing it is triggered by something specific I do, possibly logging in different users via SSO (that is what I was working on, at least).
Looked into the best log I could think of:
Which contained a lot of
Job exception: Connection timed out
and lines like
/var/www/discourse/vendor/bundle/ruby/2.4.0/gems/redis-3.3.5/lib/redis/connection/hiredis.rb:58:in `rescue in read'
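To get a rough feel for how often these show up, the two patterns above can be counted with something like the sketch below. This is only a minimal sketch: `count_log_pattern` is a made-up helper name, and `LOGFILE` is a placeholder for whichever log file is being read, since the path depends on the install.

```shell
# Hypothetical helper: count the lines in a log file that contain a fixed string.
# Usage: count_log_pattern FILE PATTERN
count_log_pattern() {
    # -F treats the pattern as a literal string (no regex surprises from dots),
    # -c prints the number of matching lines instead of the lines themselves.
    grep -F -c -- "$2" "$1"
}

# Example calls (LOGFILE is an assumption; point it at the log in question):
# count_log_pattern "$LOGFILE" "Job exception: Connection timed out"
# count_log_pattern "$LOGFILE" "redis/connection/hiredis.rb"
```

Running it before and after a restart would at least show whether the Redis timeouts come back right away or build up over time.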
So I have been doing a lot of
./launcher stop app
./launcher start app
for the past few hours.
So, the next time things go down (and apparently stay down), are there any better places to look to figure out what is going on?
FYI: Even though this might be far too much information, the exact issue I was investigating was logging in with two different users via SSO in Firefox. Even though they have different external IDs, they overwrote each other's mapping every time they logged in (the Discourse user switched from one external ID to the other). Both have the same email (on purpose). Several other temporary read-only users with different external IDs and the same email work just fine. These two seem to be mapped by email instead of by external ID. SSO overrides are off, etc. Really fishy behaviour, which I of course have to investigate in case it could happen to an admin user.
Of course, tomorrow Finland is celebrating its 100th Independence Day, so this is happening at just the right time…