Admin dashboard won’t load after upgrade to v2.2.0.beta3 +36

fearlessfrog · October 16, 2018, 7:10pm

We recently updated to “v2.2.0.beta3 +36 tests-passed” and have started getting /admin/dashboard/general.json Server 502 errors.

On a quick search it looks very similar to this one:

…which was solved by @j.jaffeux, but no solution was given in the closed topic.

Is there any diagnostic information I can gather to help with this? Or have there been recent changes in this area, meaning I should just try updating to a more recent build?

Thanks for any help. This seems isolated to /admin; otherwise we’re doing great.

fearlessfrog · October 16, 2018, 7:29pm

Nothing of note in the logs, just the 502 from the admin .json, which looks to be a timeout:

/var/discourse/shared/standalone/log/rails# more unicorn.stderr.log
I, [2018-10-16T05:13:36.016595 #80] INFO -- : master done reopening logs
I, [2018-10-16T05:13:36.051096 #206] INFO -- : worker=0 done reopening logs
I, [2018-10-16T05:13:36.071959 #253] INFO -- : worker=2 done reopening logs
I, [2018-10-16T05:13:36.079130 #225] INFO -- : worker=1 done reopening logs
I, [2018-10-16T05:13:36.082894 #1264] INFO -- : worker=3 done reopening logs
E, [2018-10-16T17:02:00.830572 #80] ERROR -- : worker=1 PID:225 timeout (31s > 30s), killing
I, [2018-10-16T17:02:08.596101 #26765] INFO -- : worker=1 ready
E, [2018-10-16T17:10:47.306347 #80] ERROR -- : worker=2 PID:253 timeout (31s > 30s), killing
From https://github.com/discourse/discourse
e3c6dd2..b23ebf1 tests-passed -> origin/tests-passed
e3c6dd2..b23ebf1 master -> origin/master
 * [new branch] svg-icons -> origin/svg-icons
I, [2018-10-16T17:10:54.901712 #27709] INFO -- : worker=2 ready
D, [2018-10-16T17:11:17.368681 #80] DEBUG -- : waiting 16.0s after suspend/hibernation
E, [2018-10-16T18:51:14.634343 #80] ERROR -- : worker=3 PID:1264 timeout (31s > 30s), killing
I, [2018-10-16T18:51:22.141804 #3833] INFO -- : worker=3 ready
D, [2018-10-16T18:51:44.697230 #80] DEBUG -- : waiting 16.0s after suspend/hibernation
E, [2018-10-16T19:07:10.412676 #80] ERROR -- : worker=0 PID:206 timeout (31s > 30s), killing
I, [2018-10-16T19:07:17.781331 #5188] INFO -- : worker=0 ready
E, [2018-10-16T19:16:05.922215 #80] ERROR -- : worker=2 PID:27709 timeout (31s > 30s), killing
I, [2018-10-16T19:16:13.599249 #6114] INFO -- : worker=2 ready
E, [2018-10-16T19:20:28.217211 #80] ERROR -- : worker=2 PID:6114 timeout (31s > 30s), killing
I, [2018-10-16T19:20:35.408446 #6574] INFO -- : worker=2 ready
D, [2018-10-16T19:20:58.285287 #80] DEBUG -- : waiting 16.0s after suspend/hibernation
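
For anyone who wants to tally these kills rather than eyeball them, here is a minimal Python sketch that pulls the worker timeout events out of log text in this format. It assumes the line layout matches what is pasted in this post; the names `TIMEOUT_RE` and `find_timeouts` are made up for illustration and are not part of Discourse or Unicorn.

```python
import re
from collections import Counter

# Matches the start of the unicorn master's kill message, e.g.
# "E, [2018-10-16T17:02:00.830572 #80] ERROR -- : worker=1 PID:225 timeout (31s > 30s), killing"
# Only the part up to the elapsed seconds is matched, so a line that a
# terminal wrapped after "(31s" is still picked up.
TIMEOUT_RE = re.compile(
    r"\[(?P<ts>[\dT:.\-]+) #\d+\] ERROR -- : "
    r"worker=(?P<worker>\d+) PID:(?P<pid>\d+) timeout \((?P<elapsed>\d+)s"
)

def find_timeouts(log_text):
    """Return (timestamp, worker, pid, elapsed_seconds) for each timeout kill."""
    events = []
    for line in log_text.splitlines():
        m = TIMEOUT_RE.search(line)
        if m:
            events.append(
                (m["ts"], int(m["worker"]), int(m["pid"]), int(m["elapsed"]))
            )
    return events

# A few lines copied from the log in this post.
sample = """\
I, [2018-10-16T17:02:08.596101 #26765] INFO -- : worker=1 ready
E, [2018-10-16T17:02:00.830572 #80] ERROR -- : worker=1 PID:225 timeout (31s > 30s), killing
E, [2018-10-16T19:16:05.922215 #80] ERROR -- : worker=2 PID:27709 timeout (31s > 30s), killing
E, [2018-10-16T19:20:28.217211 #80] ERROR -- : worker=2 PID:6114 timeout (31s > 30s), killing
"""

events = find_timeouts(sample)
kills_per_worker = Counter(worker for _, worker, _, _ in events)
print(kills_per_worker)
```

Lining the timestamps up against the times someone opened /admin is one way to check that the dashboard request is what triggers each kill.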

Resources all seem OK?

>df -h
Filesystem                 Size  Used Avail Use% Mounted on
udev                       2.0G  4.0K  2.0G   1% /dev
tmpfs                      396M  408K  395M   1% /run
/dev/disk/by-label/DOROOT   79G   15G   60G  20% /
none                       4.0K     0  4.0K   0% /sys/fs/cgroup
none                       5.0M     0  5.0M   0% /run/lock
none                       2.0G  1.7M  2.0G   1% /run/shm
none                       100M     0  100M   0% /run/user
none                        79G   15G   60G  20% /var/lib/docker/aufs/mnt/b834156bb92ed60ad3d0683540e279fb90f980abb7625247e43db83f7a3cb640
shm                        512M  8.0K  512M   1% /var/lib/docker/containers/1ccedc3b8178735b0e091bbc1f42bbfdc6ba03b2676e81a21f210ff178c12d70/shm

>free -h
             total       used       free     shared    buffers     cached
Mem:          3.9G       3.7G       193M       1.0G        50M       1.5G
-/+ buffers/cache:       2.2G       1.7G
Swap:         2.0G        29M       2.0G
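
On the memory numbers: with this older `free` layout, the "-/+ buffers/cache" row is the "Mem:" row with buffers and cache moved from used to free, so the 193M free figure is less tight than it looks. A quick Python check of that arithmetic, using the rounded values as displayed above (so the results are approximate):

```python
# Figures from the `free -h` output above, converted to MiB.
# They are the rounded display values, so results are approximate.
GIB = 1024
used, free, buffers, cached = 3.7 * GIB, 193, 50, 1.5 * GIB

# Memory actually held by applications: used minus buffers and page cache.
used_by_apps = used - buffers - cached
# Memory that could be handed to applications: free plus buffers and page cache.
available = free + buffers + cached

print(f"{used_by_apps / GIB:.1f}G used, {available / GIB:.1f}G available")
```

That matches the 2.2G / 1.7G row, and with swap barely touched it suggests memory pressure is not the obvious culprit.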

All of these paths work OK, and so does the moderation page; it’s just the main /admin page that fails:

admin/reports/signups
admin/reports/topics
admin/reports/posts
admin/reports/dau_by_mau
admin/reports/daily_engaged_users
admin/reports/new_contributors
admin/reports/top_referred_topics
admin/reports/trending_search

fearlessfrog · October 16, 2018, 10:21pm

It seems to be a pure timing issue: we upped the timeout in

config/unicorn.conf.rb

from 30 to 60 seconds temporarily and now we can see the admin data ok.

As we’ve been running for a few years, is there something we can optimize in our set-up so the data doesn’t take so long to fetch? Our next step is to analyze the Postgres queries to see if one of them is causing this, e.g. through a missing index or something.

codinghorror · October 17, 2018, 12:20am

Something for @j.jaffeux to look at maybe?

j.jaffeux · October 17, 2018, 12:22am

Hi,

nothing in /logs ? Don’t have much hope about this as charts are supposed to be ultra resilient now. So it’s probably coming from something else like disk-space, backups, or dashboard problems.

fearlessfrog · October 17, 2018, 12:30am

Hi,

Nothing in log/rails/production_error.log, and just the worker timeout (as posted above in this topic) in unicorn.stderr.log.

It’s like the dashboard now takes about 35 seconds or so, so unicorn is killing it before it’s done. It only started happening on the recent update.

As put in the posts above, resources all look OK. A 2 vCPU DigitalOcean instance with 4GB memory, db_shared_buffers: "1024MB" and UNICORN_WORKERS: 4

The only other thing I thought of trying: db_work_mem has been left at its default, so we could raise it. Do you think that would help the dashboard queries?

The forum performance is generally great, no issues, it’s just this dashboard page since the update this week.

I could update again, but thought it best to keep this config if it helped you track anything down.

j.jaffeux · October 17, 2018, 12:31am

I meant in « www.example.com/logs », but if you are sure you didn’t miss anything, OK.

fearlessfrog · October 17, 2018, 12:32am

Yep, they are good, in that no entries appear when we trigger the 502.

fearlessfrog · October 17, 2018, 12:47am

Thanks Jeff, Joffrey has reached out via PM and we’re setting up some diagnostics to see what’s up. Cheers.

jerdog · October 22, 2018, 1:35pm

Throwing my hat into the ring as I am seeing the exact same issue since /admin/upgrade was performed… Still happens in Safe Mode with everything disabled.

jerdog · October 22, 2018, 6:03pm

@j.jaffeux - let me know what you need from me to see what’s going on

sam · October 22, 2018, 10:48pm

What plugins do you have? Remove all but the official plugins, then do a rebuild.

cd /var/discourse
./launcher rebuild app

jerdog · October 23, 2018, 12:26am

I have all official except for Who’s Online.

fearlessfrog · October 23, 2018, 12:43am

We reverted to only official plugins last week (no Who’s Online plugin, etc.), but with the same result.

I’m waiting for @j.jaffeux to ping us back and haven’t wanted to change our environment before he can take a look, as I know recreating these things can sometimes be difficult. If I try a bunch of things I might fix it, but then we lose the opportunity for a common solution to make it back into the product.

Because our workaround is just a longer unicorn timeout, there isn’t great urgency to try things to fix it. My best guess is that there is some log or DB table that needs to be pruned by an update, and the queries for the dashboard just take a long time to complete (more than 30 seconds, so the /admin request gets killed by the Unicorn timeout). Either that or some sort of Postgres buffer tuning that #meta has and that stand-alone installs don’t have configured.

jerdog · October 23, 2018, 1:09am

What/where did you tune for unicorn? I can make the adjustment until @j.jaffeux is able to figure this out.

fearlessfrog · October 23, 2018, 1:35am

There’s an ENV for it here:

containers/app.yml

env:
  UNICORN_TIMEOUT: 60

https://github.com/discourse/discourse/blob/master/config/unicorn.conf.rb#L36
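
As far as I can tell from the linked line, the config reads the timeout from the environment and falls back to 30 seconds when the variable is not set. A rough Python equivalent of that pattern, purely as an illustration and not the actual Ruby (the function name `unicorn_timeout` is hypothetical):

```python
import os

DEFAULT_TIMEOUT = 30  # seconds; the default mentioned earlier in this topic

def unicorn_timeout(env=os.environ):
    """Return the worker timeout in seconds, preferring UNICORN_TIMEOUT when set."""
    raw = env.get("UNICORN_TIMEOUT")
    # Unset or empty falls back to the default. Unlike Ruby's to_i,
    # int() raises ValueError on a non-numeric value.
    return int(raw) if raw else DEFAULT_TIMEOUT

print(unicorn_timeout({}))                         # variable not set
print(unicorn_timeout({"UNICORN_TIMEOUT": "60"}))  # variable set via app.yml
```

So if the variable never reaches the container's environment, the 30 second limit stays in force and the log keeps showing "> 30s".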

jerdog · October 23, 2018, 2:46am

Hmm yeah that didn’t work for me…

fearlessfrog · October 23, 2018, 3:33am

Our issues might be different then? Just to be more explicit on steps:

1 - Edit containers/app.yml via ssh, placing a new UNICORN_TIMEOUT: 60 line under the env section

2 - ./launcher rebuild app
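
If it helps to double-check step 1, here is a small Python sketch that reads the text of an app.yml and lists what sits under the top-level env section. It is a deliberately naive line-based reader (no YAML library, no handling of inline comments or nested values), and `env_from_app_yml` is a made-up helper name, so treat it as a sanity check only.

```python
def env_from_app_yml(text):
    """Collect KEY: value pairs from the top-level `env:` block."""
    env, in_env = {}, False
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        if not line[0].isspace():
            # A new top-level key starts; we are inside env only if it is `env:`.
            in_env = stripped == "env:"
            continue
        if in_env and ":" in stripped:
            key, _, value = stripped.partition(":")
            env[key.strip()] = value.strip().strip("'\"")
    return env

# Hypothetical excerpt of an app.yml, for illustration only.
sample = """\
params:
  db_shared_buffers: "1024MB"

env:
  ## How many concurrent web requests are supported?
  UNICORN_WORKERS: 4
  UNICORN_TIMEOUT: 60

volumes:
  - volume:
"""

env = env_from_app_yml(sample)
print(env.get("UNICORN_TIMEOUT", "not set under env"))
```

If the key only shows up when it is indented under env, a line pasted at the wrong indentation level is one possible reason for the setting not taking effect.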

jerdog · October 23, 2018, 12:54pm

Yeah that’s what I did…

env:
  ## How many concurrent web requests are supported?
  ## With 2GB we recommend 3-4 workers, with 1GB only 2
  UNICORN_WORKERS: 4
  UNICORN_TIMEOUT: 60

j.jaffeux · October 23, 2018, 4:38pm

I have some big work to finish for tomorrow. Will have a look at this right after.

