Discourse unavailable with high load average


#1

Hello everyone,

I have been running a Discourse instance on DigitalOcean for the last two years without any problem, but for the last two weeks it has been very slow or unavailable most of the time.

I have upgraded my hosting to the following plan; it’s a bit better, but I’m still facing the same problems:

  • Ubuntu 14.04 x64
  • 2 vCPUs
  • 4GB / 80GB Disk

This is the output of the htop command:

Any idea where to start fixing this problem? Thanks a lot for your help.


(Matt Palmer) #2

What’s the %iowait? In general, htop is a poor substitute for top for actual diagnostic purposes.
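
For example, one quick way to read %iowait from the command line (the sar command here assumes the sysstat package is installed):

# One-shot batch run of top; the "wa" figure in the %Cpu(s) line is iowait
top -bn1 | grep '%Cpu'
# Or sample overall CPU usage once per second, five samples
sar -u 1 5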


#3

top - 04:41:15 up 1 day, 25 min,  1 user,  load average: 6.32, 7.70, 6.89
Tasks: 162 total,   4 running, 158 sleeping,   0 stopped,   0 zombie
%Cpu(s): 64.2 us, 19.9 sy,  0.0 ni,  6.9 id,  7.7 wa,  0.0 hi,  1.3 si,  0.0 st
KiB Mem :  4048268 total,   116444 free,  2742964 used,  1188860 buff/cache
KiB Swap:  1048572 total,   451544 free,   597028 used.    68040 avail Mem 

(Matt Palmer) #4

Given the 1m load average is nearly half what it was in the previous screenshot, and %iowait is still pretty high, I’d say you’re probably I/O constrained, likely due to swapping. Confirm with sar -W 1 whilst the system is at peak unhappiness, and then upgrade RAM.
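
If sar isn’t on the box yet, it comes from the sysstat package; a minimal sketch of the check described above:

apt-get install sysstat
# Report pages swapped in/out per second, once a second, until interrupted
sar -W 1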


#5

Thanks Matt,

Here is the output of sar -W 1 (I have no idea what it means :confused:):

04:52:49 AM  pswpin/s pswpout/s
04:52:50 AM     95.00      2.00
04:52:51 AM    166.00      0.00
04:52:52 AM     40.00      0.00
04:52:53 AM     19.00      0.00
04:52:54 AM     74.00      0.00
04:52:55 AM    125.00      0.00
04:52:56 AM    247.00      0.00
04:52:57 AM    215.84      2.97
04:52:58 AM     70.00      0.00
04:52:59 AM    334.00      0.00
04:53:00 AM    390.00      0.00
04:53:01 AM    568.00      6.00
04:53:02 AM    702.00      0.00
04:53:03 AM   1047.52      5.94
04:53:04 AM    416.00      0.00
04:53:05 AM    449.00      0.00
04:53:06 AM    691.00      0.00
04:53:07 AM    772.00      6.00
04:53:08 AM    550.00      0.00
04:53:09 AM    181.00      0.00
04:53:10 AM    476.00      0.00
04:53:11 AM    348.00      0.00
04:53:12 AM    316.00      4.00
04:53:13 AM    454.00      0.00
04:53:14 AM    356.00      0.00
04:53:15 AM    911.88      7.92
04:53:16 AM    262.00      0.00
04:53:17 AM    303.00      0.00
04:53:18 AM    271.00      0.00
04:53:19 AM    284.00      6.00

(Matt Palmer) #6

The headings should give you a hint… “pages swapped in per second” and “pages swapped out per second”. Each page is (usually) 4k, so swapping in 1,000 pages in a second means you’re copying about 4MB of data off disk and into RAM. That’s gonna take a while.
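
As a rough back-of-envelope version of that arithmetic (the 1,000 pages/s figure is just the example rate from above):

# Page size in bytes on this system (typically 4096)
getconf PAGESIZE
# 1,000 pages/s swapped in × 4096-byte pages, expressed in KiB/s
echo $(( 1000 * 4096 / 1024 ))    # 4000 KiB/s, i.e. roughly 4MB read back from swap every second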

You’re definitely due for a RAM upgrade, or for doing something to reduce RAM usage (although your unicorns aren’t exactly sitting around chatting over the sports page, so I wouldn’t recommend that).


(Sam Saffron) #7

Running this many Unicorn workers on two virtual CPUs that already run Redis and Postgres does not make sense.

I would run 3 workers max, and give the leftover RAM to Postgres.
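
As a rough sketch of what that looks like in a standard /var/discourse Docker install (the buffer value below is illustrative, not something prescribed in this thread):

# In containers/app.yml: set UNICORN_WORKERS: 3 under env,
# and raise db_shared_buffers under params (e.g. "1GB") so Postgres gets the spare RAM
cd /var/discourse
./launcher rebuild app    # rebuild so the new settings take effect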


#8

Thanks guys.

What I don’t understand is that I had 2GB of RAM and it was working well, then suddenly I started getting a high load average. I have already upgraded to 4GB of RAM and I am still facing the same issue. How is it possible that Discourse suddenly needs so much more RAM?

I also changed UNICORN_WORKERS to 3 in my app.yml; it was 8 before.

I am rebuilding the app now and I’ll check whether anything changes.


(Sam Saffron) #9

Let us know how you go with the safer number.


(Matt Palmer) #10

You probably started serving a whole lot more traffic.


#11

Well, actually it’s pretty much the opposite :confused: It’s a lot quieter during summer vacation.

So after 2 hours and around 30 users live:

top - 06:11:09 up 1 day,  1:55,  1 user,  load average: 4.09, 3.24, 2.91
Tasks: 130 total,   7 running, 123 sleeping,   0 stopped,   0 zombie
%Cpu(s): 62.5 us, 26.4 sy,  0.0 ni,  1.4 id,  7.4 wa,  0.0 hi,  2.3 si,  0.0 st
KiB Mem :  4048268 total,   135872 free,  1739644 used,  2172752 buff/cache
KiB Swap:  1048572 total,   975084 free,    73488 used.  1763862 avail Mem 

06:12:12 AM  pswpin/s pswpout/s
06:12:13 AM      0.00      0.00
06:12:14 AM      0.00      0.00
06:12:15 AM      0.00      0.00
06:12:16 AM      0.00      0.00
06:12:17 AM      0.00      0.00
06:12:18 AM      0.00      0.00
06:12:19 AM      0.00      0.00
06:12:20 AM      0.00      0.00
06:12:21 AM      0.00      0.00
06:12:22 AM      0.00      0.00
06:12:23 AM      0.00      6.00
06:12:24 AM      0.00      0.00
06:12:25 AM      0.00      0.00
06:12:26 AM      0.00      0.00
06:12:27 AM      0.00      5.00
06:12:28 AM      0.00      0.00
06:12:29 AM      0.00      0.00

It’s still super slow, and I keep getting 502 errors when I try to answer a message. Some messages do get posted, though.


(Sam Saffron) #12

How many page views are you getting? Where is the traffic coming from? Look at your nginx logs; maybe your server is under some sort of attack.
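
For example, something along these lines (the path assumes the standard /var/discourse Docker layout):

# Count requests per client IP in the nginx access log and show the noisiest ones
cd /var/discourse/shared/standalone/log/var-log/nginx
awk '{print $1}' access.log | sort | uniq -c | sort -rn | head -20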


#13

I am getting around 20 to 30 page views per minute, with around 30 users online at the same time.

I am checking the nginx logs but I’m not sure how to analyse all that data. I looked at the script from Analyzing Discourse Performance using NGINX logs, but it seems to be deprecated?

I also went through the Post-Install Maintenance section and ran apt-get install fail2ban. I am not sure whether I have to do anything else, but it seems to slightly decrease the load average.
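
For reference, a quick way to sanity-check that fail2ban is actually running and banning (jail names vary by configuration, so treat these as examples):

# List active jails and their status
fail2ban-client status
# Banned IPs show up as fail2ban chains in the firewall rules
iptables -L -n | grep -i fail2ban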


(Jay Pfaffman) #14

I recently had similar problems that went away when I switched to an “optimized” droplet.


#15

Yeah, but it’s 5 times more expensive for the size of disk I need :confused: Unfortunately I cannot afford it.


(Jay Pfaffman) #16

My solution was to move backups and uploads to a DigitalOcean Volume ($10/month for 100GB).


#19

Then shop around :shopping_cart: :slight_smile: Look at Vultr, Scaleway, Hetzner…


#20

I know this is kinda beside the point now, but I want to post in defense of htop: you can enable detailed CPU stats to see the iowait figure, and add the process state and IO read/write bytes columns to get a better picture of which processes eat a lot of IO (linux - htop - show I/O wait percentage - Server Fault).


There’s also a giant list of other optional columns like page faults to play around with in the F2 setup menu :slight_smile:


(Matt Palmer) #21

Let me know when you figure out how to use the F2 setup menu in a screenshot.


#22

Oof, that 30-second length limit on imgur animations is brutal.


(using OBS and keycastr in case that’s interesting)