I have been running an instance of Discourse on Digital Ocean for the last 2 years without any problem, but for the last 2 weeks it has been very slow or unavailable most of the time.
I have upgraded my hosting to the following plan. It’s a bit better, but I am still facing the same problems:
Given the 1m load average is nearly half what it was in the previous screenshot, and %iowait is still pretty high, I’d say you’re probably I/O constrained, likely due to swapping. Confirm with sar -W 1 whilst the system is at peak unhappiness, and then upgrade RAM.
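For anyone curious where those numbers come from: as far as I know, sar -W derives its rates from the cumulative pswpin and pswpout counters in /proc/vmstat. Here is a minimal Python sketch of the same calculation, assuming a Linux host. The function names are made up for illustration and are not part of sysstat.

```python
def read_swap_counters(vmstat_text):
    """Return cumulative (pswpin, pswpout) page counts from /proc/vmstat text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        # /proc/vmstat lines have the form "<name> <value>"
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters.get("pswpin", 0), counters.get("pswpout", 0)


def swap_rates(before, after, interval_s):
    """Pages swapped in and out per second between two counter samples."""
    return ((after[0] - before[0]) / interval_s,
            (after[1] - before[1]) / interval_s)


def sample(path="/proc/vmstat"):
    """Read the current counters from the running system (Linux only)."""
    with open(path) as f:
        return read_swap_counters(f.read())
```

To use it, call sample() twice a second apart and pass both results to swap_rates(first, second, 1.0). Sustained non-zero swap-in rates under load point the same way sar does.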
Here is the output of sar -W 1 (I have no idea what it means):
04:52:49 AM pswpin/s pswpout/s
04:52:50 AM 95.00 2.00
04:52:51 AM 166.00 0.00
04:52:52 AM 40.00 0.00
04:52:53 AM 19.00 0.00
04:52:54 AM 74.00 0.00
04:52:55 AM 125.00 0.00
04:52:56 AM 247.00 0.00
04:52:57 AM 215.84 2.97
04:52:58 AM 70.00 0.00
04:52:59 AM 334.00 0.00
04:53:00 AM 390.00 0.00
04:53:01 AM 568.00 6.00
04:53:02 AM 702.00 0.00
04:53:03 AM 1047.52 5.94
04:53:04 AM 416.00 0.00
04:53:05 AM 449.00 0.00
04:53:06 AM 691.00 0.00
04:53:07 AM 772.00 6.00
04:53:08 AM 550.00 0.00
04:53:09 AM 181.00 0.00
04:53:10 AM 476.00 0.00
04:53:11 AM 348.00 0.00
04:53:12 AM 316.00 4.00
04:53:13 AM 454.00 0.00
04:53:14 AM 356.00 0.00
04:53:15 AM 911.88 7.92
04:53:16 AM 262.00 0.00
04:53:17 AM 303.00 0.00
04:53:18 AM 271.00 0.00
04:53:19 AM 284.00 6.00
The headings should give you a hint… “pages swapped in per second” and “pages swapped out per second”. Each page is (usually) 4k, so swapping in 1,000 pages in a second means you’re copying about 4MB of data off disk and into RAM. That’s gonna take a while.
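To make the arithmetic concrete, here is a rough Python sketch that parses a few lines in the format shown above and converts the swap-in rate to MB per second. It assumes 4 KiB pages (typical, but check your own system) and the 12-hour timestamp format from this output. The helper names are hypothetical.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB pages, the usual size on x86


def swap_in_mb_per_s(pages_per_s, page_size=PAGE_SIZE_BYTES):
    """Convert a pages-per-second rate into megabytes per second."""
    return pages_per_s * page_size / 1_000_000


def parse_sar_w(text):
    """Return a list of (pswpin, pswpout) floats from `sar -W` data lines."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines look like: "04:53:03 AM 1047.52 5.94"
        if len(fields) == 4 and fields[1] in ("AM", "PM"):
            try:
                rows.append((float(fields[2]), float(fields[3])))
            except ValueError:
                continue  # header line: "... pswpin/s pswpout/s"
    return rows


SAMPLE = """\
04:52:49 AM pswpin/s pswpout/s
04:53:02 AM 702.00 0.00
04:53:03 AM 1047.52 5.94
04:53:04 AM 416.00 0.00
"""

rows = parse_sar_w(SAMPLE)
peak_in = max(swapped_in for swapped_in, _ in rows)
print("peak swap-in: %.2f pages/s, about %.2f MB/s"
      % (peak_in, swap_in_mb_per_s(peak_in)))
```

For the three sample rows the peak is 1047.52 pages/s, which works out to a little over 4 MB/s read back from swap.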
You’re definitely due for a RAM upgrade, or doing something to reduce RAM usage (although your unicorns aren’t exactly sitting around chatting over the sports page, so I wouldn’t recommend that).
What I don’t understand is that I had 2GB of RAM and it was working well, then suddenly the load average went up. I have already upgraded to 4GB of RAM and I am still facing the same issue. How is it possible that Discourse suddenly needs so much more RAM?
I also changed UNICORN_WORKERS in my app.yml to 3; it was 8 before.
I am rebuilding the app now and I’ll check whether anything changes.
06:12:12 AM pswpin/s pswpout/s
06:12:13 AM 0.00 0.00
06:12:14 AM 0.00 0.00
06:12:15 AM 0.00 0.00
06:12:16 AM 0.00 0.00
06:12:17 AM 0.00 0.00
06:12:18 AM 0.00 0.00
06:12:19 AM 0.00 0.00
06:12:20 AM 0.00 0.00
06:12:21 AM 0.00 0.00
06:12:22 AM 0.00 0.00
06:12:23 AM 0.00 6.00
06:12:24 AM 0.00 0.00
06:12:25 AM 0.00 0.00
06:12:26 AM 0.00 0.00
06:12:27 AM 0.00 5.00
06:12:28 AM 0.00 0.00
06:12:29 AM 0.00 0.00
It’s still super slow, and I keep getting 502 errors on my side when I try to reply to a message. Some messages do get posted, though.
I also checked the Post-Install Maintenance section and ran apt-get install fail2ban. I am not sure whether I have to do anything else, but it seems to have decreased the load average slightly.
I know this is kinda beside the point now, but I want to post in defense of htop: you can enable advanced CPU stats to get the iowait figure, and add the process state, IO read bytes and IO write bytes columns to get a better picture of which processes eat a lot of IO (see “linux - htop - show I/O wait percentage” on Server Fault).
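If you would rather not reconfigure htop, the same iowait percentage can be approximated from two samples of the aggregate cpu line in /proc/stat. This is only a sketch, assuming Linux and the standard field order (user, nice, system, idle, iowait, irq, softirq, steal). The function names are invented for the example.

```python
CPU_FIELDS = ("user", "nice", "system", "idle",
              "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate CPU tick counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":  # aggregate line, not cpu0, cpu1, ...
            values = [int(v) for v in parts[1:1 + len(CPU_FIELDS)]]
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Share of CPU time spent waiting on IO between two samples, in percent."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    return 100.0 * deltas["iowait"] / total if total else 0.0
```

Read /proc/stat twice, a second or so apart, run each read through parse_cpu_line, and pass the two dicts to iowait_percent. It only tells you that the box is waiting on IO, not which process is responsible, so the per-process IO columns in htop are still useful for that.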