My site has seen a slight and sudden slowdown when loading pages lately. I had an issue where a backup was generated that exceeded the space on my DigitalOcean volume and took down the site. Since then I have had a hard time rebuilding the site. Given the timing, these events could be related. Currently the site appears to be stable, just slower than what I’m used to.
I could get into the details of what happened, but I’d rather ask a more general question: what are some techniques to diagnose the cause of a slowdown? My droplet is averaging 20% CPU utilization, so I appear to have sufficient resources (4 GB Memory / 2 AMD vCPUs / 80 GB Disk, ~15k pageviews a day).
Thanks! If you had a memory shortage, the cache numbers would be small, and if the system were paging a lot, the si and so columns would be large. But this is not so.
We do see a big peak in bi and bo, which is typically disk activity. I wonder if something somewhere is building or repairing or scanning something.
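If you'd rather not eyeball those columns, here is a rough Python sketch that pulls the peak si, so, bi and bo values out of saved vmstat output. The function names and the sample text are made up for illustration (they are not from this thread), and it assumes the usual vmstat layout with a header row starting with `r`.

```python
# Made-up sample in the usual vmstat layout; replace with your own captured output.
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  0      0 210000  90000 2400000    0    0    12    40  300  500  8  2 89  1  0
 2  1      0 205000  90000 2410000    0    0  5200  9100  900 1500 10  4 60 26  0
"""


def parse_vmstat(text):
    """Return one dict per sample row, keyed by vmstat column name."""
    header = None
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The column header row starts with "r" and is repeated periodically.
        if fields[0] == "r" and "si" in fields:
            header = fields
            continue
        # Keep only all-numeric rows that match the header width.
        if header and len(fields) == len(header) and all(f.isdigit() for f in fields):
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def peaks(rows):
    """Highest swap-in/out and block-in/out values seen across the samples."""
    return {col: max((r[col] for r in rows), default=0)
            for col in ("si", "so", "bi", "bo")}
```

Calling `peaks(parse_vmstat(SAMPLE))` gives the largest value per column. si and so near zero alongside large bi and bo would match the "disk activity, not paging" reading above.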
Perhaps try running ps auxrc
every five seconds for a minute or so, to see if you can catch a busy process in the act.
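To save typing, that sampling loop could be sketched in Python roughly as below. The function names and defaults are my own, not anything standard, and it assumes a Linux `ps` that prints the usual 11-column `aux` layout with COMMAND last.

```python
import subprocess
import time
from collections import Counter


def running_commands(ps_output):
    """Extract the COMMAND column from `ps auxrc` output, skipping the header."""
    commands = []
    for line in ps_output.splitlines()[1:]:
        # `ps aux` prints 11 columns; COMMAND is the last and may contain spaces.
        fields = line.split(None, 10)
        if len(fields) == 11:
            commands.append(fields[10].strip())
    return commands


def sample(rounds=12, interval=5):
    """Run `ps auxrc` `rounds` times, `interval` seconds apart, and count
    how often each command was caught in the running state."""
    seen = Counter()
    for _ in range(rounds):
        out = subprocess.run(["ps", "auxrc"], capture_output=True,
                             text=True, check=True).stdout
        # ps always catches itself running, so leave it out of the tally.
        seen.update(c for c in running_commands(out) if c != "ps")
        time.sleep(interval)
    return seen
```

Calling `sample().most_common(10)` takes about a minute with the defaults and lists the commands most often seen running, which should make a busy process stand out.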
There are other utilities which might not already be installed: perhaps search for “How to Monitor Disk IO in a Linux System” or similar.
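If nothing extra is installed, the kernel's own counters in /proc/diskstats can be read directly. Below is a minimal sketch, with function names of my own choosing, that turns two snapshots of that file into per-device read and write rates. It relies on the sector counts in /proc/diskstats being in 512-byte units.

```python
def parse_diskstats(text):
    """Map device name -> (sectors read, sectors written) from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 14:
            # Field 3 is the device name, field 6 sectors read, field 10 sectors written.
            stats[f[2]] = (int(f[5]), int(f[9]))
    return stats


def throughput(before, after, seconds):
    """KiB/s read and written per device between two snapshots."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev in before:
            r0, w0 = before[dev]
            # Sectors in /proc/diskstats are 512 bytes each.
            rates[dev] = ((r1 - r0) * 512 / 1024 / seconds,
                          (w1 - w0) * 512 / 1024 / seconds)
    return rates
```

To use it, read `/proc/diskstats` twice a few seconds apart, pass each text through `parse_diskstats`, and hand both results to `throughput` along with the elapsed seconds.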
It’s worth noting that if you have doubts about the integrity of your system, rebuilding it from a backup might be the swiftest way forward. But be sure to have an offsite copy of the backup, if not two, in case of accident. And, ideally, do the install on a new instance and keep the existing one around until the new one is working OK.
More interesting is that you have a lot of sidekiq processes and yet I see the annotation “0 of 5 busy” - you have more than 5. You also seem to have a lot of unicorn threads.
I suggest starting a new topic here with your htop output and a note on whether you’ve adjusted the unicorn count in your yml config. Ask whether this set of processes looks reasonable.
Ah yes, I should have checked my own htop: very similar.
Another, very different idea for the original observation of ‘a slowdown’: activate the mini-profiler using Alt-P, then open a typical large page on your forum and see what queries are being made and how long they take by clicking on the timing figure which appears top right.
I was able to do an apt upgrade and also rebuild. This problem: Pups error on rebuild 🐶 was preventing me from rebuilding for a while.
Since the rebuild, it feels improved. I don’t like operating by feeling though; in this case I’d rather have analytics and measurable data. I appreciate the tips @Ed_S, they will be useful for further monitoring.
I’m wondering if it’s possible to capture some of this profiling data to show the “health” of the instance via the admin page. Perhaps a potential plugin idea or future core feature?