After discovering the release of 3.3.0-beta1, I immediately updated my Discourse instance from the web interface.
However, during the update, the log output in the web interface stalled for more than fifteen minutes (I think the last output was a growing series of ellipses, but I’m not certain). About 2 hours later, I checked the server status in my cloud provider’s console, suspected the server had frozen, and performed a soft reboot from there.
After the reboot, I promptly ran a Discourse backup from the command line, downloaded the backup and app.yml locally, and then completely reinstalled Discourse (of course, the latest version). Afterwards, I uploaded the backup and initiated the restore process from the command line.
The restore was successful, but now my Discourse is facing severe performance issues. Previously, CPU usage under normal load did not exceed 10%, but now it spikes to around 30% even during off-peak hours, and disk reads are also relatively high. What’s worse, the server sometimes becomes unresponsive for no obvious reason: disk reads reach around 1900 per second (the limit of my cloud server), more than 40% of CPU time is spent in I/O wait, and webpages fail to load with connection timeouts. I was running vmstat and top at the time, but unfortunately I didn’t keep the output. I recall that swap I/O was almost zero, so the load was purely disk reads rather than swapping, and that the number of blocked processes exceeded 100.
I suspect that the failed update caused some damage, possibly to the data in the backup rather than to the software itself. Is there a way to refresh or clear a cache, or something similar? I’m not sure what exactly would apply here. Or should I perhaps run the update again? (After all, Discourse updates are quite frequent, so I can update at almost any time.)
As a temporary workaround, I installed a software watchdog that automatically reboots the server under high load. This is not a long-term solution, though. I haven’t found similar reports here, so it is probably not a problem with the Discourse software itself. I’m wondering how to address this.
If you need me to execute some commands on the server to check its status during high loads, feel free to ask. I’ll do my best to maintain my SSH connection and obtain this data without rebooting.
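So that I don’t lose the numbers again, I could leave a small sampler like the sketch below running until the next incident. It is only a rough idea, not something I have run on the server yet. It assumes a Linux host and reads the same kernel counters that vmstat uses (`/proc/stat` and `/proc/loadavg`); the log path, the 5-second interval, and the function names are placeholders I made up.

```python
#!/usr/bin/env python3
"""Append CPU I/O-wait, blocked-process count and load average to a log file."""
import sys
import time

LOG_PATH = "/var/tmp/load-samples.log"  # placeholder path, adjust as needed
INTERVAL_SECONDS = 5                    # placeholder sampling interval


def parse_proc_stat(text):
    """Return (aggregate cpu counters, procs_blocked) from /proc/stat contents."""
    cpu = None
    blocked = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cpu":  # aggregate line only, not cpu0, cpu1, ...
            cpu = [int(value) for value in fields[1:]]
        elif fields[0] == "procs_blocked":
            blocked = int(fields[1])
    return cpu, blocked


def iowait_percent(prev, curr):
    """Share of CPU time spent in iowait between two counter samples."""
    # Fields: user nice system idle iowait irq softirq steal [guest guest_nice].
    # Guest time is already counted in user/nice, so only the first 8 are summed.
    total = sum(curr[:8]) - sum(prev[:8])
    if total <= 0:
        return 0.0
    return 100.0 * (curr[4] - prev[4]) / total


def read_file(path):
    with open(path) as handle:
        return handle.read()


def log_samples():
    """Sample forever, appending one line per interval to LOG_PATH."""
    prev, _ = parse_proc_stat(read_file("/proc/stat"))
    while True:
        time.sleep(INTERVAL_SECONDS)
        curr, blocked = parse_proc_stat(read_file("/proc/stat"))
        load1 = read_file("/proc/loadavg").split()[0]
        line = "%s iowait=%.1f%% blocked=%s load1=%s\n" % (
            time.strftime("%Y-%m-%d %H:%M:%S"),
            iowait_percent(prev, curr), blocked, load1)
        # Reopen on every write so each sample reaches the file
        # even if the watchdog reboots the machine shortly afterwards.
        with open(LOG_PATH, "a") as log:
            log.write(line)
        prev = curr


# Only start logging when invoked with the argument "run",
# so importing this file has no side effects.
if __name__ == "__main__" and sys.argv[1:] == ["run"]:
    log_samples()
```

I would start it with the `run` argument inside a screen or tmux session, so it keeps going if my SSH connection drops. The log would then show how I/O wait and the blocked-process count developed in the minutes before the server stops responding.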