Using our cloud provider’s standard system monitoring tools, we’re getting alerts at about the same time most days because memory utilization stays over 80% for 10+ minutes continuously. This state continues for maybe 30 to 60 minutes or more (the duration seems to vary by day), starting right around the same time each day.
Anyone seen something like this that might have an idea where to start looking? Or, alternatively …
Is there an easy way to look at scheduled jobs within the application stack that might be running on this pattern and could be the culprit?
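One place to start, independent of the application's own job list, is to record which processes hold the most resident memory once a minute, then look back at the log after the next alert fires. The following is only a rough sketch, assuming a Linux host with `/proc` available; the function names, the log file name `mem_top.log`, and the 60-second interval are all made up for illustration. If it is run on the Docker host rather than inside the app container, it should see the container's processes as well.

```python
import glob
import time


def parse_status(text):
    """Return (name, rss_kb) from the contents of /proc/<pid>/status."""
    name, rss_kb = "?", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            # The line looks like "VmRSS:     123456 kB"
            rss_kb = int(line.split()[1])
    # Kernel threads have no VmRSS line, so they report 0
    return name, rss_kb


def top_processes(n=5):
    """List the n processes with the largest resident memory (Linux only)."""
    procs = []
    for path in glob.glob("/proc/[0-9]*/status"):
        try:
            with open(path) as f:
                name, rss_kb = parse_status(f.read())
        except OSError:
            continue  # the process exited while we were reading it
        pid = int(path.split("/")[2])
        procs.append((rss_kb, pid, name))
    return sorted(procs, reverse=True)[:n]


def format_snapshot(procs, now=None):
    """Render one timestamped log line from (rss_kb, pid, name) tuples."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    body = ", ".join(
        f"{name}[{pid}]={rss_kb // 1024}MiB" for rss_kb, pid, name in procs
    )
    return f"{stamp} {body}"


def log_forever(path="mem_top.log", interval=60):
    """Append one snapshot per interval (hypothetical path and interval)."""
    while True:
        with open(path, "a") as log:
            log.write(format_snapshot(top_processes()) + "\n")
        time.sleep(interval)
```

Calling `log_forever()` from a shell session or a small service and leaving it running over a couple of alert windows should show whether the growth is in the database, the Ruby workers, or something outside the stack entirely.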
Nope, backups were my first idea too. Backups (both Discourse and cloud provider) are only happening weekly. (Which I should probably change but not until I get this ghost tracked down!)
So I took a look at https://discourse.example.org/sidekiq/scheduler and it seems that all the jobs there run on a periodic rotation and not necessarily at a certain time of day. Does anyone know if that’s right? If so, is that period based on the startup time of the app container?
It’s almost bound to be a big query or a bunch of big queries. Dashboard re-calc and Top recalc come to mind as two possible hogs.
80% is not all bad though. It means you aren’t over-stressing the system (and going into swap), but utilisation is efficient and you are making good use of your (virtual) hardware.
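Whether 80% is actually a problem depends on what the monitor counts as "used" and on whether swap is being touched. As a quick check, something like the sketch below reads the standard `/proc/meminfo` fields on Linux and reports memory that is not reclaimable, memory that is merely not free (which includes page cache), and swap in use. The function names and output keys are invented for illustration, and which of the two percentages the cloud provider's metric corresponds to is an assumption you would need to verify.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo contents into a dict of field name -> value."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            info[key.strip()] = int(parts[0])  # most values are in kB
    return info


def summarize(info):
    """Compare 'not available' with 'not free' memory and report swap use."""
    total = info["MemTotal"]
    swap_used = info.get("SwapTotal", 0) - info.get("SwapFree", 0)
    return {
        # Memory the kernel estimates it cannot hand out without swapping
        "unavailable_pct": round(100 * (total - info["MemAvailable"]) / total, 1),
        # Everything that is not completely free, including cache and buffers
        "not_free_pct": round(100 * (total - info["MemFree"]) / total, 1),
        "swap_used_kb": swap_used,
    }


def current_summary(path="/proc/meminfo"):
    """Read the live values (Linux only)."""
    with open(path) as f:
        return summarize(parse_meminfo(f.read()))
```

If `not_free_pct` is high while `unavailable_pct` is much lower and `swap_used_kb` stays near zero, the alert is mostly page cache doing its job; if swap use climbs during the window, that points to real memory pressure.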