Crius
(Crius)
June 16, 2023, 2:39pm
1
Hey there,
this message started popping up almost constantly since I last updated Discourse a couple of days ago.
I wouldn't have opened a topic here if not for one thing… it's not true.
The message appears, but you can browse the forum as if you were logged in; nothing behaves as though you weren't.
Checking the resources used/available on the host doesn't show the machine being overloaded or anything of the sort.
Can someone help me understand how this message gets triggered, so I can start investigating what could be causing this warning when it's not really the case?
Stephen
(Stephen)
June 16, 2023, 3:33pm
2
That message is meaningful; the fact that your system has free resources is more indicative of misconfiguration than misidentification.
How many unicorn_workers do you have? Assuming there's nothing else on the host, have you allocated 16 (two per core)?
If you're using local Postgres, what is your db_shared_buffers set to?
2 Likes
Crius
(Crius)
June 16, 2023, 4:21pm
3
I left the default settings from the ./launcher first setup, and that was 8 workers.
Same for db_shared_buffers: 4096MB.
However, due to some tests (the reason for which you can read here), the workers were reduced to 4. It had no effect, so I can at least restore them to 8.
The reason it's not 2x cores is that this is a VM, and those are really vCPUs, not physical cores.
I'll monitor the instance for a bit and come back to mark the solution if that sorts it out. Thanks @Stephen
Stephen
(Stephen)
June 16, 2023, 4:25pm
4
The message relates to the resources you allocate to Discourse, not the resources of the VM/host.
Set db_shared_buffers to 25% of your reserved memory and 2 unicorn workers per CPU. Some fine-tuning may be necessary.
Obviously you need to manage resources outside the VM too if you feel the resource pool isn’t reliable. Nearly everyone running Discourse is doing so on some kind of VPS.
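For reference, on a standard install both of those knobs live in containers/app.yml, and changes only take effect after a ./launcher rebuild app. A minimal sketch of where they sit (the values here are placeholders, not a recommendation for this particular host):

params:
  db_shared_buffers: "4096MB"   # memory reserved for Postgres shared buffers inside the container
env:
  UNICORN_WORKERS: 8            # web worker processes; ~2 per core is the usual starting point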
2 Likes
Crius
(Crius)
June 16, 2023, 6:14pm
5
I am not an expert in Discourse, so I simply ran the script that installs it; as far as I remember, it should set those parameters based on the available memory/CPU.
I've restored the workers to what they were prior to some testing and will check on it again.
We didn't have any issue with those settings, to be honest, but it could also be unrelated or only partially related.
I'll keep in mind your suggestion of 2x cores for workers and 25% of the reserved memory for the db shared buffers.
When you say reserved memory, do you mean the memory reserved by the running container? Because it seems to always be just "as much as the host has available".
Stephen
(Stephen)
June 16, 2023, 6:29pm
6
This is what discourse-setup does:
# db_shared_buffers: 128MB for 1GB, 256MB for 2GB, or 256MB * GB, max 4096MB
# UNICORN_WORKERS: 2 * GB for 2GB or less, or 2 * CPU, max 8
‘max’ can be taken with a pinch of salt there; particularly big communities will see benefits beyond those numbers.
So yes, in your case it should have specified the maximums of 8 workers and 4096MB. If you reduce the workers and shared buffers available to Discourse, then it will tap out before all of the resources on the VM are consumed.
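As a rough sketch of that arithmetic applied to a host like this one (ignoring the small-RAM special cases; this is not the actual discourse-setup code, just the same heuristic):

# derive the discourse-setup style defaults from total RAM and CPU count
ram_gb=$(( $(free -m | awk '/^Mem:/{print $2}') / 1024 ))
cpus=$(nproc)
db_shared_buffers=$(( ram_gb * 256 )); [ "$db_shared_buffers" -gt 4096 ] && db_shared_buffers=4096
unicorn_workers=$(( cpus * 2 ));       [ "$unicorn_workers" -gt 8 ] && unicorn_workers=8
echo "db_shared_buffers: ${db_shared_buffers}MB  UNICORN_WORKERS: ${unicorn_workers}"
# an 8 vCPU / ~30 GB machine hits both caps: 4096MB and 8 workers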
This post from @mpalmer is still good guidance:
Increasing the number of unicorn workers to suit your CPU and RAM capacity is perfectly reasonable. The “two unicorns per core” guideline is a starting figure. CPUs differ (wildly) in their performance, and VPSes make that even more complicated (because you can never tell who else is on the box and what they’re doing with the CPU), so you start conservative, and if you find that you’re running out of unicorns before you’re running out of CPU and RAM, then you just keep increasing the unicorns. …
Ed_S
(Ed S)
June 16, 2023, 7:31pm
7
It looks to me like it's triggered when requests are queued for too long - in other words, requests coming in faster than they are serviced. One might wonder why so many requests, or why such slow service. At the Discourse level, there are tunables which are already discussed in this thread and also, for example, in Extreme load error.
At the Linux level, I would check the following (briefly annotated after the list):
uptime
free
vmstat 5 5
ps auxrc
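Roughly what each of those shows (a quick sketch of what to look for, not an exhaustive guide):

uptime        # load averages vs. the CPU count from nproc; sustained load above the CPU count means work is queueing
free          # memory and swap actually in use; heavy swap activity will make everything feel slow
vmstat 5 5    # r (runnable) vs. CPUs, si/so (swapping), wa (I/O wait), st (CPU time stolen by the hypervisor)
ps auxrc      # only processes currently in the run state, i.e. whatever is eating CPU right at that moment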
4 Likes
Crius
(Crius)
June 30, 2023, 7:00pm
8
Just updating: upping the workers fixed the issue
2 Likes
Crius
(Crius)
July 12, 2023, 2:20pm
9
Quick follow up.
Since then things seemed fine, but for the past couple of days the forum has felt "slow" at times. By slow I mean requests take some time to be processed (submitting replies, edits, etc.), and just now I noticed the same message again.
I went to check the Grafana dashboard I've set up and saw that the server is at its limit in terms of CPU usage.
A quick docker stats shows me this:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
2c81f3b51e74 app 800.14% 18.18GiB / 29.38GiB 61.87% 57.1GB / 180GB 31.1TB / 7.45TB 282
5164921ee233 grafana 0.05% 98.36MiB / 29.38GiB 0.33% 2.05GB / 284MB 7.26GB / 6.17GB 17
400e496902d7 prometheus 0.67% 139.1MiB / 29.38GiB 0.46% 101GB / 3.82GB 28GB / 27.6GB 14
e2af5bfa922f blackbox_exporter 0.00% 13.71MiB / 29.38GiB 0.05% 169MB / 359MB 295MB / 27.4MB 14
581664b0fe9a docker_state_exporter 8.59% 11.86MiB / 29.38GiB 0.04% 533MB / 8.67GB 65.2MB / 6.16MB 15
408e050e9dc9 discourse_forward_proxy 0.00% 5.926MiB / 29.38GiB 0.02% 40.1GB / 40.1GB 36.8MB / 9.68MB 9
fbba6c927dd8 cadvisor 9.13% 385.5MiB / 29.38GiB 1.28% 2.25GB / 135GB 85.1GB / 2.65GB 26
8fe73c0019b1 node_exporter 0.00% 10.74MiB / 29.38GiB 0.04% 112MB / 1.84GB 199MB / 2.82MB 8
9b95fa3156bb matomo_cron 0.00% 4.977MiB / 29.38GiB 0.02% 81.4kB / 0B 49.4GB / 0B 3
553a3e7389eb matomo_web 0.00% 8.082MiB / 29.38GiB 0.03% 2.15GB / 6.36GB 215MB / 2.36GB 9
adf21bdea1e5 matomo_app 0.01% 78.13MiB / 29.38GiB 0.26% 8.63GB / 3.74GB 59.8GB / 3.07GB 4
96d873027990 matomo_db 0.06% 36.8MiB / 29.38GiB 0.12% 3.11GB / 5.76GB 4.16GB / 8.35GB 13
Any idea what could be causing this?
Crius
(Crius)
July 12, 2023, 2:37pm
10
I tried restarting the app; the load still climbs back to the same level immediately after the restart.
Is there a way to see which processes are using the most resources? I tried checking the Sidekiq dashboard, but it just shows me the list of jobs running/queued and the average time to execute; some are slow (taking minutes), but I can't see anything processing or failing right now.
Crius
(Crius)
July 12, 2023, 5:57pm
11
I'm updating everything just to rule out any issue that might have been caused by a problem with beta5. On 3.1.0.beta6 - 6892324767 now.
Still, CPU usage is abnormally high. It usually fluctuates around 60%.
1 Like
Crius
(Crius)
July 13, 2023, 7:40am
13
I already dug a bit into the processes with htop, and it seems like some unicorn processes are taking a big chunk of CPU for an extended amount of time. Unfortunately, I didn't find a way to get more information about what those unicorn processes were working on exactly.
Anyway, here are the outputs of the suggested commands.
uptime
09:28:59 up 129 days, 17:00, 1 user, load average: 6.43, 7.21, 6.47
free
total used free shared buff/cache available
Mem: 30810676 4448924 972648 4346596 25389104 21623732
Swap: 8388604 2037888 6350716
vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
17 0 2037376 916336 218376 25179252 2 2 1377 368 0 0 13 3 81 1 3
ps auxrc
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
www-data 2748758 0.6 0.0 58528 7820 ? R Jul12 6:37 nginx
crius 2843826 15.9 1.2 1140392 392100 ? Rl Jul12 131:02 ruby
root 3150580 0.1 0.0 0 0 ? R 09:17 0:01 kworker/u16:0-events_unbound
crius 3155489 0.0 0.0 13196 3676 pts/2 R+ 09:29 0:00 ps
ps auxf
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Mar05 0:08 [kthreadd]
root 3 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [rcu_gp]
root 4 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [rcu_par_gp]
root 5 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [slub_flushwq]
root 6 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [netns]
root 8 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kworker/0:0H-events_highpri]
root 10 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [mm_percpu_wq]
root 11 0.0 0.0 0 0 ? S Mar05 0:00 \_ [rcu_tasks_rude_]
root 12 0.0 0.0 0 0 ? S Mar05 0:00 \_ [rcu_tasks_trace]
root 13 0.0 0.0 0 0 ? S Mar05 18:43 \_ [ksoftirqd/0]
root 14 0.2 0.0 0 0 ? I Mar05 451:57 \_ [rcu_sched]
root 15 0.0 0.0 0 0 ? S Mar05 1:15 \_ [migration/0]
root 16 0.0 0.0 0 0 ? S Mar05 0:00 \_ [idle_inject/0]
root 18 0.0 0.0 0 0 ? S Mar05 0:00 \_ [cpuhp/0]
root 19 0.0 0.0 0 0 ? S Mar05 0:00 \_ [cpuhp/1]
root 20 0.0 0.0 0 0 ? S Mar05 0:00 \_ [idle_inject/1]
root 21 0.0 0.0 0 0 ? S Mar05 1:16 \_ [migration/1]
root 22 0.0 0.0 0 0 ? S Mar05 13:06 \_ [ksoftirqd/1]
root 24 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kworker/1:0H-events_highpri]
root 25 0.0 0.0 0 0 ? S Mar05 0:00 \_ [cpuhp/2]
root 26 0.0 0.0 0 0 ? S Mar05 0:00 \_ [idle_inject/2]
root 27 0.0 0.0 0 0 ? S Mar05 1:17 \_ [migration/2]
root 28 0.0 0.0 0 0 ? S Mar05 11:42 \_ [ksoftirqd/2]
root 30 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kworker/2:0H-events_highpri]
root 31 0.0 0.0 0 0 ? S Mar05 0:00 \_ [cpuhp/3]
root 32 0.0 0.0 0 0 ? S Mar05 0:00 \_ [idle_inject/3]
root 33 0.0 0.0 0 0 ? S Mar05 1:17 \_ [migration/3]
root 34 0.0 0.0 0 0 ? S Mar05 9:55 \_ [ksoftirqd/3]
root 36 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kworker/3:0H-events_highpri]
root 37 0.0 0.0 0 0 ? S Mar05 0:00 \_ [cpuhp/4]
root 38 0.0 0.0 0 0 ? S Mar05 0:00 \_ [idle_inject/4]
root 39 0.0 0.0 0 0 ? S Mar05 1:17 \_ [migration/4]
root 40 0.0 0.0 0 0 ? S Mar05 9:16 \_ [ksoftirqd/4]
root 42 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kworker/4:0H-kblockd]
root 43 0.0 0.0 0 0 ? S Mar05 0:00 \_ [cpuhp/5]
root 44 0.0 0.0 0 0 ? S Mar05 0:00 \_ [idle_inject/5]
root 45 0.0 0.0 0 0 ? S Mar05 1:23 \_ [migration/5]
root 46 0.1 0.0 0 0 ? S Mar05 359:36 \_ [ksoftirqd/5]
root 48 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kworker/5:0H-events_highpri]
root 49 0.0 0.0 0 0 ? S Mar05 0:00 \_ [cpuhp/6]
root 50 0.0 0.0 0 0 ? S Mar05 0:00 \_ [idle_inject/6]
root 51 0.0 0.0 0 0 ? S Mar05 1:18 \_ [migration/6]
root 52 0.0 0.0 0 0 ? S Mar05 9:10 \_ [ksoftirqd/6]
root 54 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kworker/6:0H-events_highpri]
root 55 0.0 0.0 0 0 ? S Mar05 0:00 \_ [cpuhp/7]
root 56 0.0 0.0 0 0 ? S Mar05 0:00 \_ [idle_inject/7]
root 57 0.0 0.0 0 0 ? S Mar05 1:28 \_ [migration/7]
root 58 0.0 0.0 0 0 ? S Mar05 37:46 \_ [ksoftirqd/7]
root 60 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kworker/7:0H-events_highpri]
root 61 0.0 0.0 0 0 ? S Mar05 0:00 \_ [kdevtmpfs]
root 62 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [inet_frag_wq]
root 63 0.0 0.0 0 0 ? S Mar05 0:00 \_ [kauditd]
root 66 0.0 0.0 0 0 ? S Mar05 1:14 \_ [khungtaskd]
root 67 0.0 0.0 0 0 ? S Mar05 0:00 \_ [oom_reaper]
root 68 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [writeback]
root 69 0.1 0.0 0 0 ? S Mar05 246:13 \_ [kcompactd0]
root 70 0.0 0.0 0 0 ? SN Mar05 0:00 \_ [ksmd]
root 71 0.0 0.0 0 0 ? SN Mar05 12:39 \_ [khugepaged]
root 118 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kintegrityd]
root 119 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kblockd]
root 120 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [blkcg_punt_bio]
root 122 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [tpm_dev_wq]
root 123 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [ata_sff]
root 124 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [md]
root 125 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [edac-poller]
root 126 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [devfreq_wq]
root 127 0.0 0.0 0 0 ? S Mar05 0:00 \_ [watchdogd]
root 129 0.0 0.0 0 0 ? I< Mar05 5:55 \_ [kworker/7:1H-kblockd]
root 131 0.2 0.0 0 0 ? S Mar05 429:02 \_ [kswapd0]
root 132 0.0 0.0 0 0 ? S Mar05 0:00 \_ [ecryptfs-kthrea]
root 134 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kthrotld]
root 135 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [acpi_thermal_pm]
root 137 0.0 0.0 0 0 ? S Mar05 0:00 \_ [scsi_eh_0]
root 138 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [scsi_tmf_0]
root 139 0.0 0.0 0 0 ? S Mar05 0:00 \_ [scsi_eh_1]
root 140 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [scsi_tmf_1]
root 142 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [vfio-irqfd-clea]
root 144 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [mld]
root 145 0.0 0.0 0 0 ? R< Mar05 5:51 \_ [kworker/2:1H+kblockd]
root 146 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [ipv6_addrconf]
root 159 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kstrp]
root 162 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [zswap-shrink]
root 163 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [kworker/u17:0]
root 168 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [charger_manager]
root 215 0.0 0.0 0 0 ? S Mar05 0:00 \_ [scsi_eh_2]
root 217 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [scsi_tmf_2]
root 223 0.0 0.0 0 0 ? I< Mar05 4:55 \_ [kworker/3:1H-kblockd]
root 224 0.0 0.0 0 0 ? I< Mar05 5:34 \_ [kworker/1:1H-kblockd]
root 245 0.0 0.0 0 0 ? S Mar05 29:40 \_ [jbd2/sda3-8]
root 246 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [ext4-rsv-conver]
root 275 0.0 0.0 0 0 ? I< Mar05 4:41 \_ [kworker/6:1H-kblockd]
root 277 0.0 0.0 0 0 ? I< Mar05 6:23 \_ [kworker/5:1H-kblockd]
root 308 0.0 0.0 0 0 ? I< Mar05 4:43 \_ [kworker/4:1H-kblockd]
root 326 0.0 0.0 0 0 ? I< Mar05 5:55 \_ [kworker/0:1H-kblockd]
root 364 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [cryptd]
root 395 0.0 0.0 0 0 ? S Mar05 0:00 \_ [jbd2/sda2-8]
root 396 0.0 0.0 0 0 ? I< Mar05 0:00 \_ [ext4-rsv-conver]
root 2698602 0.0 0.0 0 0 ? I Jul12 0:17 \_ [kworker/6:2-mm_percpu_wq]
root 2747634 0.0 0.0 0 0 ? I Jul12 0:06 \_ [kworker/4:0-mm_percpu_wq]
root 2849036 0.0 0.0 0 0 ? I Jul12 0:01 \_ [kworker/7:0-events]
root 2998275 0.0 0.0 0 0 ? I 02:24 0:01 \_ [kworker/5:1-mm_percpu_wq]
root 3028302 0.0 0.0 0 0 ? I 03:47 0:01 \_ [kworker/2:0-cgroup_destroy]
root 3033583 0.0 0.0 0 0 ? I 04:01 0:04 \_ [kworker/7:1-mm_percpu_wq]
root 3050837 0.0 0.0 0 0 ? I 04:48 0:03 \_ [kworker/3:2-events]
root 3092947 0.1 0.0 0 0 ? I 06:44 0:11 \_ [kworker/0:2-events]
root 3093859 0.0 0.0 0 0 ? I 06:46 0:02 \_ [kworker/2:2-events]
root 3102726 0.1 0.0 0 0 ? I 07:10 0:11 \_ [kworker/u16:3-ext4-rsv-conversion]
root 3112088 0.0 0.0 0 0 ? I 07:35 0:02 \_ [kworker/1:0-rcu_gp]
root 3120728 0.0 0.0 0 0 ? I 07:59 0:00 \_ [kworker/6:1]
root 3131601 0.0 0.0 0 0 ? I 08:26 0:00 \_ [kworker/5:0-events]
root 3144423 0.0 0.0 0 0 ? I 09:01 0:00 \_ [kworker/3:0-mm_percpu_wq]
root 3145746 0.0 0.0 0 0 ? I 09:04 0:00 \_ [kworker/4:2]
root 3150580 0.1 0.0 0 0 ? I 09:17 0:01 \_ [kworker/u16:0-writeback]
root 3150760 0.0 0.0 0 0 ? I 09:17 0:00 \_ [kworker/1:2-mm_percpu_wq]
root 3153525 0.0 0.0 0 0 ? I 09:24 0:00 \_ [kworker/0:0]
root 3155513 0.0 0.0 0 0 ? I 09:29 0:00 \_ [kworker/u16:1-events_power_efficient]
root 1 0.0 0.0 167316 7212 ? Ss Mar05 31:01 /sbin/init
root 282 0.0 0.1 80784 31204 ? S<s Mar05 49:18 /lib/systemd/systemd-journald
root 330 0.0 0.0 25224 1728 ? Ss Mar05 1:14 /lib/systemd/systemd-udevd
systemd+ 434 0.0 0.0 89356 236 ? Ssl Mar05 1:09 /lib/systemd/systemd-timesyncd
root 478 0.0 0.0 240256 1492 ? Ssl Mar05 19:27 /usr/libexec/accounts-daemon
root 479 0.0 0.0 9492 676 ? Ss Mar05 1:08 /usr/sbin/cron -f -P
message+ 480 0.0 0.0 8904 1524 ? Ss Mar05 2:27 @dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-a
root 486 0.0 0.0 82768 960 ? Ssl Mar05 10:09 /usr/sbin/irqbalance --foreground
root 487 0.0 0.0 35776 5368 ? Ss Mar05 0:10 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 492 0.0 0.0 15896 2508 ? Ss Mar05 1:40 /lib/systemd/systemd-logind
root 527 0.0 0.0 234484 1460 ? Ssl Mar05 0:05 /usr/libexec/polkitd --no-debug
systemd+ 574 0.0 0.0 16500 2744 ? Ss Mar05 4:20 /lib/systemd/systemd-networkd
systemd+ 576 0.0 0.0 25664 2876 ? Ss Mar05 8:36 /lib/systemd/systemd-resolved
root 580 0.0 0.0 112444 1280 ? Ssl Mar05 0:00 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdo
root 581 0.2 0.0 1801740 21140 ? Ssl Mar05 556:09 /usr/bin/containerd
root 585 0.0 0.0 8772 88 tty1 Ss+ Mar05 0:00 /sbin/agetty -o -p -- \u --noclear tty1 linux
root 586 0.0 0.0 15424 2504 ? Ss Mar05 23:48 sshd: /usr/sbin/sshd -D [listener] 0 of 10-100 startups
root 3155109 0.0 0.0 17052 10628 ? Ss 09:28 0:00 \_ sshd: crius [priv]
crius 3155148 0.0 0.0 17188 7928 ? S 09:28 0:00 \_ sshd: crius@pts/2
crius 3155149 0.1 0.0 11412 5340 pts/2 Ss 09:28 0:00 \_ -bash
crius 3155643 0.0 0.0 13576 4128 pts/2 R+ 09:30 0:00 \_ ps auxf
root 623 0.4 0.1 2417520 54884 ? Ssl Mar05 840:05 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
root 2890936 0.0 0.0 1157000 844 ? Sl May14 0:22 \_ /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 2053 -co
root 2344276 0.0 0.0 1083012 740 ? Sl Jun12 0:11 \_ /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 9090 -co
root 2374753 0.0 0.0 1083268 908 ? Sl Jun12 0:12 \_ /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 8443 -co
root 2748545 0.0 0.0 1304208 692 ? Sl Jul12 0:00 \_ /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 443 -con
root 2748566 0.0 0.0 1230988 704 ? Sl Jul12 0:00 \_ /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 80 -cont
root 1674 0.0 0.0 298144 2608 ? Ssl Mar05 2:11 /usr/libexec/packagekitd
crius 1442337 0.0 0.0 17396 4392 ? Ss Mar09 14:47 /lib/systemd/systemd --user
crius 1442338 0.0 0.0 169832 300 ? S Mar09 0:00 \_ (sd-pam)
crius 1442416 0.0 0.0 10048 600 ? Ss Mar09 0:00 SCREEN -S ngi_avatar_migration
crius 1442417 0.0 0.0 11284 952 pts/1 Ss+ Mar09 0:00 \_ /bin/bash
crius 2539604 0.0 0.0 730088 5076 ? Sl May13 12:18 /home/crius/.docker/cli-plugins/docker-compose compose up
crius 2543461 0.0 0.0 729832 4092 ? Sl May13 12:41 /home/crius/.docker/cli-plugins/docker-compose compose up
crius 2546257 0.0 0.0 730344 4704 ? Sl May14 12:03 /home/crius/.docker/cli-plugins/docker-compose compose up
root 2890626 0.0 0.0 720756 3528 ? Sl May14 18:49 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 96d87302799096e81906
systemd+ 2890647 0.1 0.1 2391320 39520 ? Ssl May14 117:20 \_ mariadbd --max-allowed-packet=64MB
root 2890739 0.0 0.0 720756 3804 ? Sl May14 25:15 /usr/bin/containerd-shim-runc-v2 -namespace moby -id adf21bdea1e5b2ae1324
root 2890764 0.0 0.0 203348 3348 ? Ss May14 10:22 \_ php-fpm: master process (/usr/local/etc/php-fpm.conf)
82 2723195 0.4 0.1 228180 47224 ? S Jul12 4:24 \_ php-fpm: pool www
82 2915709 0.2 0.1 219164 39312 ? S Jul12 1:32 \_ php-fpm: pool www
82 2915722 0.2 0.1 224888 44924 ? S Jul12 1:27 \_ php-fpm: pool www
root 2890959 0.0 0.0 720756 2524 ? Sl May14 18:43 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 553a3e7389eb5c8433ef
root 2890983 0.0 0.0 8008 384 ? Ss May14 0:00 \_ nginx: master process nginx -g daemon off;
systemd+ 2891099 0.0 0.0 9008 2736 ? S May14 8:04 \_ nginx: worker process
systemd+ 2891100 0.0 0.0 9128 2980 ? S May14 8:05 \_ nginx: worker process
systemd+ 2891101 0.0 0.0 9064 2816 ? S May14 7:59 \_ nginx: worker process
systemd+ 2891102 0.0 0.0 8952 2912 ? S May14 8:05 \_ nginx: worker process
systemd+ 2891103 0.0 0.0 9016 2672 ? S May14 8:00 \_ nginx: worker process
systemd+ 2891104 0.0 0.0 9176 2820 ? S May14 8:00 \_ nginx: worker process
systemd+ 2891105 0.0 0.0 9104 2728 ? S May14 8:00 \_ nginx: worker process
systemd+ 2891106 0.0 0.0 9064 2772 ? S May14 8:03 \_ nginx: worker process
root 2913104 0.0 0.0 721012 4236 ? Sl May14 19:44 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 9b95fa3156bba900a592
root 2913124 0.0 0.0 3896 224 ? Ss May14 0:00 \_ bash -c bash -s <<EOF trap "break;exit" SIGHUP SIGINT SIGTERM while /
root 2913193 0.0 0.0 3896 720 ? S May14 0:04 \_ bash -s
root 3145448 0.0 0.0 2392 428 ? S 09:03 0:00 \_ sleep 3600
root 727413 0.0 0.0 720756 3328 ? Sl May18 18:14 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 581664b0fe9ab6478cb9
root 727452 1.0 0.0 716432 15932 ? Ssl May18 866:43 \_ /go/bin/docker_state_exporter -listen-address=:8080
root 727445 0.0 0.0 720500 4024 ? Sl May18 18:31 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 8fe73c0019b1e003d040
nobody 727472 0.0 0.0 727104 12732 ? Ssl May18 58:50 \_ /bin/node_exporter --path.rootfs=/host
root 727504 0.0 0.0 720756 5208 ? Sl May18 46:22 /usr/bin/containerd-shim-runc-v2 -namespace moby -id fbba6c927dd83c1a7960
root 727564 10.3 0.2 215116 72624 ? Ssl May18 8372:45 \_ /usr/bin/cadvisor -logtostderr
root 727546 0.0 0.0 720756 2844 ? Sl May18 19:57 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 408e050e9dc9b779f439
root 727574 0.0 0.0 9072 532 ? Ss May18 0:00 \_ nginx: master process nginx -g daemon off;
systemd+ 727820 0.0 0.0 9604 2800 ? S May18 1:28 \_ nginx: worker process
systemd+ 727821 0.0 0.0 9604 2768 ? S May18 1:29 \_ nginx: worker process
systemd+ 727823 0.0 0.0 9604 2844 ? S May18 1:24 \_ nginx: worker process
systemd+ 727824 0.0 0.0 9604 2800 ? S May18 1:21 \_ nginx: worker process
systemd+ 727825 0.0 0.0 9604 2784 ? S May18 1:20 \_ nginx: worker process
systemd+ 727826 0.0 0.0 9604 2576 ? S May18 1:16 \_ nginx: worker process
systemd+ 727827 0.0 0.0 9604 2828 ? S May18 1:16 \_ nginx: worker process
systemd+ 727828 0.0 0.0 9604 2820 ? S May18 1:17 \_ nginx: worker process
root 3271398 0.0 0.0 720756 3844 ? Sl May22 16:10 /usr/bin/containerd-shim-runc-v2 -namespace moby -id e2af5bfa922f8b3513bb
root 3271418 0.0 0.0 727512 14900 ? Ssl May22 40:12 \_ /bin/blackbox_exporter --config.file=/config/blackbox.yaml
syslog 3680620 0.0 0.0 222404 2844 ? Ssl May23 6:29 /usr/sbin/rsyslogd -n -iNONE
root 3683957 0.0 0.0 41328 2772 ? Ss May23 0:43 /usr/lib/postfix/sbin/master -w
postfix 2748725 0.0 0.0 41592 5228 ? S Jul12 0:00 \_ qmgr -l -t unix -u
postfix 3146384 0.0 0.0 41548 6652 ? S 09:05 0:00 \_ pickup -l -t unix -u -c
root 2344320 0.0 0.0 720500 4204 ? Sl Jun12 11:48 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 400e496902d765603a25
nobody 2344347 1.1 0.5 2905480 155696 ? Rsl Jun12 502:21 \_ /bin/prometheus --config.file=/app.cfg/prometheus.yaml --storage.tsdb
root 2374785 0.0 0.0 720500 2976 ? Sl Jun12 12:15 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 5164921ee23304d40f4b
472 2374814 0.6 0.3 883336 93316 ? Ssl Jun12 305:17 \_ grafana server --homepath=/usr/share/grafana --config=/etc/grafana/gr
root 2748589 0.0 0.0 720500 3308 ? Sl Jul12 0:24 /usr/bin/containerd-shim-runc-v2 -namespace moby -id 2c81f3b51e743cb2a695
root 2748621 0.0 0.0 6964 332 ? Ss+ Jul12 0:00 \_ /bin/bash /sbin/boot
root 2748731 0.0 0.0 2400 8 ? S+ Jul12 0:02 \_ /usr/bin/runsvdir -P /etc/service
root 2748732 0.0 0.0 2248 208 ? Ss Jul12 0:00 | \_ runsv cron
root 2748738 0.0 0.0 6680 916 ? S Jul12 0:01 | | \_ cron -f
root 2748733 0.0 0.0 2248 16 ? Ss Jul12 0:00 | \_ runsv rsyslog
root 2748741 0.0 0.0 151124 200 ? Sl Jul12 0:00 | | \_ rsyslogd -n
root 2748734 0.0 0.0 2248 108 ? Ss Jul12 0:00 | \_ runsv postgres
root 2748740 0.0 0.0 2396 152 ? S Jul12 0:06 | | \_ svlogd /var/log/postgres
systemd+ 2748751 0.3 0.3 4387564 97536 ? S Jul12 3:50 | | \_ /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/
systemd+ 2748817 0.3 13.8 4389228 4261892 ? Ss Jul12 3:27 | | \_ postgres: 13/main: checkpointer
systemd+ 2748818 0.3 13.8 4387696 4254112 ? Ss Jul12 3:11 | | \_ postgres: 13/main: background writer
systemd+ 2748819 0.3 0.0 4387564 18768 ? Ss Jul12 3:59 | | \_ postgres: 13/main: walwriter
systemd+ 2748820 0.0 0.0 4388108 4804 ? Ss Jul12 0:02 | | \_ postgres: 13/main: autovacuum launcher
systemd+ 2748821 0.5 0.0 67860 2520 ? Ss Jul12 6:01 | | \_ postgres: 13/main: stats collector
systemd+ 2748822 0.0 0.0 4388120 3328 ? Ss Jul12 0:00 | | \_ postgres: 13/main: logical replication launcher
systemd+ 2843545 0.0 0.3 4399008 109852 ? Ss Jul12 0:02 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 2843992 0.0 3.0 4405064 928824 ? Ss Jul12 0:03 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 2852883 0.0 0.8 4396836 272064 ? Ss Jul12 0:04 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 2852884 0.0 7.5 4405456 2335560 ? Ss Jul12 0:08 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 2852887 0.0 7.1 4405764 2189764 ? Ss Jul12 0:11 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 2899553 0.0 2.2 4397092 691820 ? Ss Jul12 0:13 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3037776 3.3 14.1 4526944 4346572 ? Ss 04:12 10:34 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3064125 0.0 0.9 4395760 306864 ? Ss 05:25 0:01 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3114121 2.0 13.0 4519752 4027320 ? Ss 07:41 2:11 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3116277 2.7 13.0 4519504 4030420 ? Ss 07:47 2:48 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3116411 3.9 13.5 4517844 4173364 ? Ss 07:47 4:04 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3117196 1.4 12.9 4515768 3982940 ? Ss 07:49 1:28 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3117708 2.1 13.1 4517008 4057736 ? Ss 07:50 2:09 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3118017 4.8 13.8 4526084 4260840 ? Ss 07:51 4:45 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3118048 3.2 13.3 4525892 4111704 ? Ss 07:51 3:14 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3128686 0.7 6.1 4405988 1909256 ? Ss 08:19 0:30 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3128687 0.7 6.6 4398936 2052828 ? Ss 08:19 0:30 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3130657 0.9 6.8 4405580 2113692 ? Ss 08:24 0:37 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3134864 1.2 6.1 4405228 1908596 ? Ss 08:35 0:40 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3134893 0.8 4.5 4405496 1389992 ? Ss 08:35 0:29 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3144272 1.4 4.5 4405884 1412496 ? Ss 09:00 0:25 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3144691 1.0 8.2 4406004 2554024 ? Ss 09:01 0:17 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3147010 1.3 5.2 4405288 1619456 ? Ss 09:07 0:19 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3154498 0.4 0.9 4398792 304316 ? Ss 09:27 0:00 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3155429 0.3 0.2 4390740 72012 ? Ss 09:29 0:00 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3155430 0.5 0.6 4390000 199852 ? Ss 09:29 0:00 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3155431 0.1 0.3 4389928 104328 ? Ss 09:29 0:00 | | \_ postgres: 13/main: discourse discourse [local] idle
systemd+ 3155435 0.2 0.2 4389760 62116 ? Ss 09:29 0:00 | | \_ postgres: 13/main: discourse discourse [local] idle
root 2748735 0.0 0.0 2248 4 ? Ss Jul12 0:00 | \_ runsv unicorn
crius 2748742 0.1 0.0 15316 2496 ? S Jul12 2:01 | | \_ /bin/bash config/unicorn_launcher -E production -c config
crius 3155642 0.0 0.0 13820 2164 ? S 09:30 0:00 | | \_ sleep 1
root 2748736 0.0 0.0 2248 124 ? Ss Jul12 0:00 | \_ runsv nginx
root 2748739 0.0 0.0 53344 604 ? S Jul12 0:00 | | \_ nginx: master process /usr/sbin/nginx
www-data 2748754 0.6 0.0 58380 8120 ? S Jul12 6:57 | | \_ nginx: worker process
www-data 2748755 0.6 0.0 58376 8664 ? S Jul12 6:55 | | \_ nginx: worker process
www-data 2748756 0.6 0.0 58356 8136 ? S Jul12 6:51 | | \_ nginx: worker process
www-data 2748757 0.6 0.0 59308 8080 ? S Jul12 6:41 | | \_ nginx: worker process
www-data 2748758 0.6 0.0 58528 7820 ? S Jul12 6:37 | | \_ nginx: worker process
www-data 2748759 0.6 0.0 58272 8208 ? S Jul12 6:51 | | \_ nginx: worker process
www-data 2748760 0.6 0.0 59100 8600 ? S Jul12 6:47 | | \_ nginx: worker process
www-data 2748761 0.6 0.0 58532 8188 ? S Jul12 6:50 | | \_ nginx: worker process
www-data 2748762 0.0 0.0 53540 588 ? S Jul12 0:01 | | \_ nginx: cache manager process
root 2748737 0.0 0.0 2248 44 ? Ss Jul12 0:00 | \_ runsv redis
root 2748743 0.0 0.0 2396 16 ? S Jul12 0:00 | \_ svlogd /var/log/redis
message+ 2748750 4.1 0.2 293132 68876 ? Sl Jul12 42:44 | \_ /usr/bin/redis-server *:6379
crius 2843290 0.1 1.0 696992 310896 ? Sl Jul12 1:27 \_ unicorn master -E production -c config/unicorn.conf.rb
crius 2843716 3.6 1.1 1481356 368076 ? SNl Jul12 29:41 | \_ sidekiq 6.5.9 discourse [0 of 5 busy]
crius 2843729 1.7 0.7 1019336 232084 ? Sl Jul12 14:44 | \_ discourse prometheus-global-reporter
crius 2843769 27.3 1.3 1227948 417356 ? Rl Jul12 224:30 | \_ unicorn worker[0] -E production -c config/unicorn.conf.rb
crius 2843826 15.9 1.2 1140968 397560 ? Rl Jul12 131:12 | \_ unicorn worker[2] -E production -c config/unicorn.conf.rb
crius 2843891 9.5 1.1 1114024 353860 ? Rl Jul12 78:12 | \_ unicorn worker[4] -E production -c config/unicorn.conf.rb
crius 2843918 7.1 1.2 1142248 383800 ? Rl Jul12 58:40 | \_ unicorn worker[5] -E production -c config/unicorn.conf.rb
crius 2843965 5.7 1.0 1074792 336012 ? Rl Jul12 47:22 | \_ unicorn worker[6] -E production -c config/unicorn.conf.rb
crius 2844114 1.3 0.1 297232 39440 ? Sl Jul12 11:25 | \_ discourse prometheus-collector
crius 2844577 4.8 1.1 1121000 365044 ? Rl Jul12 40:05 | \_ unicorn worker[7] -E production -c config/unicorn.conf.rb
crius 2844866 4.4 1.0 1129704 328864 ? Rl Jul12 36:14 | \_ unicorn worker[1] -E production -c config/unicorn.conf.rb
crius 2902401 3.2 1.1 1214056 340712 ? Rl Jul12 22:10 | \_ unicorn worker[3] -E production -c config/unicorn.conf.rb
root 3155591 0.1 0.0 2388 76 ? Ss 09:30 0:00 \_ /usr/sbin/anacron -s
Crius
(Crius)
July 13, 2023, 8:18am
14
I restarted the VPS as well, just in case there was something weird going on (doubtful, as it started suddenly yesterday after 150 days of running), but nope, same behaviour.
It's some unicorn processes chugging all the CPU resources. Is there any way to get more info about what those unicorns are doing back there?
Crius
(Crius)
July 13, 2023, 8:34am
15
For a quick look: of course it oscillates, but this is the average for the running unicorn workers, which seems abnormal to me given that it has been going on since the day before:
Falco
(Falco)
July 13, 2023, 12:57pm
16
Your load average > 8 on an 8 vCPU machine means that your server is overwhelmed.
Between 8 unicorns, many PostgreSQL pids, and everything else on your server, it can't process incoming requests fast enough. At least you have plenty of memory.
It's kinda unusual for Discourse to be bottlenecked on unicorn CPU like that. Most of the times I've seen this happen, it was because of a misbehaving plugin. Can you share your app.yml?
Also, please share the MiniProfiler results of loading both your home page and a topic page.
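As a quick sanity check of that ratio on any box (a sketch; the threshold that matters is the CPU count, not a fixed number):

# compare the 1-minute load average with the number of CPUs
cpus=$(nproc)
load=$(cut -d' ' -f1 /proc/loadavg)
echo "1-minute load ${load} across ${cpus} CPUs"
awk -v l="$load" -v c="$cpus" 'BEGIN { exit !(l > c) }' && echo "run queue longer than the CPU count: the box is saturated"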
4 Likes
Crius
(Crius)
July 13, 2023, 4:38pm
17
I've asked the hosting provider to investigate whether CPU time is being stolen from us by other VPSes on the host, as it's really weird that this happened so suddenly without anything really changing on our side.
I’m waiting for the tech support to come back with some results.
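For what it's worth, steal is also visible from inside the guest, so you don't have to wait for support to confirm it; a minimal sketch (the st value is the share of time the hypervisor gave your vCPUs to someone else):

vmstat 5 5                 # watch the last column (st); sustained double-digit steal points at a noisy neighbour
top -bn1 | grep 'Cpu(s)'   # the "st" field at the end of this line is the same metric, as a one-off snapshot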
1 Like
Falco
(Falco)
July 13, 2023, 5:02pm
18
Looks like this screenshot cuts off the column labels, but if one of the first 3 is an average, you've nailed the problem.
Ed_S
(Ed S)
July 13, 2023, 5:56pm
19
Thanks for the outputs. For reference, the first line of stats from vmstat isn't so useful (it reports averages since boot); all five lines together give the picture that's needed.
Crius
(Crius)
July 21, 2023, 7:57am
20
Sorry, I forgot to update here.
It was as I guessed in the end. Another VPS was deployed on the host and was draining it.
We were moved to another host less than 24 hours after opening the ticket.
Bravo to Contabo
I would ask the mods to leave these last replies here even though they're not "Discourse" related, as they can help others figure out other reasons why their instance might have problems.
6 Likes