Discourse not using much RAM

kianwalters05 · August 5, 2021, 11:22am

Hi!
I check yesterday and the site was fine, I woke up this morning and its down with the 502 Bad Gateway error. Heres the monitor.log monitor.txt (9.4 MB)

Benjamin_D · August 5, 2021, 11:53am

whoa… why so much java ? looks like my jitsi server once
It also seems you’re using your host as a mail server. I finally separated mine from the app, this way it is easier to restore eventually (and keeping it tidy, as close as possible to a standard prod install). A very small instance for the mail with a floating IP might also be easier to monitor for reputation and marginally expensive.

Ed_S · August 5, 2021, 12:02pm

OK, let’s see… first we grep for average:

 06:24:13 up 3 days,  8:13,  0 users,  load average: 0.01, 0.05, 0.08
 06:34:33 up 3 days,  8:23,  0 users,  load average: 12.89, 11.68, 6.35

So something blew up, in terms of number of runnable processes, a bit before 06:34.

A slightly wider grep and we see that the memory usage hasn’t really shifted:

 06:24:13 up 3 days,  8:13,  0 users,  load average: 0.01, 0.05, 0.08
              total        used        free      shared  buff/cache   available
Mem:        8167548     3513172     3241404      371940     1412972     4512016
Swap:      10485756           0    10485756
--
 06:34:33 up 3 days,  8:23,  0 users,  load average: 12.89, 11.68, 6.35
              total        used        free      shared  buff/cache   available
Mem:        8167548     3110612     1351844      373268     3705092     4812672
Swap:      10485756           0    10485756

In fact if anything memory usage went down, so perhaps something died:

 06:13:53 up 3 days,  8:02,  0 users,  load average: 0.04, 0.16, 0.15
              total        used        free      shared  buff/cache   available
Mem:        8167548     3505540     3259744      371932     1402264     4519680
 06:24:13 up 3 days,  8:13,  0 users,  load average: 0.01, 0.05, 0.08
              total        used        free      shared  buff/cache   available
Mem:        8167548     3513172     3241404      371940     1412972     4512016
 06:34:33 up 3 days,  8:23,  0 users,  load average: 12.89, 11.68, 6.35
              total        used        free      shared  buff/cache   available
Mem:        8167548     3110612     1351844      373268     3705092     4812672
 06:44:53 up 3 days,  8:33,  0 users,  load average: 13.64, 13.25, 9.89
              total        used        free      shared  buff/cache   available
Mem:        8167548     3093212     3618984      373276     1455352     4840140
 06:55:13 up 3 days,  8:44,  0 users,  load average: 11.62, 12.70, 11.42
              total        used        free      shared  buff/cache   available
Mem:        8167548     3140580     3366628      373352     1660340     4792440

Looks like the unicorns restarted:

Thu Aug  5 06:24:13 CEST 2021
 06:24:13 up 3 days,  8:13,  0 users,  load average: 0.01, 0.05, 0.08
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     30389  0.0  0.0   2160  1148 ?        Ss   Aug03   0:00          \_ runsv unicorn
x        13329  0.0  0.0  15132  3712 ?        S    Aug04   0:26              \_ /bin/bash config/unicorn_launcher -E production -c config/unicorn.conf.rb
x        13365  0.0  3.0 542036 251584 ?       Sl   Aug04   0:23                  \_ unicorn master -E production -c config/unicorn.conf.rb
x        13846  0.6  4.5 9129356 371852 ?      SNl  Aug04   4:19                  |   \_ sidekiq 6.2.1 discourse [0 of 5 busy]
x        13858  0.0  3.9 8971616 323264 ?      Sl   Aug04   0:18                  |   \_ unicorn worker[0] -E production -c config/unicorn.conf.rb
x        13870  0.0  4.0 8975712 330660 ?      Sl   Aug04   0:20                  |   \_ unicorn worker[1] -E production -c config/unicorn.conf.rb
x        13889  0.0  4.1 8979808 337180 ?      Sl   Aug04   0:20                  |   \_ unicorn worker[2] -E production -c config/unicorn.conf.rb
x        29760  0.0  0.0  13708  2220 ?        S    06:24   0:00                  \_ sleep 1

Thu Aug  5 06:34:33 CEST 2021
 06:34:33 up 3 days,  8:23,  0 users,  load average: 12.89, 11.68, 6.35
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     30389  0.0  0.0   2160  1148 ?        Ss   Aug03   0:00          \_ runsv unicorn
x         4638  0.4  0.0  15132  3784 ?        S    06:32   0:00              \_ /bin/bash config/unicorn_launcher -E production -c config/unicorn.conf.rb
x         4642 11.1  2.8 439636 233240 ?       Sl   06:32   0:11                  \_ unicorn master -E production -c config/unicorn.conf.rb
x         4822 13.1  3.7 8877408 309844 ?      Sl   06:33   0:10                  |   \_ unicorn worker[1] -E production -c config/unicorn.conf.rb
x         5025 15.6  3.7 8873312 310264 ?      Sl   06:33   0:10                  |   \_ unicorn worker[0] -E production -c config/unicorn.conf.rb
x         5065 20.6  3.9 8922504 320740 ?      SNl  06:33   0:11                  |   \_ sidekiq 6.2.1 discourse [0 of 5 busy]
x         5639  0.0  0.0  13708  2264 ?        S    06:34   0:00                  \_ sleep 1

And it looks like perhaps apache is having trouble starting:

Thu Aug  5 06:24:13 CEST 2021
 06:24:13 up 3 days,  8:13,  0 users,  load average: 0.01, 0.05, 0.08
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     13730  0.0  0.1 225016 15840 ?        Ss   Aug02   0:17 /usr/sbin/apache2 -k start
www-data 10232  0.0  0.1 1352372 10972 ?       Sl   Aug03   0:00  \_ /usr/sbin/apache2 -k start
www-data 27998  0.0  0.1 228880  9608 ?        S    Aug04   0:01  \_ /usr/sbin/apache2 -k start
www-data 28000  0.0  0.3 3036536 27796 ?       Sl   Aug04   0:53  \_ /usr/sbin/apache2 -k start
www-data 28002  0.0  0.3 2904772 26524 ?       Sl   Aug04   0:50  \_ /usr/sbin/apache2 -k start
www-data 28185  0.0  0.1 762472 10952 ?        Sl   Aug04   0:00  \_ /usr/sbin/apache2 -k start

Thu Aug  5 06:34:33 CEST 2021
 06:34:33 up 3 days,  8:23,  0 users,  load average: 12.89, 11.68, 6.35
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     13730  0.0  0.1 225024 15856 ?        Ss   Aug02   0:17 /usr/sbin/apache2 -k start
www-data 10232  0.0  0.1 1352372 10972 ?       Sl   Aug03   0:00  \_ /usr/sbin/apache2 -k start
www-data 28185  0.0  0.1 762472 10952 ?        Sl   Aug04   0:00  \_ /usr/sbin/apache2 -k start
www-data 30794  0.0  0.1 228888  9620 ?        S    06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data 31490  0.0  0.1 344564 10852 ?        Sl   06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data 32095  0.0  0.1 500228 10888 ?        Sl   06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data 32215 83.2  0.1 369152 10876 ?        Sl   06:25   7:26  \_ /usr/sbin/apache2 -k start
www-data 32284  0.0  0.1 229400  9936 ?        S    06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data 32382  187  0.1 410136 10916 ?        Sl   06:25  16:31  \_ /usr/sbin/apache2 -k start
www-data 32410  0.0  0.1 229400  9936 ?        S    06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data 32545  0.0  0.1 229400  9936 ?        S    06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data  2106  0.0  0.1 3032348 11396 ?       Sl   06:27   0:00  \_ /usr/sbin/apache2 -k start
www-data  2240  0.0  0.1 2901012 11416 ?       Sl   06:27   0:00  \_ /usr/sbin/apache2 -k start
www-data  2241  0.0  0.1 2901184 14220 ?       Sl   06:27   0:00  \_ /usr/sbin/apache2 -k start

Thu Aug  5 06:44:53 CEST 2021
 06:44:53 up 3 days,  8:33,  0 users,  load average: 13.64, 13.25, 9.89
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root     13730  0.0  0.1 225024 15856 ?        Ss   Aug02   0:17 /usr/sbin/apache2 -k start
www-data 10232  0.0  0.1 1352372 10972 ?       Sl   Aug03   0:00  \_ /usr/sbin/apache2 -k start
www-data 28185  0.0  0.1 762472 10952 ?        Sl   Aug04   0:00  \_ /usr/sbin/apache2 -k start
www-data 30794  0.0  0.1 228888  9620 ?        S    06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data 31490  0.0  0.1 344564 10852 ?        Sl   06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data 32095  0.0  0.1 500228 10888 ?        Sl   06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data 32215 82.1  0.1 369152 10876 ?        Sl   06:25  15:50  \_ /usr/sbin/apache2 -k start
www-data 32284  0.0  0.1 229400  9936 ?        S    06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data 32382  188  0.1 410136 10916 ?        Sl   06:25  36:10  \_ /usr/sbin/apache2 -k start
www-data 32410  0.0  0.1 229400  9936 ?        S    06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data 32545  0.0  0.1 229400  9936 ?        S    06:25   0:00  \_ /usr/sbin/apache2 -k start
www-data  2106  0.0  0.1 3032348 11396 ?       Sl   06:27   0:00  \_ /usr/sbin/apache2 -k start
www-data  2240  0.0  0.1 2901012 11416 ?       Sl   06:27   0:00  \_ /usr/sbin/apache2 -k start
www-data  2241  0.0  0.1 2901184 14432 ?       Sl   06:27   0:00  \_ /usr/sbin/apache2 -k start

Notice how there are lots of forked apache processes and one of them is burning 83% CPU and another burning 188%. Something is amiss there.

But, I’m not sure what to do about this finding.

Ed_S · August 5, 2021, 12:34pm

Might also be worth seeing if the kernel had reason to kill anything recently:
dmesg -T | egrep -i kille

kianwalters05 · August 5, 2021, 12:41pm

I get nothing from running that command

kianwalters05 · August 5, 2021, 12:46pm

I think its cause I had a friend switch me to MPM Event from MPM prefork. Apache has been using a lot more since then but no sites have crashed or messed up since discourse. I just changed back to MPM prefork. Maybe thatll help?

kianwalters05 · August 5, 2021, 12:47pm

Ive found out the java is apparently Varnish cache. I opened the tree for the process and the base was Varnish which makes a lot more sense than java

thwright · August 5, 2021, 1:50pm

MPM Event allows Apache to run multi threaded, and thereby serving more visitors. It has to be paired with a separate PHP service, now commonly PHP-FPM. Prefork will restrict Apache’s and PHP’s performance if your server is busy. If Event is using too much, it is most likely a poorly configured FPM over Event, but an unoptimized MPM Event could be a problem, too.

Prefork works fine for servers without a lot of simultaneous traffic, but the standard for Apache is Event/FPM nowadays.

Ed_S · August 5, 2021, 2:35pm

I think that’s a good thing…

kianwalters05 · August 5, 2021, 3:48pm

Ok thank you for the help

kianwalters05 · August 8, 2021, 8:30pm

I changed back to MPM prefork and apache hasnt crashed in 3 days. Im hoping this is it and its all fixed. Thanks for all the help

system · September 7, 2021, 8:31pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Discourse Docker HW reserved/used (CPU, RAM, Disk) and how to manage it Installation server-resources	5	796	May 16, 2023
Discourse installation has been getting slower and slower and slower Installation server-resources	37	1527	May 15, 2023
Is 1GB RAM enough for both WordPress and Discourse? Installation server-resources	22	4661	June 8, 2024
Memory is running out and Discourse stops working Bug	74	14151	February 17, 2015
Crashing with out of memory error when opening a topic Support server-resources	7	1010	January 23, 2019

Discourse not using much RAM

Related topics