Memory creep in the last couple of updates

(Stephen) #9

There is, but that’s the right release to be on to get any fixes.

3 Likes
(RBoy) #10

@sam is this commit related to this issue? If so, is it stable enough to update to?

2 Likes
(Sam Saffron) #11

The issue itself was fixed days ago; it is stable enough to upgrade.

4 Likes
(RBoy) #12

Okay, updated. I’ll keep an eye on it; hopefully this will fix it.

I didn’t follow what you meant by the issue being fixed days ago, though. The memory consumption as of this evening is still creeping up.

2 Likes
(Daniel Hollas) #13

Does this fix require a rebuild, or can I just upgrade via the UI?

(Sam Saffron) #14

Via the UI should be fine.

2 Likes
(RBoy) #15

Okay, so I did an update and rebuild last night. The memory usage is back up to 71% and still growing. The only way to reduce it is to restart Discourse, at which point it drops back down to under 50% and then starts working its way up again. The CPU utilization is about 1% on average.
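For reference, the restart I’m doing is just the standard launcher restart (a sketch, assuming the default /var/discourse install directory):

cd /var/discourse
./launcher restart app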

(Sam Saffron) #16

What process is growing? Sidekiq? A Unicorn worker? Redis? PG?

3 Likes
(RBoy) #17

That’s a good question, and exactly what I was asking earlier: how do I find out what’s taking up memory within Discourse? I can only see the task manager, which shows that Ruby is taking up more memory over time (all the Ruby instances are growing in memory consumption).

(Sam Saffron) #18

As root, run ps aux and repeat every few hours.
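One rough way to automate that, a sketch where the log path and the one-hour interval are choices of my own:

while true; do
    date >> /tmp/discourse-mem.log
    ps aux --sort -%mem | head -15 >> /tmp/discourse-mem.log
    sleep 3600
done

Comparing the top entries across snapshots shows which processes are actually growing.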

3 Likes
(RBoy) #19

Okay, when it was taking up 71% memory, the top memory consumers (by %MEM) were:

PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
49458  0.3  8.0 938568 326016 ?       Sl   May15   4:16 unicorn worker[2] -E production -c config/unicorn.conf.rb
49418  0.6  8.0 1041604 324192 ?      SNl  May15   7:19 sidekiq 5.2.7 discourse [0 of 5 busy]
49448  0.3  7.9 938056 321148 ?       Sl   May15   4:22 unicorn worker[1] -E production -c config/unicorn.conf.rb
49504  0.3  7.9 943692 319948 ?       Sl   May15   4:16 unicorn worker[7] -E production -c config/unicorn.conf.rb
49495  0.3  7.9 928328 319480 ?       Sl   May15   4:21 unicorn worker[6] -E production -c config/unicorn.conf.rb
49476  0.3  7.9 933448 318464 ?       Sl   May15   4:20 unicorn worker[4] -E production -c config/unicorn.conf.rb
49486  0.3  7.8 946768 315236 ?       Sl   May15   4:07 unicorn worker[5] -E production -c config/unicorn.conf.rb
49467  0.3  7.8 928840 315108 ?       Sl   May15   4:05 unicorn worker[3] -E production -c config/unicorn.conf.rb
49439  0.3  7.7 928328 313640 ?       Sl   May15   4:14 unicorn worker[0] -E production -c config/unicorn.conf.rb
49317  0.1  4.8 485628 196588 ?       Sl   May15   2:03 unicorn master -E production -c config/unicorn.conf.rb
49311  0.0  2.4 1263836 96848 ?       Ss   May15   0:08 postgres: 10/main: checkpointer process   
49293  0.0  1.3 1263704 54864 ?       S    May15   0:11 /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main
1226  0.0  1.2 280508 49016 tty7     Ssl+ May15   0:21 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch

After a restart and a couple of minutes of grace time, it’s showing 50%, and the top memory consumers are:

PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
17466 17.2  7.5 913964 304276 ?       Sl   16:47   0:09 unicorn worker[1] -E production -c config/unicorn.conf.rb
17494 18.5  7.5 917036 302308 ?       Sl   16:47   0:09 unicorn worker[4] -E production -c config/unicorn.conf.rb
17475 17.8  7.4 913964 301368 ?       Sl   16:47   0:09 unicorn worker[2] -E production -c config/unicorn.conf.rb
17457 15.7  7.3 909244 297984 ?       Sl   16:47   0:08 unicorn worker[0] -E production -c config/unicorn.conf.rb
17522 19.1  7.3 906168 297556 ?       Sl   16:47   0:09 unicorn worker[7] -E production -c config/unicorn.conf.rb
17484 16.7  7.3 906168 297244 ?       Sl   16:47   0:08 unicorn worker[3] -E production -c config/unicorn.conf.rb
17503 18.6  7.3 899000 294548 ?       Sl   16:47   0:09 unicorn worker[5] -E production -c config/unicorn.conf.rb
17512 18.4  7.2 896952 292200 ?       Sl   16:47   0:09 unicorn worker[6] -E production -c config/unicorn.conf.rb
17303 13.0  4.8 477436 194544 ?       Sl   16:46   0:13 unicorn master -E production -c config/unicorn.conf.rb
17435  0.9  4.5 554280 182640 ?       SNl  16:47   0:00 sidekiq 5.2.7 discourse [0 of 5 busy]
17267  0.0  1.4 1263704 57740 ?       S    16:46   0:00 /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main
1226  0.0  1.2 280508 48464 tty7     Ssl+ May15   0:22 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
1447  0.3  1.2 776896 48360 ?        Ssl  May15   5:57 /usr/bin/dockerd -H fd://

Looks like Sidekiq, some of the Unicorn workers, and Postgres.

Let me know if you would like me to collect any other data.

(Sam Saffron) #20

You are running too many unicorns. Those numbers look right to me; 300-500 MB per worker is in the normal range.

Cut the unicorn count down by 3.

4 Likes
(RBoy) #22

Thanks

This is a default installation; there are no customizations other than three official plugins. So part of the question is: did a recent update change something, and is that why it’s beginning to creep up now?

What about Sidekiq? Is it normal for it to almost double its memory consumption over time?

I’ve seen this question asked in a couple of places but didn’t find the answer: is this the UNICORN_WORKERS setting in app.yml? I see it’s currently set to 8, and from what I read on the forum this number is set at rebuild time based on the number of CPUs, which in this case is 4. So I’m struggling to understand why the memory consumption started increasing lately if there have been no changes to the hardware or setup.

(RBoy) #23

Okay, an update: after the last unicorn fork patch, the utilization seems to have stabilized around 73% (higher than before, but at least not going up past 85%).

I’m reducing the number of unicorns from 8 to 5 as suggested and will let you know how it goes.
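For reference, I’m making the change via the UNICORN_WORKERS env setting in containers/app.yml, followed by a rebuild (a sketch, assuming the default /var/discourse layout):

# edit /var/discourse/containers/app.yml:
env:
  UNICORN_WORKERS: 5

# then rebuild so the container picks up the change:
cd /var/discourse
./launcher rebuild app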

I’m still not clear on why Sidekiq keeps increasing its memory consumption, whether something can or should be done about it, or whether it can safely be ignored.

Thanks for your help

2 Likes
(Sam Saffron) #24

How high is Sidekiq? We have protection in place that restarts it in case it goes above 500 MB.
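If memory serves, that threshold can be overridden via an env setting in app.yml; treat the variable name below as an assumption and double-check it against config/unicorn.conf.rb in your version before relying on it:

env:
  # RSS limit in MB above which Sidekiq gets restarted
  # (name believed to match config/unicorn.conf.rb; verify in your checkout)
  UNICORN_SIDEKIQ_MAX_RSS: 500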

2 Likes
(RBoy) #25

At last check, ps aux reported this (when utilization was 71%):

49418  0.6  8.0 1041604 324192 ?      SNl  May15   7:19 sidekiq 5.2.7 discourse [0 of 5 busy]

However, in the error logs I did see Sidekiq being restarted every night around midnight with this error:

Sidekiq is consuming too much memory (using: 502.99M)

I’ll continue to keep an eye on it

1 Like
(Jeff Atwood) #26

@sam don’t we have Sidekiq logging now for better tracking of this, courtesy of @david?

2 Likes
(Sam Saffron) #27

We have better logging in that we know what jobs ran, but finding memory leaks is still a very involved process. I will do an internal review to see if we are seeing this on any of our sites.

5 Likes
(RBoy) #28

Just an update here: after reducing the number of unicorn workers from 8 to 5 as suggested, the memory utilization has now stabilized at 57%. This is the output of ps aux --sort -%mem:

PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
8555  0.6  8.4 1114316 341412 ?      SNl  May17  24:03 sidekiq 5.2.7 
8595  0.4  8.3 958024 334536 ?       Sl   May17  16:10 unicorn worker
8614  0.4  8.1 950856 329196 ?       Sl   May17  16:01 unicorn worker
8604  0.4  8.1 955980 328432 ?       Sl   May17  15:56 unicorn worker
8586  0.4  8.0 958544 323008 ?       Sl   May17  15:58 unicorn worker
8577  0.4  7.9 1072200 321624 ?      Sl   May17  16:05 unicorn worker
8446  0.1  4.9 481532 197740 ?       Sl   May17   6:03 unicorn master

@sam, a follow-up clarification on the impact of reducing the number of unicorn workers. I read on the forum that each worker supports up to 5 jobs, so with 5 workers that’s 25 jobs. I read somewhere else that each worker is good for about 400 connections. I’m not really clear on what this means for system scalability. I suspect it’ll be fine, but it would be nice if you could outline what the unicorn workers are used for and, as a rough ballpark, how many concurrent users the system could support with 5 workers. Thanks in advance.

1 Like
(Sam Saffron) #29

Sidekiq is the thing that does the jobs.

Unicorn is for web requests. Running 5 means you can handle 5 “slow” web requests concurrently. Certain web requests, like “message bus”, “avatar caching”, and “uploads”, run in background threads, so the real number of concurrent requests tends to be a lot higher.

“Concurrent users” depends heavily on what the users are doing and on logged-in vs. anonymous traffic, which is heavily cached. I can’t really provide a particular guideline, but I can provide instrumentation that will tell you if there is a problem.

5 Likes