RBoy
(RBoy)
14 May 2019, 10:51pm
1
For the last few weeks I’ve noticed that system memory usage creeps up every day until it maxes out.
Historically, memory usage has been about 50% - 55% (on a 3GB system). Now, after an update, it starts out at 50%, but over the next few days it slowly creeps up to 85% and then starts using up swap.
Is there a way to find out what in Discourse is creeping up and taking memory? The task manager only shows Ruby slowly increasing the amount of memory it’s consuming. Each Ruby process seems to be taking up 350M and growing (it starts at under 200M after an update).
I just updated to v2.3.0.beta9 +392 two days ago; it’s already gone from 50% to 75% and doesn’t seem to be stabilizing.
3 Likes
david
(David Taylor)
14 May 2019, 10:53pm
2
Try updating again. We noticed the same issue and applied a fix a few hours ago (commit 1, commit 2).
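(A minimal sketch of a command-line update, assuming the standard discourse_docker install at /var/discourse; updating via /admin/upgrade in the browser works too:)

```bash
# Sketch: update a standard discourse_docker install from the shell.
cd /var/discourse
git pull                 # refresh the discourse_docker scripts themselves
./launcher rebuild app   # pull the latest Discourse code and rebuild the container
```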
11 Likes
RBoy
(RBoy)
14 May 2019, 11:03pm
3
Okay, updated, and it restarted at 47%; I’ll keep an eye on it. Thanks for the quick response.
3 Likes
RBoy
(RBoy)
14 May 2019, 11:46pm
4
It’s already crept back up to 61%, now 64%; the Ruby processes are all now in the range of 310M-340M. I’ll watch it for a day and report back.
Not sure if it’s related but I’m seeing this every night for the past week or so around 1am in the logs:
Sidekiq is consuming too much memory (using: 502.99M)
3 Likes
david
(David Taylor)
15 May 2019, 8:14am
5
You could try enabling the Sidekiq logs and then looking for which job is causing the problem. Some information on those logs can be found in this commit message:
committed 11:19AM - 05 Mar 19 UTC
By default, this does nothing. Two environment variables are available:
- `DISCOURSE_LOG_SIDEKIQ`
Set to `"1"` to enable logging. This will log all completed jobs to `log/rails/sidekiq.log`, along with various db/redis/network statistics. This is useful to track down poorly performing jobs.
- `DISCOURSE_LOG_SIDEKIQ_INTERVAL`
(seconds) Check running jobs periodically, and log their current duration. They will appear in the logs with `status:pending`. This is useful to track down jobs which take a long time, then crash sidekiq before completing.
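A minimal sketch of how these variables might be set on a standard discourse_docker install (the app.yml location and the interval value below are assumptions, not from the commit):

```bash
# Sketch, assuming the standard discourse_docker layout at /var/discourse.
cd /var/discourse

# Add the variables to the env: section of containers/app.yml, for example:
#   env:
#     DISCOURSE_LOG_SIDEKIQ: "1"
#     DISCOURSE_LOG_SIDEKIQ_INTERVAL: "30"   # example value, in seconds
nano containers/app.yml

# Rebuild so the container picks up the new environment variables.
./launcher rebuild app

# Follow the job log from inside the container (path as given in the commit message).
./launcher enter app
cd /var/www/discourse && tail -f log/rails/sidekiq.log
```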
7 Likes
RBoy
(RBoy)
15 May 2019, 1:18pm
6
The memory utilization is back up to 73% and doesn’t seem to be slowing down. It’s now beginning to take up swap space.
I’m not sure how to do this and would need some guidance. I had a look at the commit, and it talks about setting two environment variables. How do I set these? I’m not familiar with Ruby/Docker and don’t want to mess anything up, as this is a live site.
Is there anything else I can look at to see why the memory utilization is creeping up?
I’m also seeing a new error in the logs after the update (2 since yesterday):
Job exception: post_revision_id
Falco
(Falco)
15 May 2019, 1:55pm
7
RBoy:
Okay updated
Did you do a rebuild? Are you on the default branch of tests-passed?
3 Likes
RBoy
(RBoy)
15 May 2019, 2:21pm
8
Yes, and yes I assume; I’m using the default setup. (Is there a way to select a different branch?)
Stephen
(Stephen)
15 May 2019, 2:38pm
9
There is, but that’s the right release to be getting any fixes.
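(For reference, a sketch of where the branch is selected on a standard discourse_docker install; the exact commented-out line may differ in your app.yml:)

```bash
# Sketch, assuming the standard discourse_docker layout.
# The Git revision the container tracks is set in containers/app.yml,
# typically via a commented-out line such as:
#   #version: tests-passed
# Leaving it commented means the default branch, tests-passed, is used.
grep -n "version" /var/discourse/containers/app.yml
```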
3 Likes
RBoy
(RBoy)
16 May 2019, 1:05am
10
@sam is this commit related to this issue? If so, is it stable enough to update to?
committed 11:50PM - 15 May 19 UTC
v8 forking is not supported and can lead to memory leaks.
This commit handles the most common case which is the unicorn master forking
There are still some cases related to backup where we fork, however those
forks are usually short lived so the memory leak is not severe, burning
the contexts in the master process could break sidekiq or web process that
do the actual forking
2 Likes
sam
(Sam Saffron)
16 May 2019, 1:07am
11
The issue itself was fixed days ago; it is stable enough to upgrade.
4 Likes
RBoy
(RBoy)
16 May 2019, 1:34am
12
Okay, updated; I’ll keep an eye on it. Hopefully this will fix it.
I didn’t get what you meant by the issue being fixed days ago; the memory consumption as of this evening is still creeping up.
2 Likes
Does this fix require a rebuild, or can I just upgrade via the UI?
sam
(Sam Saffron)
16 May 2019, 8:52am
14
Via the UI should be fine
2 Likes
RBoy
(RBoy)
16 May 2019, 8:13pm
15
Okay, so I did an update and rebuild last night. The memory usage is back up to 71% and still growing. The only way to reduce it is to restart Discourse, at which point it drops back down to under 50% and then starts working its way up again. The CPU utilization is about 1% on average.
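(For reference, a restart on a standard discourse_docker install is roughly:)

```bash
# Sketch, assuming the standard discourse_docker install at /var/discourse.
cd /var/discourse
./launcher restart app   # stop and start the container without rebuilding it
```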
sam
(Sam Saffron)
16 May 2019, 8:15pm
16
What process is growing? Sidekiq? Unicorn worker? Redis? PG?
3 Likes
RBoy
(RBoy)
16 May 2019, 8:35pm
17
That’s a good question, and it’s exactly what I was asking earlier: how do I find out what’s taking up memory within Discourse? All I can see is the task manager, which shows Ruby taking up more memory over time (all of the Ruby instances are growing in memory consumption).
sam
(Sam Saffron)
16 May 2019, 8:36pm
18
As root, run `ps aux` and repeat it every few hours.
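A minimal sketch of one way to capture this on a schedule (the log file path and the interval are arbitrary choices):

```bash
# Sketch: append a timestamped snapshot of the top memory consumers every few hours.
while true; do
  {
    date
    ps aux --sort=-%mem | head -n 15
    echo
  } >> /root/discourse-mem.log
  sleep 3h   # GNU sleep accepts the "h" suffix; use 10800 on other systems
done
```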
3 Likes
RBoy
(RBoy)
16 May 2019, 8:55pm
19
Okay, when it was taking up 71% of memory, the top 14 consumers (by %MEM) were:
PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
49458 0.3 8.0 938568 326016 ? Sl May15 4:16 unicorn worker[2] -E production -c config/unicorn.conf.rb
49418 0.6 8.0 1041604 324192 ? SNl May15 7:19 sidekiq 5.2.7 discourse [0 of 5 busy]
49448 0.3 7.9 938056 321148 ? Sl May15 4:22 unicorn worker[1] -E production -c config/unicorn.conf.rb
49504 0.3 7.9 943692 319948 ? Sl May15 4:16 unicorn worker[7] -E production -c config/unicorn.conf.rb
49495 0.3 7.9 928328 319480 ? Sl May15 4:21 unicorn worker[6] -E production -c config/unicorn.conf.rb
49476 0.3 7.9 933448 318464 ? Sl May15 4:20 unicorn worker[4] -E production -c config/unicorn.conf.rb
49486 0.3 7.8 946768 315236 ? Sl May15 4:07 unicorn worker[5] -E production -c config/unicorn.conf.rb
49467 0.3 7.8 928840 315108 ? Sl May15 4:05 unicorn worker[3] -E production -c config/unicorn.conf.rb
49439 0.3 7.7 928328 313640 ? Sl May15 4:14 unicorn worker[0] -E production -c config/unicorn.conf.rb
49317 0.1 4.8 485628 196588 ? Sl May15 2:03 unicorn master -E production -c config/unicorn.conf.rb
49311 0.0 2.4 1263836 96848 ? Ss May15 0:08 postgres: 10/main: checkpointer process
49293 0.0 1.3 1263704 54864 ? S May15 0:11 /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main
1226 0.0 1.2 280508 49016 tty7 Ssl+ May15 0:21 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
After a restart and a couple of minutes to settle, it’s showing 50%, and the top memory consumers are:
PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
17466 17.2 7.5 913964 304276 ? Sl 16:47 0:09 unicorn worker[1] -E production -c config/unicorn.conf.rb
17494 18.5 7.5 917036 302308 ? Sl 16:47 0:09 unicorn worker[4] -E production -c config/unicorn.conf.rb
17475 17.8 7.4 913964 301368 ? Sl 16:47 0:09 unicorn worker[2] -E production -c config/unicorn.conf.rb
17457 15.7 7.3 909244 297984 ? Sl 16:47 0:08 unicorn worker[0] -E production -c config/unicorn.conf.rb
17522 19.1 7.3 906168 297556 ? Sl 16:47 0:09 unicorn worker[7] -E production -c config/unicorn.conf.rb
17484 16.7 7.3 906168 297244 ? Sl 16:47 0:08 unicorn worker[3] -E production -c config/unicorn.conf.rb
17503 18.6 7.3 899000 294548 ? Sl 16:47 0:09 unicorn worker[5] -E production -c config/unicorn.conf.rb
17512 18.4 7.2 896952 292200 ? Sl 16:47 0:09 unicorn worker[6] -E production -c config/unicorn.conf.rb
17303 13.0 4.8 477436 194544 ? Sl 16:46 0:13 unicorn master -E production -c config/unicorn.conf.rb
17435 0.9 4.5 554280 182640 ? SNl 16:47 0:00 sidekiq 5.2.7 discourse [0 of 5 busy]
17267 0.0 1.4 1263704 57740 ? S 16:46 0:00 /usr/lib/postgresql/10/bin/postmaster -D /etc/postgresql/10/main
1226 0.0 1.2 280508 48464 tty7 Ssl+ May15 0:22 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
1447 0.3 1.2 776896 48360 ? Ssl May15 5:57 /usr/bin/dockerd -H fd://
Looks like Sidekiq, some of the unicorn workers, and Postgres.
Let me know if you would like me to collect any other data.
sam
(Sam Saffron)
16 May 2019, 8:58pm
20
You are running too many unicorns; those numbers look right to me, as 300-500MB per worker is in the normal range.
Cut the unicorn count down by 3.
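(A sketch of where the worker count is set on a standard discourse_docker install; the value below is only an example:)

```bash
# Sketch, assuming the standard discourse_docker layout at /var/discourse.
# Lower the worker count in the env: section of containers/app.yml, e.g.:
#   env:
#     UNICORN_WORKERS: 5   # example: the current 8 workers minus 3, per the suggestion above
cd /var/discourse
nano containers/app.yml
./launcher rebuild app   # rebuild so the new worker count takes effect
```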
4 Likes