Memory use gradually ramps up after restart

I’m not sure when it started, but at some point in the last few weeks, presumably after a Discourse update, the site started to feel a bit sluggish. We’re running 3.4.0.beta2-dev.

I noticed that the server instance had almost no free memory, so I rebooted it. After Discourse started, memory use was initially fine (about 1.2 GB), but it started creeping up, and seems likely to soon reach a point where it becomes sluggish again.

The site is not particularly busy (20 to 30 visitors daily), and it’s been fine for many years, until recently.

The server instance has 2 GB of memory, which should be enough according to the requirements I’ve seen (1 GB minimum; 2 GB recommended).

It feels rather like a memory leak to me. Of course, if there is a leak, it may not be Discourse, but Docker or something else. The instance is only used for Discourse.

Any ideas? Is there a way to verify it’s a leak, and identify the leaking process?

Free memory is a very slippery concept - the one sure sign of too little memory is paging activity.

free (or free -h) will give you a snapshot.

vmstat 5 5 is very useful to see how things are going, including paging activity.
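
If you just want to keep an eye on the paging counters over a longer stretch, a rough filter like this should do (in vmstat’s default output, columns 7 and 8 are si and so):

vmstat 5 | awk 'NR>2 {print $7, $8}'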

              total        used        free
Mem:          1.9Gi       1.5Gi        73Mi
Swap:         2.0Gi        54Mi       1.9Gi
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0  55524 111624  20080 385060    1    3    68    52  965  349  4  2 93  1  0
 0  0  55524 114884  20088 385152    0    0    13     8 1047  352  2  1 96  0  0
 0  0  55524 112428  20088 385160    0    0     0     3  831  319  3  1 95  0  0
 0  0  55524 111616  20096 385164    0    0     0    51  688  278  2  0 97  0  0
 0  0  55524 109884  20104 385168    0    0     0     8 1117  281  2  1 96  0  1

Does anything above seem problematic? I’ve been getting memory use numbers from htop, which seem to match those from free.

My main concern is the way memory use keeps growing. I would expect it to get to a certain point and then hover around there, going up and down with site usage. The steady upward trend is disconcerting.

Certainly that looks fine at present: no activity in si and so, which is paging, and also very little disk traffic, which is bi and bo.

What Linux does is use free memory for disk caching, so it’s not bad to see free memory go low. The output of free includes an ‘available’ column; the man page says:

available
Estimation of how much memory is available for starting new applications, without swapping.

In the case of vmstat, the buff and cache columns show memory used for disk caching, which may grow to improve I/O performance but shrinks when there’s memory pressure. So, for both free and vmstat, the ‘free’ amount is a pessimistic measure.
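
If you want a single figure to watch, the ‘available’ estimate can be read directly; on reasonably recent kernels /proc/meminfo exposes the same number:

free -h
grep MemAvailable /proc/meminfo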


Okay, thanks. Possibly the sluggishness was unrelated to what appeared to be a low memory situation. I will continue to monitor this.


It’s still possible that something is gradually getting bigger.

This is one of my tactics to see what’s going on:

# ps aux|sort -n +5|tail
systemd+    1659  0.0  1.3 904384 54588 ?        S    16:44   0:00 /usr/lib/postgresql/13/bin/postmaster -D /etc/postgresql/13/main
root         830  0.0  1.6 2253324 65208 ?       Ssl  16:44   0:01 /usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock
systemd+    1682  0.0  1.9 904516 78092 ?        Ss   16:44   0:01 postgres: 13/main: checkpointer 
systemd+   18757  0.1  2.1 912368 85644 ?        Ss   18:06   0:00 postgres: 13/main: discourse discourse [local] idle
1000        1688  0.1  6.5 1006548 256428 ?      Sl   16:44   0:10 unicorn master -E production -c config/unicorn.conf.rb
1000        2189  0.1  8.5 5657760 333248 ?      Sl   16:45   0:06 unicorn worker[3] -E production -c config/unicorn.conf.rb
1000        2113  0.1  8.5 5656608 334352 ?      Sl   16:45   0:07 unicorn worker[2] -E production -c config/unicorn.conf.rb
1000        2044  0.4  8.7 6052196 342380 ?      Sl   16:44   0:23 unicorn worker[1] -E production -c config/unicorn.conf.rb
1000        2006  1.7  9.0 5628640 352492 ?      Sl   16:44   1:33 unicorn worker[0] -E production -c config/unicorn.conf.rb
1000        1971  3.1 11.1 6033652 435388 ?      SNl  16:44   2:54 sidekiq 6.5.12 discourse [0 of 5 busy]

(or ps auxc)
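
If sort on your system rejects the old +5 positional syntax, sorting explicitly on the sixth column (RSS), or letting ps do the sorting (procps on Linux), should give the same picture:

ps aux | sort -n -k6,6 | tail
ps aux --sort=rss | tail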


If it’s easy to monitor CPU and (disk) I/O activity, I’d recommend watching those, rather than memory use. Especially I/O. If CPU is low and I/O is high, and the forum is slow, that could indicate an impactful shortage of RAM.
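
For instance, something along these lines gives a quick view of disk activity and of time spent waiting for I/O (iostat is part of the sysstat package, so it may need installing; the wa column in vmstat is the I/O-wait share of CPU time):

iostat -x 5
vmstat 5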

A couple of reasons, other than a bug, for a site becoming sluggish: one is a gradual increase in users, user activity, and database size; the other is Discourse itself getting larger as it develops, adds features, and updates software components.

But it is worthwhile to keep an eye on responsiveness, and on whether the current machine is right-sized.

(In passing, I notice Hetzner’s cheapest machine now has 4 GB of RAM, at the same price as the now-unavailable cheapest machine, which had 2 GB. One of my sites is still running on the older 2 GB size.)

For the record, since my main site was recently migrated and the server is new and freshly rebooted, I’ve been tracking its usage and will include some findings here. It’s quite a bit of data; feel free not to study it!

The present state of the machine is

# uptime
 13:55:23 up 4 days, 21:10,  1 user,  load average: 0.07, 0.08, 0.02
# free
               total        used        free      shared  buff/cache   available
Mem:         3905344     1638012       98492      481864     2168840     1595004
Swap:        4194288      252928     3941360

I notice that on login the machine announces
Memory usage: 45%
which most closely reflects the ‘used’ column (used/total here is roughly 42%), not the ‘free’ column.

I have been taking periodic readings from the following commands

   date
   uptime
   free
   ps aux|sort -n +5|tail
   vmstat 5 5

and what I’ve seen is that ‘free’ memory has been traded for ‘buffer’ and ‘cache’ memory, without the processes’ RAM footprint (RSS) increasing. I think this shows why it’s not great to track ‘free’ memory, even if some hosting providers make this easy. I think it also shows, in this case, no memory leak.
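
For what it’s worth, a small wrapper along these lines could append the same readings to a log on a schedule, e.g. from cron (the log path is only an example):

{
  date
  uptime
  free
  ps aux | sort -n -k6,6 | tail
  vmstat 5 5
} >> /var/log/memory-snapshots.log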

Shortly after reboot I see this:

# free
               total        used        free      shared  buff/cache   available
Mem:         3905344     1560508      996400      179712     1348436     1974692
Swap:        4194288           0     4194288

and not long after

# ps aux|sort -n +5|tail
...
1000        1688  0.1  6.5 1006548 256428 ?      Sl   16:44   0:10 unicorn master -E production -c config/unicorn.conf.rb
1000        2189  0.1  8.5 5657760 333248 ?      Sl   16:45   0:06 unicorn worker[3] -E production -c config/unicorn.conf.rb
1000        2113  0.1  8.5 5656608 334352 ?      Sl   16:45   0:07 unicorn worker[2] -E production -c config/unicorn.conf.rb
1000        2044  0.4  8.7 6052196 342380 ?      Sl   16:44   0:23 unicorn worker[1] -E production -c config/unicorn.conf.rb
1000        2006  1.7  9.0 5628640 352492 ?      Sl   16:44   1:33 unicorn worker[0] -E production -c config/unicorn.conf.rb
1000        1971  3.1 11.1 6033652 435388 ?      SNl  16:44   2:54 sidekiq 6.5.12 discourse [0 of 5 busy]
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
...
 0  0      0 866112 314288 1083816    0    0    32    28  484  621  4  1 95  0  0

You see that sidekiq (435 MByte) and the unicorn workers (330-350 MByte each) are the largest processes.
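
(If a single figure is handier, a rough total of the Discourse-related RSS, in kB, can be had with something like

ps aux | awk '/unicorn|sidekiq/ && !/awk/ {sum += $6} END {print sum " kB"}'

though that’s just a sketch and would also count any unrelated process whose command line happened to match.)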

Over time, the free RAM and then the sidekiq RSS decrease, presumably paged out in favour of increased buffer and cache space, without undue effect: the machine isn’t showing any paging activity.

# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
...
 0  0      0 679764 326988 1190840    0    0     0    11  285  396  1  1 98  0  0

Some 14 hours later:

# uptime
 10:12:06 up 17:27,  1 user,  load average: 0.04, 0.02, 0.00
# ps aux|sort -n +5|tail
...
1000        2006  1.2  9.6 5647908 377424 ?      Sl   Sep05  12:42 unicorn worker[0] -E production -c config/unicorn.conf.rb
1000        1971  1.8 11.3 6431988 444184 ?      SNl  Sep05  18:51 sidekiq 6.5.12 discourse [0 of 5 busy]
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
...
 0  0   2048 199972 342480 1576156    0    0     0    17  361  511  2  2 96  0  0

Later…

# uptime
 19:52:00 up 1 day,  3:07,  1 user,  load average: 0.02, 0.06, 0.01
# ps aux|sort -n +5|tail
...
1000        2006  1.2  9.8 5654308 382944 ?      Sl   Sep05  20:44 unicorn worker[0] -E production -c config/unicorn.conf.rb
1000        1971  1.5 11.1 6431668 436340 ?      SNl  Sep05  25:04 sidekiq 6.5.12 discourse [0 of 5 busy]
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
...
 0  0   2304 103356 301632 1690136    0    0     0    10  360  511  1  1 98  0  0

Later…

# uptime
 12:13:09 up 1 day, 19:28,  2 users,  load average: 0.05, 0.06, 0.01
# ps aux|sort -n +5|tail
...
1000        2006  1.2  9.1 5654820 358612 ?      Sl   Sep05  31:47 unicorn worker[0] -E production -c config/unicorn.conf.rb
1000        1971  1.3 10.0 6431668 393584 ?      SNl  Sep05  35:08 sidekiq 6.5.12 discourse [0 of 5 busy]
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
...
 0  0 284416 281596  77904 1908528    0    0     0    38  315  450  1  1 98  0  0

Later

# uptime
 13:26:42 up 2 days, 20:42,  1 user,  load average: 0.20, 0.06, 0.02
# ps aux|sort -n +5|tail
...
1000        2006  1.2  9.3 5789072 365720 ?      Sl   Sep05  51:54 unicorn worker[0] -E production -c config/unicorn.conf.rb
1000        1971  1.2 10.0 6433332 393472 ?      SNl  Sep05  50:44 sidekiq 6.5.12 discourse [0 of 5 busy]
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
...
 0  0 242944  82016  95188 2082180    0    0     0   131  332  488  1  1 98  0  0

Later

# uptime
 09:21:33 up 3 days, 16:36,  1 user,  load average: 0.13, 0.10, 0.03
# free
               total        used        free      shared  buff/cache   available
Mem:         3905344     1618936      323032      476664     1963376     1619208
Swap:        4194288      250112     3944176
# ps aux|sort -n +5|tail
...
1000        2006  1.2  9.3 5789200 363572 ?      Sl   Sep05  67:02 unicorn worker[0] -E production -c config/unicorn.conf.rb
1000        1971  1.1  9.6 6433652 377472 ?      SNl  Sep05  63:14 sidekiq 6.5.12 discourse [0 of 5 busy]
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
...
 0  0 250112 321888  56052 1906672    0    0     2    13  293  420  1  0 99  0  0

Later

# uptime
 13:55:23 up 4 days, 21:10,  1 user,  load average: 0.07, 0.08, 0.02
# free
               total        used        free      shared  buff/cache   available
Mem:         3905344     1638012       98492      481864     2168840     1595004
Swap:        4194288      252928     3941360
# ps aux|sort -n +5|tail
...
1000        1971  1.1  9.5 6434676 371648 ?      SNl  Sep05  80:49 sidekiq 6.5.12 discourse [0 of 5 busy]
1000        2006  1.2  9.5 5658468 373404 ?      Sl   Sep05  88:44 unicorn worker[0] -E production -c config/unicorn.conf.rb
# vmstat 5 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
...
 1  0 252928 101040  86736 2082372    0    0     0    10  333  482  1  0 99  0  0

Thanks for sharing your observations. I’m seeing much the same, except that we’re using a 2 GB instance, so there’s less headroom. Also, thanks for pointing out that some measures of ‘free’ and ‘used’ memory are not necessarily helpful.

When I last rebooted the instance a few days ago, the initial memory use was 1.23 GB. Since then, it has gradually ramped up, and is now at 1.8 GB. The site remains reasonably responsive, for now.

The site doesn’t actually have many users, and there has been no recent increase in user registrations or activity. In the past month there have been about 20 new topics, about 100 posts, and about 4 daily engaged users.

I will continue to monitor things, and will post here if the instance’s memory maxes out again, the site becomes sluggish again, or both.
