Discourse Docker HW reserved/used (CPU, RAM, Disk) and how to manage it

I used this weekend to install some external tools on the VM that runs our Discourse installation (Grafana, Prometheus, some exporters), set up some analytics (Matomo), and configure backups to an external location (AWS S3). It’s been two months since our migration and I finally got some free time to do it.

After leaving it a couple of days for the metrics to accumulate, I went back to check on it and noticed that, according to cAdvisor, the container running Discourse is soaking up basically all of the available RAM:

Which is weird to me, because other sources don’t show the same thing. That said, there are a few other aspects I’d like to understand better.
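
For what it’s worth, these are the two cAdvisor series I’m comparing in Prometheus for the app container; I’m not sure which one the dashboard panel uses, so take this as my own sanity check rather than an explanation:

container_memory_usage_bytes{name="app"}          # includes page cache
container_memory_working_set_bytes{name="app"}    # excludes most reclaimable cache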

CPU usage, for example, has spikes but tends to stay well over 100% on average nearly all the time:

However, this is the output of docker stats:

CONTAINER ID   NAME                    CPU %     MEM USAGE / LIMIT     MEM %     NET I/O           BLOCK I/O        PIDS
254d80933447   grafana                 0.02%     83.34MiB / 29.38GiB   0.28%     4.2MB / 10.1MB    88.2MB / 13MB    14
78d8a523c667   prometheus              0.00%     114.6MiB / 29.38GiB   0.38%     741MB / 50.7MB    190MB / 201MB    14
d602e2724c7a   cadvisor                1.48%     67.52MiB / 29.38GiB   0.22%     12.3MB / 691MB    166MB / 4.08MB   24
4718b3629c8e   docker_state_exporter   0.00%     11.54MiB / 29.38GiB   0.04%     2.85MB / 38.8MB   2.7MB / 90.1kB   14
c5a211185855   app                     337.52%   7.543GiB / 29.38GiB   25.67%    365MB / 883MB     360GB / 67.3GB   282
9b95fa3156bb   matomo_cron             0.00%     7.504MiB / 29.38GiB   0.02%     1.48kB / 0B       762MB / 0B       3
553a3e7389eb   matomo_web              0.11%     9.832MiB / 29.38GiB   0.03%     106MB / 203MB     8.89MB / 33MB    9
adf21bdea1e5   matomo_app              0.01%     113.3MiB / 29.38GiB   0.38%     166MB / 146MB     1.26GB / 153MB   4
96d873027990   matomo_db               0.03%     99.66MiB / 29.38GiB   0.33%     63.8MB / 126MB    118MB / 310MB    15
9d21fdde2ec9   node_exporter           0.00%     9.887MiB / 29.38GiB   0.03%     3MB / 48.9MB      10.5MB / 299kB   6
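
If I understand correctly, docker stats reports CPU as a percentage of a single core, so values above 100% are expected on a multi-core host. This is the quick check I ran to put the 337% figure into perspective:

nproc                                   # core count on this VM
docker stats --no-stream --format 'table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}'
# e.g. 337% on an 8-core machine is roughly 337/800 ≈ 42% of total CPU capacity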

Reading around this forum, I already tried reducing the number of Unicorn processes from the auto-detected value (8) to 4, but I don’t see any relevant change in CPU or memory usage.
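
For reference, this is how I changed it, in case I missed a step (setting UNICORN_WORKERS in the env section of containers/app.yml and rebuilding):

# containers/app.yml
# env:
#   ...
#   UNICORN_WORKERS: 4
cd /var/discourse
./launcher rebuild app    # rebuild so the new worker count takes effect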

Last but not least, when we imported our DB from vBulletin 3 to Discourse, the database itself was around 7 GB. Checking today, I can see that it has grown tenfold.

du -sh /var/discourse/shared/standalone/* | sort -hr | head -n 10
70G     /var/discourse/shared/standalone/postgres_data
1.6G    /var/discourse/shared/standalone/uploads
807M    /var/discourse/shared/standalone/log
69M     /var/discourse/shared/standalone/redis_data
200K    /var/discourse/shared/standalone/postgres_run
28K     /var/discourse/shared/standalone/state
12K     /var/discourse/shared/standalone/tmp
12K     /var/discourse/shared/standalone/ssl
8.0K    /var/discourse/shared/standalone/backups
4.0K    /var/discourse/shared/standalone/postgres_backup

I suppose this is PostgreSQL doing its thing in the background and creating tons of extra data, but is there anything that can be done to at least control it?
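
In case it helps, this is the kind of check I was planning to run to see which relations are actually taking the space (assuming the standard standalone layout, where the database inside the container is called discourse):

cd /var/discourse
./launcher enter app
# inside the container, list the ten largest relations:
su postgres -c "psql discourse" <<'SQL'
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS size
FROM pg_class
WHERE relkind IN ('r', 'i', 't')
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
SQL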

Output of the Discourse Doctor in case it could help:

DISCOURSE DOCTOR Mon May 15 09:44:17 AM CEST 2023
OS: Linux vmi1229594.OMITTED.net 5.15.0-67-generic #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux


Found containers/app.yml

==================== YML SETTINGS ====================
DISCOURSE_HOSTNAME=OMITTED
SMTP_ADDRESS=OMITTED
DEVELOPER_EMAILS=OMITTED
SMTP_PASSWORD=OMITTED
SMTP_PORT=OMITTED
SMTP_USER_NAME=OMITTED
LETSENCRYPT_ACCOUNT_EMAIL=OMITTED

==================== DOCKER INFO ====================
DOCKER VERSION: Docker version 23.0.1, build a5ee5b1

DOCKER PROCESSES (docker ps -a)

CONTAINER ID   IMAGE                                     COMMAND                  CREATED        STATUS                 PORTS                                      NAMES
254d80933447   grafana/grafana:latest                    "/run.sh"                8 hours ago    Up 7 hours             0.0.0.0:8443->3000/tcp                     grafana
78d8a523c667   prom/prometheus:latest                    "/bin/prometheus --c…"   8 hours ago    Up 8 hours             0.0.0.0:9090->9090/tcp                     prometheus
d602e2724c7a   gcr.io/cadvisor/cadvisor:latest           "/usr/bin/cadvisor -…"   8 hours ago    Up 8 hours (healthy)                                              cadvisor
4718b3629c8e   karugaru/docker_state_exporter            "/go/bin/docker_stat…"   8 hours ago    Up 8 hours                                                        docker_state_exporter
c5a211185855   local_discourse/app                       "/sbin/boot"             9 hours ago    Up 7 hours             0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   app
9b95fa3156bb   matomo:fpm                                "bash -c 'bash -s <<…"   20 hours ago   Up 20 hours            9000/tcp                                   matomo_cron
553a3e7389eb   nginx:alpine                              "/docker-entrypoint.…"   21 hours ago   Up 21 hours            80/tcp, 0.0.0.0:2053->443/tcp              matomo_web
adf21bdea1e5   matomo:fpm-alpine                         "/entrypoint.sh php-…"   21 hours ago   Up 21 hours            9000/tcp                                   matomo_app
96d873027990   mariadb                                   "docker-entrypoint.s…"   21 hours ago   Up 21 hours            3306/tcp                                   matomo_db
9d21fdde2ec9   quay.io/prometheus/node-exporter:latest   "/bin/node_exporter …"   36 hours ago   Up 36 hours                                                       node_exporter

c5a211185855   local_discourse/app                       "/sbin/boot"             9 hours ago    Up 7 hours             0.0.0.0:80->80/tcp, 0.0.0.0:443->443/tcp   app
adf21bdea1e5   matomo:fpm-alpine                         "/entrypoint.sh php-…"   21 hours ago   Up 21 hours            9000/tcp                                   matomo_app

Discourse container app is running


==================== PLUGINS ====================
          - git clone https://github.com/discourse/docker_manager.git
          - git clone https://github.com/discourse/discourse-spoiler-alert.git
          - git clone https://github.com/discourse/discourse-animated-avatars.git
          - git clone https://github.com/discourse/discourse-whos-online.git
          - git clone https://github.com/discourse/discourse-bbcode.git
          - git clone https://github.com/discourse/discourse-signatures.git
          - git clone https://github.com/discourse/discourse-reactions.git
          - git clone https://github.com/paviliondev/discourse-legal-tools.git
          - git clone https://github.com/discourse/discourse-patreon.git
          - git clone https://github.com/discourse/discourse-yearly-review.git
          - git clone https://github.com/discourse/discourse-user-notes.git
          - git clone https://github.com/merefield/discourse-user-network-vis.git
          - git clone https://github.com/discourse/discourse-calendar.git
          - git clone https://github.com/discourse/discourse-prometheus.git

WARNING:
You have what appear to be non-official plugins.
If you are having trouble, you should disable them and try rebuilding again.

See https://github.com/discourse/discourse/blob/main/lib/plugin/metadata.rb for the official list.

========================================
Discourse 3.1.0.beta4
Discourse version at OMITTED: Discourse 3.1.0.beta4
Discourse version at localhost: Discourse 3.1.0.beta4


==================== MEMORY INFORMATION ====================
RAM (MB): 31550

               total        used        free      shared  buff/cache   available
Mem:           30088        3958        1307        4269       24823       21475
Swap:           8191        1140        7051

==================== DISK SPACE CHECK ====================
---------- OS Disk Space ----------
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       194G   93G   91G  51% /

---------- Container Disk Space ----------
Filesystem      Size  Used Avail Use% Mounted on
overlay         194G   93G   91G  51% /
/dev/sda3       194G   93G   91G  51% /shared
/dev/sda3       194G   93G   91G  51% /var/log

==================== DISK INFORMATION ====================
Disk /dev/sda: 200 GiB, 214748364800 bytes, 419430400 sectors
Disk model: QEMU HARDDISK
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: OMITTED

Device       Start       End   Sectors  Size Type
/dev/sda1     2048      4095      2048    1M BIOS boot
/dev/sda2     4096   4194303   4190208    2G Linux filesystem
/dev/sda3  4194304 419428351 415234048  198G Linux filesystem

==================== END DISK INFORMATION ====================

==================== MAIL TEST ====================
For a robust test, get an address from http://www.mail-tester.com/
Or just send a test message to yourself.
Email address for mail test? ('n' to skip) [OMITTED]: n
Mail test skipped.
Replacing: SMTP_PASSWORD
Replacing: LETSENCRYPT_ACCOUNT_EMAIL
Replacing: DEVELOPER_EMAILS
Replacing: DISCOURSE_DB_PASSWORD
Replacing: Sending mail to

==================== DONE! ====================

Thanks @JammyDodger, didn’t notice I was in support :+1:


Briefly, I would say

  • don’t worry about CPU usage unless it’s saturating all cores.
  • don’t worry about RAM usage, worry instead about swap activity.
  • probably start a new thread about DB size growing 10x

If your system offers swap activity metrics, use them. If it doesn’t, look for disk activity on the device holding the swap space.
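
For a quick command-line view, something like vmstat works; the si/so columns show swap-in/swap-out activity, and /proc/vmstat holds the raw counters most dashboards are built from:

vmstat 5                                  # watch the si/so columns
grep -E 'pswpin|pswpout' /proc/vmstat     # cumulative pages swapped in/out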

Gotcha. With a lot of RAM, I had read around here not to worry too much about swap, so I created 8 GB just so I’m not totally without a “safety net”, if that makes sense.

Can you elaborate a bit further about what I should worry about in terms of swap activity?

Everything related to swap that I could find in the dashboard is this:

And this is the full memory stack:

I’m used to working with containers/k8s at work, so these VM-level details elude me in terms of what I’m supposed to keep an eye on.

If you have some entry-level links and don’t want to / don’t have time to write an essay here, it’s still appreciated :slight_smile:

Thanks for the images - I would mostly monitor the top one, the pages per second. If you see sustained activity, then you need more memory. Short spikes like the ones in your picture are fine. The max figure you have is under 1000, and presumably your system is healthy. So, watch for sustained activity exceeding, say, 500 pages per second.

Essentially, RAM is fast and swap is slow. The operating system will make as much use of the RAM as it can, and this is why “unused RAM” isn’t easy to measure or think about. In your case, the big red proportion is RAM used to cache disk contents, which helps application performance. If that green proportion grew to more than, say, three quarters, that would be worrying. Maybe for some workloads, more than half would be worrying.

But what really hurts performance is swap activity, because swap is slow. Some static amount of swap usage is not important: that’s the top slice, the purple one. In your case, max swap usage is under 2 GB compared to your 8 GB - you have lots of capacity. If max swap usage gets close to the amount of swap space you have, a system crash might be imminent. Otherwise, it’s not a concern.

So, watch for sustained swap activity, or for the red disk cache to be squeezed by the green application use.
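
If you do set up an alert later, something along these lines is roughly the shape to aim for - a sketch using node_exporter’s vmstat metrics (names may vary with exporter version), with the 500 pages per second figure from above as the threshold:

groups:
  - name: swap
    rules:
      - alert: SustainedSwapActivity
        expr: rate(node_vmstat_pswpin[5m]) + rate(node_vmstat_pswpout[5m]) > 500
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Sustained swap activity above 500 pages per second for 15 minutes"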


Thank you so much for the detailed explanation Ed, much appreciated. I’ll play around with alerts later on so that was really helpful :slight_smile:
