Discourse not using much RAM

Hi!
I have Discourse on Docker, and it never seems to use any swap when RAM usage gets too high. As a result it crashes, or says “fatal error: out of memory allocating heap arena map” if I try to rebuild. Unless I reboot every few hours, it doesn’t seem to work.

Does anyone know how to fix this?

Thanks,
Kian

Assuming you’re on Linux, what does free -h show? (Preferably immediately after a restart, and also shortly before the error.)

Maybe swappiness is set to 0?
cat /proc/sys/vm/swappiness
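A minimal sketch for checking the value and, as root, changing it, assuming a standard Ubuntu host with sysctl available. The file name /etc/sysctl.d/99-swappiness.conf is only an example name, not something the system requires.

```shell
#!/bin/sh
# Print the kernel's current swappiness (0-100 on older kernels, up to 200 on newer ones).
current_swappiness() {
  cat /proc/sys/vm/swappiness
}

echo "vm.swappiness is currently $(current_swappiness)"

# To change it (requires root), e.g. to 60:
#   sudo sysctl -w vm.swappiness=60
# That only lasts until the next reboot. To persist it, put the setting in a
# sysctl config file (example file name) and reload all sysctl config files:
#   echo 'vm.swappiness=60' | sudo tee /etc/sysctl.d/99-swappiness.conf
#   sudo sysctl --system
```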

Hi!
It was at 40; I’ve now set it to 60.

[Screenshot: output of free -h]

The system always caches large chunks of RAM, and swap is never used, even when RAM usage is high.

I think the two parts of the output to concentrate on are the 5.5G of available memory and the 0B of used swap, or perhaps the 9G of free swap. Together, the 5.5G and the 9G (about 14.5G in all) tell you how much headroom you have. The amount of memory used for buffers and cache is dynamic and should never cause a shortage.
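A rough sketch of that headroom sum, read straight from /proc/meminfo (the file free itself reads). It assumes a kernel new enough to report MemAvailable, which Ubuntu 18.04’s is. MemAvailable already includes reclaimable cache, so buff/cache is not added on top.

```shell
#!/bin/sh
# Rough memory headroom = MemAvailable + SwapFree, in kB as reported by /proc/meminfo.
headroom_kb() {
  awk '/^MemAvailable:/ { avail = $2 }
       /^SwapFree:/     { swap = $2 }
       END { printf "%.0f\n", avail + swap }' /proc/meminfo
}

kb="$(headroom_kb)"
# Convert kB to GiB so the figure is comparable with free -h output.
awk -v kb="$kb" 'BEGIN { printf "headroom: %.1f GiB\n", kb / 1024 / 1024 }'
```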

I think with a time to failure of only a few hours, I’d run a vmstat 5 and capture the output, in such a way that you can see the final moments. I used to use a cron job to run vmstat 5 5 into a log file every 10 minutes.

It’s possible with misbehaving software that it will very quickly use all available resources. In which case a log of ps uax every few minutes - in order to get a few snapshots at the crucial moments - could be very useful.

It’s also possible that some other limits are in play. Presumably this is a vanilla installation on a vanilla OS, with nothing else running and no special configuration?

Hi!
How would I go about writing vmstat 5 to a log every 10 minutes? And how would I get ps uax to write to a log every few minutes?

Yes, it’s a vanilla Ubuntu Server 18.04 install. I’ve only got things like Apache, Docker, etc.

I’ve also just remembered that I have Varnish Cache installed, which explains the cached RAM. But I still don’t see why Discourse won’t use swap. A few days ago I set its swap limit with a docker command, but it did nothing.

Here’s a cheap and cheerful one-liner (cron would be a better way):

sh -c 'rm -f /tmp/stop; while [ ! -e /tmp/stop ]; do (date; uptime; free; ps faux; vmstat 5 5) >> /var/log/monitor.log; sleep 600; done' &

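It’ll run forever; to stop it, run touch /tmp/stop. Start it from a root shell (e.g. after sudo -i), because ordinary users normally can’t write to /var/log.
The log will appear in /var/log/monitor.log. Use tail -n 99 /var/log/monitor.log to see the final throes, or less /var/log/monitor.log to page through it. Somehow you need to find the sections in the log which show the trouble developing.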
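Since cron is the better way, here is a minimal sketch of a script cron could call every 10 minutes instead of the background loop. The path /usr/local/bin/monitor-snapshot.sh is hypothetical: save the script wherever you like, make it executable, and add the crontab line from its comments to root’s crontab with sudo crontab -e. It collects the same commands as the one-liner.

```shell
#!/bin/sh
# Hypothetical script, e.g. /usr/local/bin/monitor-snapshot.sh.
# Root crontab entry (sudo crontab -e) to take a snapshot every 10 minutes:
#   */10 * * * * /usr/local/bin/monitor-snapshot.sh /var/log/monitor.log

# Append one timestamped snapshot to the log file named in $1.
# $2 is the number of vmstat samples, taken 5 seconds apart (default 5).
take_snapshot() {
  log="$1"
  samples="${2:-5}"
  {
    echo "=== $(date) ==="
    uptime
    free -h
    ps faux
    vmstat 5 "$samples"
  } >> "$log" 2>&1
}

# Only write when a log path is given, so loading the file has no side effects.
if [ -n "${1:-}" ]; then
  take_snapshot "$1"
fi
```

Each snapshot starts with a line beginning “===”, so grep -n '^===' /var/log/monitor.log lists where every section starts, which makes the final few easy to find.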

It feels like you’re asking yourself the wrong question here. It’s the Linux kernel which looks after virtual memory, including the use of buffers and the use of swap. If free reports that swap is configured, that’s as it should be and you have nothing to do.

Your real question is, why is my discourse not running well, why does it need to be restarted, and why am I seeing the fatal error.

I would very much recommend you retitle this topic as
Why “fatal error: out of memory allocating heap arena map”?

But also, I worry that you seem to have several distinct observations:

  • sometimes discourse crashes
  • sometimes I see “fatal error:… heap arena map” when I rebuild
  • sometimes I need to reboot every few hours

and it’s not clear to me exactly how those observations interact.

  • what makes you believe discourse crashed: what is the observation?
  • do you always see “fatal error:” on a rebuild?
  • why are you rebuilding?
  • what prompts you to reboot, and do you mean reboot the server?

It would be good to hear the answers!

Hmm, what did you do? What does

docker stats --no-stream --no-trunc

report for MEM USAGE / LIMIT and MEM %?

(In my case, the LIMIT is something a little under the machine’s physical memory. Perhaps this means that nothing running within the container is likely to cause swapping, and you could see a process fail to allocate memory even when there’s little or no swap in use.)
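To see that limit from the container’s own point of view, a minimal sketch that reads the cgroup memory limit, to be run inside the container (for example after docker exec -it app bash, assuming the container is called app as in the docker stats output). Which file exists depends on the cgroup version: cgroup v1 (as on Ubuntu 18.04) reports a very large number when there is no limit, cgroup v2 reports max.

```shell
#!/bin/sh
# Print the memory limit of the current cgroup in bytes, "max" if unlimited
# (cgroup v2), or "unknown" if neither file is readable.
memory_limit() {
  if [ -r /sys/fs/cgroup/memory.max ]; then
    cat /sys/fs/cgroup/memory.max                        # cgroup v2
  elif [ -r /sys/fs/cgroup/memory/memory.limit_in_bytes ]; then
    cat /sys/fs/cgroup/memory/memory.limit_in_bytes      # cgroup v1
  else
    echo "unknown"
  fi
}

echo "memory limit: $(memory_limit)"
```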

Hi!
I believe Discourse has crashed because when I go to the domain I get an Nginx 502 Bad Gateway error. The Docker container is still up, though.

Yes, I see it on a rebuild, apart from the odd occasion.

I rebuild it because that often fixes the 502 Bad Gateway for a while.

And yes, I reboot the server to see if that fixes the error. Sometimes it works, but more often it doesn’t, and then a rebuild fixes it for a while.

I’ll also get that log soon.

When I run “/var/log/monitor.log - use tail -99” I get “-bash: /var/log/monitor.log: Permission denied”.

(From your screenshot, I see the container called “app” has a memory limit of 7.8G and is using just 3% so that’s fine. Edit: but it might be suspicious that it’s using 100% of CPU.)

The log file isn’t something to run as a command; we just need to look at the end of it, so
tail -n 199 /var/log/monitor.log
might give us what we need. But we might need to look at more of it: perhaps you can zip up the log file and attach it, or share it some other way. How big is the log file?
ls -l /var/log/monitor.log

I think 100% means one core, because the system runs fine.
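That is the usual convention: ps, top (in its default mode) and docker stats express %CPU relative to a single core, so a process keeping one core busy shows about 100%. A small sketch that converts such a figure into a share of the whole machine; the helper name machine_share is hypothetical.

```shell
#!/bin/sh
# Convert a per-core %CPU figure into a percentage of the whole machine.
# $1 is the reported %CPU; $2 optionally overrides the core count (default: nproc).
machine_share() {
  pct="$1"
  cores="${2:-$(nproc)}"
  awk -v p="$pct" -v c="$cores" 'BEGIN { printf "%.1f\n", p / c }'
}

echo "100% of one core is $(machine_share 100)% of this machine"
```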

log.txt (25.0 KB)
199 lines of the log.

monitor.txt (39.3 KB)
Here’s the full log 🙂

Thanks. That all looks healthy, but it’s only a single snapshot. What should happen is that every 10 mins another section gets appended to the log. Wait until you see the problem with your discourse, then share the last few sections.

I have to say, I don’t know what’s going on.

I notice your 3 unicorns are using lots of CPU, and I don’t know why they should be.

USER       PID %CPU %MEM     VSZ    RSS TTY  STAT START  TIME COMMAND
x          434 51.9  2.8  443732 234144 ?    Sl   18:26  0:11  \_ unicorn master -E production -c config/unicorn.conf.rb
x          662  103  3.6 8877408 301148 ?    Rl   18:26  0:08  |   \_ discourse sidekiq
x          686 99.7  3.6 8873312 301916 ?    Rl   18:26  0:06  |   \_ unicorn worker[0] -E production -c config/unicorn.conf.rb
x          731 94.3  3.6 8873312 294368 ?    Rl   18:26  0:05  |   \_ unicorn worker[1] -E production -c config/unicorn.conf.rb
x          744 77.2  3.3 8873312 276788 ?    Rl   18:26  0:03  |   \_ unicorn worker[2] -E production -c config/unicorn.conf.rb

You can run top and press shift+m to sort by RAM usage and see what is using the most RAM. Can you post the result here?
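If an interactive top session is awkward to copy from, a minimal non-interactive sketch of the same idea: list the processes with the largest resident memory (RSS, in KiB), biggest first. It assumes the GNU procps ps shipped with Ubuntu, which supports --sort; the helper name top_memory is hypothetical.

```shell
#!/bin/sh
# List the $1 processes (default 15) using the most resident memory, largest first.
# The extra line in head keeps the column header.
top_memory() {
  count="${1:-15}"
  ps -eo pid,user,rss,%mem,%cpu,args --sort=-rss | head -n "$((count + 1))"
}

top_memory 15
```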