High rebuild memory requirements: April 2025 edition

It looks like 2+2 (2GB RAM + 2GB swap) may not be enough any more… I’m managing a fairly unassuming (no big/fancy plugins, etc.) Discourse instance that, as of today, is failing to bootstrap because Ember is chewing all the RAM it can find, plus all the swap, and grinding the machine into unresponsiveness. Adding another 2GB of swap allowed the bootstrap to complete, with peak swap usage of around 2.5GB.

Yikes, this is on @david’s list to investigate.

@david has been investigating. We can confirm that, as things stand, 2GB is enough for Docker rebuilds, but not enough for the web upgrader to work.

One idea I have tossed around is shutting down all Ruby processes during the web upgrade, which would free an extra 300–500MB and leave enough headroom for asset precompilation.

A longer-term approach we are likely going to need to take for self-hosters is shipping pre-bootstrapped containers, but that is a Pandora’s box: how would a web upgrader pull that off? We don’t want to mount Docker sockets …

It sure is a pickle.

Well, 2GB wasn’t enough for me.

Is that a comparison between a basic, plain install and real-world setups?

Indeed, it’s not perfectly consistent. Even with everything else shut down, it can still fail.

Unfortunately we’re fighting a losing battle against modern JS build tooling here. It’s all designed to run on 8GB+ of RAM on modern developer machines, not 1GB VPSs :cry:

We do have some solutions in mind. For example: providing pre-built assets which can be automatically downloaded. The big challenge we have is plugins, because they vary on everyone’s site, and right now they’re integrated tightly into the core build.

But for now, doing a full CLI rebuild should have a higher success rate than a web UI update.

Like Jagster, I’m finding that 2GB RAM + 2GB swap is not, in fact, enough for my CLI-driven Docker rebuild. Checking further, the only plugins on this install are docker_manager and discourse-prometheus – neither of which would, I expect, put unexpected load onto Ember.

If the minimum specs have to change, that would suck, but it would be a lot better than the current situation, where machines unexpectedly grind to death on every upgrade.

If that’s the case, I think it would still be better to increase the recommended specs a bit. Personally, I don’t really mind adding 2 (or even 4) more GB of swap if it makes rebuilds more reliable - at least as long as daily operations are still fine with 2-4 GB of RAM (for small to medium-sized communities).

Indeed, the initial install failed on my recent 4-core/4GB instance. Ed suggested creating a swap file; I found the topic on creating swap and set up a 4GB swap file. Now everything works as expected for both web and CLI updates/upgrades.
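
For reference, the usual swap-file recipe on a stock Linux host looks something like the sketch below, sized at 4GB to match the above. The path and size are illustrative, not an official Discourse recommendation.

```bash
sudo fallocate -l 4G /swapfile     # allocate the backing file (use dd if fallocate is unavailable)
sudo chmod 600 /swapfile           # swap must not be readable by other users
sudo mkswap /swapfile              # format it as swap space
sudo swapon /swapfile              # enable it immediately
echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab   # persist across reboots
free -h                            # confirm the new swap is visible
```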

IMHO we may just need to accept that Discourse requires more RAM than it used to.

Wouldn’t zram make sense?
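
For the curious, here is a rough sketch of a manual zram swap setup, assuming a kernel with the zram module available; many distros also ship packages such as zram-generator that manage this more cleanly.

```bash
sudo modprobe zram                                     # creates /dev/zram0
echo zstd | sudo tee /sys/block/zram0/comp_algorithm   # optional: pick a compressor
echo 2G | sudo tee /sys/block/zram0/disksize           # uncompressed capacity of the device
sudo mkswap /dev/zram0
sudo swapon -p 100 /dev/zram0                          # higher priority than any disk swap
swapon --show                                          # verify
```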

We just landed this commit, which should hopefully improve the situation. Please let us know how you get on! (It’ll hit tests-passed in the next ~30 minutes.)

When testing with a memory-constrained Docker container locally, I can now get a successful build with -m 1600m. Before this change, the minimum I could achieve was -m 3000m.
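
The exact test harness isn’t shown here, but for anyone wanting to reproduce this kind of memory-constrained check, standard Docker flags get you most of the way; the image and workload below are placeholders.

```bash
# Hard-cap the container at 1600MB of RAM with no additional swap,
# matching the figure quoted above.
docker run --rm -it -m 1600m --memory-swap 1600m ubuntu:24.04 bash
# ...run the build inside the container, then watch usage from the host:
docker stats --no-stream
```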

I did a test rebuild (fresh install), which went through without issue. That machine has 4GiB of RAM (Hetzner CAX11) and a swap file of the same size, so it’s certainly less constrained than the 2+2GB setup mentioned above. However, swap use was minimal during the whole build, and the maximum RAM usage I saw was ~3.1GB. Most of the time it stayed around ~2GB, so it doesn’t look too bad (build time was more or less unchanged, i.e. about 8 minutes).

I would quite like to do some controlled experiments, with clean installations and rebuilds, on a variety of setups, and in particular would like to see the difference (if any) of running with vm overcommit, but I’m afraid I’ve lacked the time.

(Without overcommit, a large process which forks will have an instantaneous increase in memory footprint which might be fatal, and it won’t show up on a polled monitor. Even with overcommit, memory increase could be rapid enough not to show up on a poll, whether htop or vmstat or something else.)

I don’t think I’ve ever seen anyone volunteer whether or not they are running with overcommit, although in my view it’s an important aspect of the host configuration.
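
For anyone minded to volunteer that information, the relevant settings, and the kernel’s current commit accounting, are quick to read out:

```bash
sysctl vm.overcommit_memory vm.overcommit_ratio
# 0 = heuristic overcommit (the default), 1 = always overcommit, 2 = strict accounting
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo   # commit cap vs. currently committed
```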

I bet most people don’t.

I set it automatically on my installs. I still get that warning about it, though.
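
A minimal sketch of setting it persistently, assuming the standard sysctl.d mechanism; the file name is arbitrary, and the warning in question is presumably Redis complaining when vm.overcommit_memory is 0.

```bash
echo 'vm.overcommit_memory = 1' | sudo tee /etc/sysctl.d/90-overcommit.conf
sudo sysctl --system   # reload all sysctl configuration without rebooting
```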

Overcommit is irrelevant here, because the problem isn’t processes being OOM-killed prematurely; it’s trying to stuff ten pounds of allocated memory into a five-pound sack.

It’s not practically possible to run a Discourse rebuild with overcommit_memory=2, because Ember (amongst other things, no doubt) pre-allocates masses of virtual memory (IIRC, 80GB or so is what I saw), so that will always fall foul of any reasonable overcommit_ratio setting. Setting overcommit_memory=1 won’t help either, because, again, the problem isn’t an overzealous VMM killing processes; it’s hideously poor memory management from the Ember compiler.
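
As an illustrative check of that pattern: a huge virtual reservation alongside a much smaller touched footprint shows up as VmSize versus VmRSS in /proc. The pgrep pattern below is a guess at the build process name, not something quoted in this thread.

```bash
pid=$(pgrep -f ember | head -n1)                # assumes an ember build process is running
grep -E '^(VmSize|VmRSS)' /proc/"$pid"/status   # reserved vs. resident memory
# A VmSize of tens of GB with a modest VmRSS is harmless under modes 0/1,
# but is exactly what strict accounting (overcommit_memory=2) rejects.
```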

I’m not sure I entirely agree with your analysis! As I understand it, overcommit allows processes to allocate memory which they don’t touch, so it’s not just about OOM-killer behaviour. But as I say, I’d like to run some controlled experiments; that’s a better way to see what does and doesn’t make a difference.

I have 4GB of RAM and many plugins (no swap file that I’m aware of). How many plugins do you have, and do you think a plain 4GB of RAM is enough?

Partially correct, but even so, it’s irrelevant, because the problem being discussed in this topic is processes allocating memory which they do touch, and touching more of it than the system has available, which causes customer-visible outages.

Can you confirm that, after @david’s changes, memory requirements are down? We should be in a reasonable state now.

The next big jump is going to be pre-compiling and distributing pre-compiled assets. It’s a pretty big change to get there, but once done it will eliminate a large amount of duplicated build work across the Internet.

With respect, I’m not sure about that. I’ve seen log files showing failures in forking. We’re saying in this thread that it’s “memory requirements”, but that does, in my view, include the kernel’s tactics for virtual memory. Clearly an experiment or three will show whether or not I’m right about overcommit.

That was a fresh build without any plugins. I can try another one with a few plugins enabled and maybe temporarily disable swap to confirm that the build goes through (it’ll probably take a few days until I have time, though).
