Should we increase the default swap space to 3GB or 4GB?

I’ve just done a bunch of upgrades and decided the easiest way to solve the memory errors was simply to double the swap to 4GB. The downside is that a 1GB droplet only has 25GB of SSD, so losing 4GB of it to swap is a significant amount of space, and it’s already a bit tight at 25GB.

So, should we change discourse-setup to make the default be 3GB?

What do you think, @Falco ?

6 Likes

If that solves the problem I’m all up for it.

2 Likes

Can I strongly suggest that the installation script should also set the two kernel tunables which affect the memory handling? It would be good to know that everyone who apparently has a problem does at least have a starting point of a good kernel setup.
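To be concrete, here’s a rough sketch of the sort of thing I mean, assuming the two tunables are vm.overcommit_memory and transparent hugepages (the file name and exact values are only examples, not a final proposal):

# Persist the overcommit setting so it survives a reboot:
echo 'vm.overcommit_memory = 1' > /etc/sysctl.d/90-discourse.conf
sysctl -p /etc/sysctl.d/90-discourse.conf

# THP isn't a sysctl; it lives under /sys and has to be re-applied at boot
# (e.g. from a systemd unit or rc.local):
echo never > /sys/kernel/mm/transparent_hugepage/enabled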

2 Likes

This seems like a sane idea to me. I can’t imagine a case where THP would be valuable on a dedicated Discourse instance, and overcommit can help avoid OOM.

We might consider offering each of these separately, with applying them as the default response and a non-default option to opt out?

Also, the script can use sysctl to find out whether the settings need to be changed in the first place before making a change. If someone has already made these changes with different files, it would be potentially confusing to create duplicate overrides. I think that not all Linux distros ship with memory overcommit turned off in the first place.

if [ "$(sysctl -n vm.overcommit_memory)" != 1 ]; then
    ....
fi
3 Likes

At the risk of diluting the important message about kernel tunables, there’s a second consideration: the present script only creates swap on a low-RAM machine. I think that’s a mistake, both because swap is still useful for maximising the usefulness of RAM and, more importantly, because it will cause trouble if someone creates their Discourse on a large-RAM machine, for speed or convenience, and then downsizes it to a small-RAM one.

The setup should create swap in all cases (unless there’s already enough). It’s valid and sometimes useful to have multiple swapfiles.
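For reference, the mechanics are only a few lines per swapfile (a sketch; the path and the 1G size are just examples):

# Create and enable one 1G swapfile; repeat for more.
mkdir -p /var/local/swap
fallocate -l 1G /var/local/swap/swapfile.0   # or dd if=/dev/zero on filesystems without fallocate support
chmod 600 /var/local/swap/swapfile.0
mkswap /var/local/swap/swapfile.0
swapon /var/local/swap/swapfile.0
echo '/var/local/swap/swapfile.0 none swap sw 0 0' >> /etc/fstab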

2 Likes

I’m not the one who decides, and I do set these on machines I set up myself, but this is a shell script that gets run by (mostly) everyone who installs Discourse. It needs to be as simple as possible, and are we sure those settings work on Raspberry Pi and Mac and whatever other nonsense people try to do? And that whatever method you use to test whether it’s already set works on all of those platforms? Seems hard.

I wrote discourse-setup and I find making changes to it a bit scary.

Always offering to create swap is not a bad idea. Maybe just always offer to set up 3 or 4GB of swap? But then how much? A rule of thumb I once knew was to make swap the same size as RAM. And right now, if you don’t create swap, your only option is to control-C out. So either we force people to create swap or we add another Y/N question (which will make me modify my scripts that run discourse-setup :crying_cat_face:). Oh! Or we can have it controlled by a --skip-swap switch. That seems OK to me. If you’re smart enough to know about swap then you can find the switch to skip it; we can add a note about the switch in that WARNING message.

And maybe also add a note about --skip-connection-test when that test fails.
I think if they have swap set up already, it’s safe to assume that they know what they are doing.
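Something along these lines would be enough in the option handling (purely a sketch, not the actual discourse-setup code; the helper name is made up):

SKIP_SWAP=""
while [ $# -gt 0 ]; do
  case "$1" in
    --skip-swap) SKIP_SWAP="1" ;;
    # ... existing options such as --skip-connection-test stay as they are ...
  esac
  shift
done

# later, where swap is currently offered only on low-RAM machines:
if [ -z "$SKIP_SWAP" ]; then
  maybe_create_swap   # hypothetical helper wrapping the existing swap logic
fi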

1 Like

Thank you! And yes, fully understood, I’d feel the same way. It does need careful thought and testing, for sure. And that would be on at least a couple of hosting providers’ cheap machines, and on a Pi too. I’m not sure about Windows or Mac - if they are expected to be supported, then I suppose so. I would expect them more likely to be used as dev machines, which is a different story.

Indeed. Whatever seems to be necessary at present, perhaps. It does seem to have taken a step up. But it grieves me not to know whether these reports come from systems with or without the overcommit tweak. I’m pretty sure we know from previous discussions that overcommit makes a difference.

And we do know that on a 25G instance, and even more so on a 20G one, disk space is tight. I’m running on such machines: 25G disk and 1G RAM, which needs 2G swap already and probably more these days; and 20G disk with 2G RAM, where I presently have 1G swap.

I wouldn’t recommend more Y/N questions. Command-line options seem like a better route.

If we’re going to change this script at all, I think I’d recommend several 1G swapfiles, because that maximises flexibility, wastes nothing, and install time is the easiest time to make that decision.

I’m not so sure about this. If the smallest instance with naked Ubuntu or Debian happens already to have some swap - this would need to be checked - then we start to have problems if it’s not enough. Much better to measure RAM+swap using free, adjust as usual for the sub-1G configurations out there (AWS I think, maybe Oracle), and then add 1G swapfiles up to some agreed number, whatever that presently is. Hopefully a total of 4G is enough, with the kernel overcommit set appropriately.
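Roughly what I have in mind, as a sketch only (the 4G target, the path, and the use of free are placeholders for whatever we agree on):

# Top up RAM+swap to a target, 1G swapfile at a time.
target_kb=$((4 * 1024 * 1024))
mem_kb=$(free -k | awk '/^Mem:/ {print $2}')
swap_kb=$(free -k | awk '/^Swap:/ {print $2}')
mkdir -p /var/local/swap
i=0
while [ $((mem_kb + swap_kb)) -lt "$target_kb" ]; do
  f=/var/local/swap/swapfile.$i
  fallocate -l 1G "$f" && chmod 600 "$f" && mkswap "$f" && swapon "$f"
  swap_kb=$((swap_kb + 1024 * 1024))
  i=$((i + 1))
done

(Handling of pre-existing swapfiles with the same names, and the fstab entries, is left out of the sketch.)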

I’m happy to help.

2 Likes

Hmm. Yeah. I wish I had thought to check on that on the ones I just tweaked.

Hmm. I’m of the mind that one is better, but multiple does add flexibility, and it could then be possible for discourse-setup to add another swap file if more swap is needed, which means we could tell everyone to run discourse-setup to “fix” their swap problem. And maybe the overcommit issue as well; perhaps explicitly do that only for Linux, since that’s all we care about.

2 Likes

I disagree. Swap is not a universal good. It used to matter for keeping VM behaviour more even, even when not actively swapping, in certain circumstances, but VM algorithms have changed.

That was based on very old kernel code heuristics that no longer apply.

Also, when considering measuring: I don’t even know what you would want to do for measuring swap and memory when zswap is in use. This is probably a case of “first do no harm” though.

2 Likes

I’m pretty sure that the only disadvantage of “too much swap” is that it uses up more disk space than absolutely necessary. That’s one reason it would be preferable to have several modest-sized swapfiles - one can swapoff and remove them progressively if there’s a need to recover the disk space. Also, I think Linux does a reasonable job of using them in priority order:

NAME                       TYPE  SIZE   USED PRIO
/var/local/swap/swapfile.1 file 1024M 863.6M   -2
/var/local/swap/swapfile.0 file 1024M   4.6M   -3
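So, with the example above, reclaiming the space from the barely-used file later is just:

swapoff /var/local/swap/swapfile.0
rm /var/local/swap/swapfile.0
# and drop its /etc/fstab entry, if one was added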

The situation we find ourselves in is that cheap instances are quite limited in both RAM and disk space, and Discourse uses ever more, as the very many packages within it evolve. But still, I think there are ways to navigate this wisely, to help those who are not in a position to just throw up their hands and double or quadruple their monthly bill.

1 Like

Swapping is slow enough that I would not take “barely out of room now” as a reason to add more than 1GB to the default suggestion at this point. Each 1GB is a lot of swap, as experienced on a dedicated Discourse instance.

Yes, by default Linux uses swap in priority order, and it’s possible to use the same priority across multiple devices to explicitly stripe swap. But adding loads of swap for small sites isn’t particularly valuable, I would suggest.

So if, after roughly a decade, people are only occasionally tripping over 2GB, I’d suggest moving to 3GB rather than 4GB as the default. And the necessary quantity of swap for a dedicated Discourse instance shouldn’t grow much with memory, because the content that would actually be swapped out doesn’t change much.

The idea of growing swap with more memory is primarily from general purpose computing, based on a generalized assumption that the more RAM you need, the larger the demand is likely to be. But swap pressure on a specialized Discourse instance is not likely to follow that pattern, I think.

THP are specific to platforms supporting huge pages, which aren’t all platforms. The generic way to handle that is to see whether it exists. On one Raspberry Pi I have:

$ sysctl sys.kernel.mm.transparent_hugepage.enabled
sysctl: cannot stat /proc/sys/sys/kernel/mm/transparent_hugepage/enabled: No such file or directory
$ echo $?
255

By contrast, overcommit has been a general Linux VM parameter for the past few decades. On the same Raspberry Pi:

$ sysctl vm.overcommit_memory
vm.overcommit_memory = 0
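So a guard in the script can just test for the interface before touching it (a sketch; the policy value itself is a separate discussion):

if [ -f /sys/kernel/mm/transparent_hugepage/enabled ]; then
  echo never > /sys/kernel/mm/transparent_hugepage/enabled
fi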

Parsing output from free in shell is kind of a pain. Speaking as the original author of procps, for this I’d just look for SwapFree in /proc/meminfo. :smiley:
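Something like (sketch):

# SwapFree in kB, straight from /proc/meminfo, no free(1) parsing:
awk '/^SwapFree:/ {print $2}' /proc/meminfo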

2 Likes

I agree that in our cost-constrained world, scaling swap by RAM size is no longer a great plan. The next idea after that, historically, seems to have been that RAM is huge and you don’t need swap. After that comes the wisdom that some swap is useful because it allows better use of RAM. (In an unconstrained world we just have enormous amounts of RAM, but that’s a niche.)

What we are seeing over the past couple of months is more people having out-of-memory problems and failures to rebuild, and more people finding web upgrades failing while the command line works. From a simple support perspective, and from the perspective of the reputation of the product, I think we do need a change to the usual advice and the usual setup. I think 3G of swap is the simplest, smallest change, and we should do that if we do nothing else.

But I still think that multiple smaller swap files are a wiser choice - and we have seen support threads on here which point to that. And I still think it would be best to try to size RAM+swap, because that is the limiting factor, the thing which causes people to have trouble. There may be different ways of doing that computation. The usual caveats apply about which tactics are maintainable, understandable, and will have longevity.

As for transparent huge pages, my understanding is that it’s the “transparent” which causes trouble: the kernel can thrash about, merging and de-merging, for a performance hit and no great benefit. I’m pretty sure hugepages are ill-advised for smaller systems.

It’s more about the characteristics of the workload than the size of the system. On a 1GB RAM system with fairly stable processes holding chunks of RAM, the default 2MB hugepages can reduce TLB thrashing and improve performance; the TLB doesn’t begin to cover the mappings for 1GB of RAM. It’s generally a tradeoff between CPU spent looking for memory to coalesce and TLB cache misses, and there are plenty of workloads on 1GB machines that can benefit considerably from THP. (Many recommendations to disable it come from early in its implementation; it has been substantially improved since.)

The recommendation to disable THP for Discourse is not because of the 1GB RAM size, but is specific to using Redis with on-disk persistence, which Discourse does:

Unfortunately when a Linux kernel has transparent huge pages enabled, Redis incurs a big latency penalty after the fork call is used in order to persist on disk. Huge pages are the cause of the following issue:

2 Likes