I know that Discourse setup provides sane defaults for most occasions nowadays, but I still want to ask for some guidance on optimal settings. It looks like we will be getting a hosting partner that provides us with the server, and what is on the table is this VPS:
6x vCore CPUs (Xeons @ 2.40GHz, I believe)
8GB of RAM
My primary goal is to tune the settings for maximum capacity to handle traffic spikes, i.e. concurrent users. How would you experts set the number of unicorns and the size of the database buffer?
Found the limits of our setup yesterday, with 400+ concurrent sessions, many of them logged in, actively polling and chatting. The day before we delivered 138k page views without a hitch.
The bottleneck was the number of unicorns (8). We only reached about 35% CPU load and still had 2GB of free RAM with those settings. I changed to 10 unicorns, after which we had no more hiccups, served a peak of 440 sessions, and saw CPU load closer to 50%.
But then it was getting late, traffic cooled down, and I wasn’t able to do any more live performance testing at maximum load.
Anyway, with UpCloud’s 6-core / 8GB RAM plan (or similar) I would say that 10 unicorns is better than 8. I’ll keep an eye on the amount of free RAM and let’s see if even 11-12 would be feasible. Ping @mpalmer?
With these changes the amount of free memory has dropped from 2GB to 1GB, so roughly 0.5GB per unicorn added.
Is there a recommendation for the amount of RAM to keep free, and thus available for Linux to use as disk cache? I have 1/4 of the RAM (2GB) allocated to the database buffer.
Increasing the number of unicorn workers to suit your CPU and RAM capacity is perfectly reasonable. The “two unicorns per core” guideline is a starting figure. CPUs differ (wildly) in their performance, and VPSes make that even more complicated (because you can never tell who else is on the box and what they’re doing with the CPU), so you start conservative, and if you find that you’re running out of unicorns before you’re running out of CPU and RAM, then you just keep increasing the unicorns.
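In case it helps anyone following along: on a standard Docker-based install, both of those knobs live in /var/discourse/containers/app.yml. A minimal sketch, with example values only (not recommendations, tune for your own hardware):

```yaml
# containers/app.yml (excerpt) – example values, not recommendations
params:
  db_shared_buffers: "2GB"   # PostgreSQL buffer; ~25% of RAM is the usual starting point
env:
  UNICORN_WORKERS: 10        # keep raising this while CPU and RAM have headroom
```

After editing, rebuild the container (`cd /var/discourse && ./launcher rebuild app`) so the new values take effect.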
As far as disk cache goes, you need as much as you need. As you increase RAM consumption, keep an eye on your IOPS graphs, and particularly the percentage utilisation of the disks (what sysstat refers to as %util). There are two points you want to be aware of: when your read IOPS start to stay persistently higher (that means that the working set of disk pages no longer fits in memory – you may or may not have hit that already), and when peaks in disk utilisation start to get close to 100%. The former tells you when your RAM consumption is starting to impact performance, and the latter tells you when you’re starting to saturate the disks. You want to consume RAM up to the first point, and you can drive it as far as you’re comfortable towards the second point, depending on your tolerance for performance degradation (keep an eye on your service times!).
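If you’re not already graphing those, the sysstat tools let you eyeball them live; a couple of example invocations (5-second intervals, adjust to taste):

```sh
iostat -dx 5    # per-device r/s, w/s, await and %util, refreshed every 5 seconds
sar -d -p 5     # the same block-device stats via sar, with readable device names
```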
Another RAM-related metric to keep an eye on is swap rate. That’s not how much swap space is being used (as long as your swap partition isn’t full, it doesn’t matter), but the number of pages per second being written to and read from swap. Swap writes are fine, but if the system is constantly swapping pages in and out, even only a few per second, you probably want to back off on your RAM usage a bit.
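To watch the swap rate itself (again assuming sysstat is installed), either of these will do:

```sh
sar -W 5    # pswpin/s and pswpout/s: pages swapped in and out per second
vmstat 5    # the si/so columns show swap-in/swap-out activity each interval
```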
Just to keep you on your toes, swap activity counts towards disk IOPS, too, so your disk utilisation will likely go loco bananas when you start to run out of RAM, due to both extra disk reads (because the cache isn’t big enough) and increased swap activity, because the working set of memory pages doesn’t fit any more. That’s a recipe for performance disaster right there.
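For a quick one-shot picture of where the RAM is actually going, plain old free is enough:

```sh
free -h    # 'buff/cache' is the page cache; 'available' estimates reclaimable memory
```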
Thanks Matt. Crystal clear and informative, as usual.
Here is some more nerd porn: our API requests for this year. This is what it looks like when there is a hockey league transfer deadline on the 15th of Feb at 23:59 local time.
I am currently using 10 unicorns (IIRC) on my 6-core instance at UpCloud. This gets me to 400+ concurrent sessions. We’ll see in February what it takes to choke this.
We have had several reports of people using the Cloudflare proxy feature and having problems with our live updates. Some people claim that disabling Brotli support helps, but we recommend using Cloudflare in DNS-only mode (gray cloud).
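DNS-only mode is the gray-cloud toggle on the DNS page of the Cloudflare dashboard. If you manage records through their API instead, a PATCH along these lines (zone ID, record ID and token are placeholders) turns the proxy off for a single record:

```sh
curl -X PATCH "https://api.cloudflare.com/client/v4/zones/ZONE_ID/dns_records/RECORD_ID" \
  -H "Authorization: Bearer $CF_API_TOKEN" \
  -H "Content-Type: application/json" \
  --data '{"proxied": false}'
```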