High load due to peak anonymous sessions, increase unicorn workers?

Terrapop · February 9, 2021, 6:19pm

We just had peak traffic of approx 1,500 concurrent (mostly anonymous) users visiting a single page.

The forum then went into the mode where a high-load warning is displayed to all members.

CPU-Optimized Digital Ocean Droplet

Dedicated CPU: 4 vCPUs
RAM: 8 GB

Unicorn workers: 10

Given that only approx. 50% of RAM and CPU is utilized, would it help to increase the number of unicorn workers for such traffic peaks from anonymous visitors?

Falco · February 9, 2021, 6:27pm

Yes, increasing unicorns is the first step here.

Terrapop · February 18, 2021, 7:28pm

I have increased the workers to 24. No difference: the forum still goes into “Due to extreme load, this is temporarily being shown to everyone as a logged-out user would see it.” with a similar concurrent visitor peak (99% anonymous) just now.

codinghorror · February 18, 2021, 7:46pm

I know @sam has spent a lot of time on this recently and might have commentary?

Terrapop · February 20, 2021, 7:04pm

@sam Any ideas on how to further optimize for peak traffic from anonymous visitors (e.g. if a single topic goes viral on social media)? In both cases outlined above, memory and CPU still had plenty of room (according to Digital Ocean), and we did not even hit a load of 4. Still, the forum goes into “extreme load” mode, despite having tripled the number of workers.

Terrapop · February 20, 2021, 7:52pm

Just went into “extreme load mode” again, with only 600 concurrent visitors total (99% anonymous) and a load of not even 1.

Falco · February 21, 2021, 1:23am

You need to collect some data so we know what the bottleneck is.

Prometheus exporter plugin for Discourse

Alec · February 21, 2021, 4:36am

I believe the DO data monitor is not sensitive enough and is somewhat misleading. I experimented with extreme load on both Hetzner and Digital Ocean. With Hetzner, when the extreme load message came up, there was a short, sharp peak where CPU usage would go to 120%.

It lasted maybe a second, before dropping down to 40-50% mark.

I recreated the same thing with Digital Ocean, and from memory it appeared that CPU usage never got above 50% (but you could not change the x-axis to the seconds level).

My guess is that the DO CPU level is an average over maybe 5 or 15 seconds, so you don't see the short, sharp peaks.
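To illustrate the averaging effect described here, a minimal Python sketch. The per-second readings are made up for the example (a one-second spike to 120% in an otherwise 45% trace), and the 15-second window is just the guess from this post, not a documented Digital Ocean setting.

```python
def windowed_average(samples, window):
    """Average consecutive, non-overlapping windows of per-second samples."""
    averages = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical per-second CPU readings in percent: one second at 120%,
# the other fourteen seconds at 45%.
per_second = [45] * 7 + [120] + [45] * 7

print(max(per_second))                   # 120: visible at one-second resolution
print(windowed_average(per_second, 15))  # [50.0]: the spike is averaged away
```

With these assumed numbers, a graph that plots only the 15-second average would show 50% even though the CPU was briefly saturated.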

sam · February 22, 2021, 4:53am

We are going to need Prometheus exporter reports to look any deeper.

If you have the RAM and the CPU … you can always add more unicorn workers; that will scale up for these peaks. You just don’t want to swap memory, because performance will go way down.
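As a rough way to think about the "don't swap" limit, here is a back-of-the-envelope Python sketch. The function name and all three input numbers are placeholders chosen for illustration, not measurements from this thread: measure the real per-worker memory and the memory used by everything else on your own server before relying on a result like this.

```python
def max_workers_without_swap(total_ram_mb, reserved_mb, per_worker_mb):
    """Estimate how many workers fit in RAM after reserving memory for
    everything else on the box (database, cache, OS). Returns 0 if the
    inputs leave no room."""
    usable_mb = total_ram_mb - reserved_mb
    if usable_mb <= 0 or per_worker_mb <= 0:
        return 0
    return usable_mb // per_worker_mb


# Placeholder inputs: an 8 GB machine, 2 GB assumed reserved for other
# processes, and an assumed 300 MB per unicorn worker.
print(max_workers_without_swap(8192, 2048, 300))
```

The point is only that the ceiling comes from memory per worker, so it should be re-checked whenever the worker count is raised.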

TallTrees · February 22, 2021, 6:49am

Seems like in such a case that single topic page should be able to be cached and served statically for a short period without having to hit the back end at all. I’ve no idea if Discourse can do that (i.e. set cache control headers when under load and serving content to anon users) and if the DO setup has a capable caching proxy in the chain, but it’s a development idea that might be worth a thought if I’m not totally wrong and it isn’t already done.

Maybe @sam already thought or did this, or knows why it is a bad idea!

codinghorror · February 22, 2021, 7:05am

That already happens dynamically under measured load on a per-topic basis, that’s exactly what

… is referring to. It’s READ ONLY though, so people can’t actually have conversations in that mode.

TallTrees · February 22, 2021, 7:10am

Yep, but my suggestion is to boot just the anon users to a cached page with a short timeout (60s?) to take their load off, in the hope that the rest of the site can keep going in read-write mode.

Terrapop · February 22, 2021, 10:40am

That would be great. Currently, if we feature a topic on our Telegram channel (200,000+ members), it puts the entire Discourse site into “read-only” mode for almost one hour, even though only around 50 users are logged in (99% is anonymous traffic).

sam · February 23, 2021, 5:31am

This already happens: we have pretty aggressive caching directly in Redis for anon users on topic list pages and topic pages, with a 60s timeout.
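For readers unfamiliar with the idea, a minimal Python sketch of a time-limited page cache. This is not Discourse's implementation (which lives in Redis, per the post above); it uses an in-memory dict, and the class name, method names, and example path are hypothetical.

```python
import time


class AnonPageCache:
    """Cache rendered pages for anonymous visitors for a fixed time."""

    def __init__(self, ttl_seconds=60, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the expiry can be tested
        self._store = {}        # key -> (expires_at, body)

    def fetch(self, key, render):
        """Return the cached body for key, calling render() only when
        there is no entry or the entry has expired."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        body = render()
        self._store[key] = (now + self.ttl, body)
        return body


cache = AnonPageCache(ttl_seconds=60)
# Hypothetical path; the lambda stands in for the expensive page render.
html = cache.fetch("/t/example-topic", lambda: "<html>rendered once</html>")
```

Within the 60-second window, repeated anonymous requests for the same key are answered from the cache without calling the render function again.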

Terrapop · February 23, 2021, 5:50pm

I will try to get Prometheus running in order to find the bottleneck, but it’s probably DO’s monitoring that is lagging, as mentioned by @Alec. If that is the case, I assume a larger machine is the way forward?
