Tuning high-volume site 429 Errors on message bus--should I worry?

pfaffman · November 19, 2020, 6:13pm

I’m working on a fairly high-volume site (>150K pageviews/day). I’m getting some 429 errors, at least mostly on the message bus. I earlier had some issues due to a misconfiguration of set_real_ip_from, but that’s resolved. I also removed (perhaps temporarily?) the rate limiting template.

I’m still seeing about 0.5 429 errors per second.

I’ve got 5 unicorn workers on a 2-core/4-thread CPU with 16GB RAM. Postgres is on a separate host. CPUs remain >50% idle.

I removed the rate limiting template and raised unicorn workers to 5 at about 8:20.
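For anyone wanting to put a number like the "0.5 per second" figure above on their own site, here is a minimal sketch in Python that counts 429 responses in a web server access log. It assumes the nginx "combined" log format and that message bus requests live under `/message-bus/`. The function name `rate_of_429` and its parameters are made up for illustration, so adjust the regex to whatever your logs actually look like.

```python
import re
from datetime import datetime

# Assumption: nginx "combined" log format, for example:
# 1.2.3.4 - - [19/Nov/2020:18:13:05 +0000] "POST /message-bus/abc/poll HTTP/1.1" 429 0 "-" "UA"
LINE_RE = re.compile(
    r'\[(?P<ts>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def rate_of_429(lines, path_prefix="/message-bus/"):
    """Return (count, seconds_spanned, per_second) for 429s under path_prefix.

    The time span is taken from the first and last parseable log lines,
    whatever their status, so quiet periods still count toward the rate.
    """
    count = 0
    first = last = None
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # skip lines that do not match the assumed format
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        first = first or ts
        last = ts
        if m.group("status") == "429" and m.group("path").startswith(path_prefix):
            count += 1
    if first is None:
        return 0, 0.0, 0.0
    # Avoid dividing by zero when every line shares one timestamp.
    seconds = max((last - first).total_seconds(), 1.0)
    return count, seconds, count / seconds
```

Typical use would be something like `rate_of_429(open("access.log"))`, with the path depending on your setup. Passing `path_prefix="/"` counts 429s on every route, which helps check whether they really are mostly on the message bus.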

Falco · November 19, 2020, 7:24pm

That is completely normal. message-bus will back off with 429 when your unicorns are under heavy load and queuing a little bit.

4 cores with 16GB RAM is a really weird ratio if the node is not running the database. 8/8 would be better, for example.
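To illustrate why a 429 on a polling endpoint is usually harmless, here is a minimal sketch in Python of the general pattern a polling client can follow: wait longer after each 429 and retry. This is not the actual message-bus client code. The function `next_poll_delay` and its parameters are hypothetical, and honoring a `Retry-After` value is shown only as a common HTTP convention, not as something this thread confirms Discourse sends.

```python
import random


def next_poll_delay(status, attempt, retry_after=None,
                    base=1.0, cap=60.0, rng=random.random):
    """Seconds a polling client might wait before its next request.

    status      -- HTTP status of the last poll
    attempt     -- number of consecutive 429s seen so far (0 for the first)
    retry_after -- value of a Retry-After header in seconds, if one was sent
    base, cap   -- illustrative defaults, not taken from any real client
    rng         -- source of jitter in [0, 1); injectable for testing
    """
    if status != 429:
        return 0.0  # not throttled: poll again right away
    if retry_after is not None:
        # Trust the server's hint, but never wait longer than the cap.
        return min(float(retry_after), cap)
    # Exponential backoff with full jitter so clients do not retry in lockstep.
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()
```

Because the client simply polls again a little later, an occasional 429 delays live updates slightly but does not lose them.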

pfaffman · November 19, 2020, 8:41pm

Great! Thanks. CPU is still getting slammed with image processing, which should be done in a day or two, I’d hope.

True enough. But the bare metal has 2 cores/4 threads. It’s easy to add RAM, not cores (I’ve got another one at home with 32GB!). I split database and web to two machines to get more CPU. I have half a dozen other databases from low-traffic sites on the same database server (web on different host). Would you think it better to just run the DB and web on the same machine? I’d lose some CPU but improve latency, I guess.

riking · November 19, 2020, 8:51pm

If you have load balancer capability here, then you might try putting web workers on both machines for your high-load site, with fewer on the one with the database, maybe 5+2?

If solving the problem with money is an option, just get another host with a better CPU:RAM ratio.
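As a rough picture of what the "5+2" split above would mean for traffic, here is a minimal sketch in Python of smooth weighted round-robin, a scheme commonly used by load balancers to spread requests in proportion to weights. The host names `web` and `db` and the function name are hypothetical. A real setup would express this as weights in the load balancer's own configuration rather than in application code.

```python
def smooth_weighted_round_robin(weights, n):
    """Return the backend chosen for each of n consecutive requests.

    weights -- dict mapping backend name to a positive integer weight
    n       -- number of requests to distribute
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every backend earns credit equal to its weight each round.
        for name, weight in weights.items():
            current[name] += weight
        # The backend with the most credit serves the request...
        chosen = max(current, key=current.get)
        # ...and pays back the total, which spreads its turns out evenly.
        current[chosen] -= total
        picks.append(chosen)
    return picks


# 5 workers on the web host, 2 on the database host (hypothetical names).
print(smooth_weighted_round_robin({"web": 5, "db": 2}, 7))
```

Over any 7 consecutive requests, 5 go to the web host and 2 to the database host, interleaved rather than sent in bursts.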

pfaffman · November 19, 2020, 9:27pm

Well, these machines, which I got for free, are getting a little long in the tooth, so I’m starting to come to terms with that. The single CPU performance still seems better than a DO droplet, though. If solving the problem with money in the short term were an option, then I’d likely not have this particular client who wants enterprise performance at business prices.

But I also see that I had the number of unicorns hard-coded somewhere else in the chain, so I’m still running only 3.

Sadly, my current configuration only talks to Docker on the one host. I should spend a bit more time and see if I can stick a couple of unicorns on the other machine too, though. It’s probably about time to look at HAProxy again, but I’ve got another project that I really want to get launched first.

Thanks very much for your insight.

EDIT: And when I finally moved to 5 unicorns rather than 3, the performance graphs look about the same (but maybe a tiny bit slower?), but the 429 errors dropped significantly. It looks like once the image processing is done, this is going to work just fine.

pfaffman · November 20, 2020, 4:50pm

And, just a day later, the 429 errors are down to almost zero, so Rafael’s “just don’t worry” was brilliant advice! Thanks again, Kane and Rafael. I can’t overstate how much I appreciate your help.

system · December 20, 2020, 5:32pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
