Tuning a high-volume site: 429 errors on message bus, should I worry?

I’m working on a fairly high-volume site (>150K pageviews/day). I’m getting some 429 errors, mostly on the message bus. I had some issues earlier due to a misconfiguration of set_real_ip_from, but that’s resolved. I also removed (perhaps temporarily?) the rate limiting template.

I’m still seeing about 0.5 429 errors per second.
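For anyone wanting to measure a rate like this themselves, here is a minimal sketch that counts 429 responses per second from access-log lines. It assumes the standard nginx "combined" log format; a custom `log_format` would need a different regex, and the function name `rate_of_status` is just illustrative.

```python
import re
from datetime import datetime

# Assumption: nginx "combined" format, i.e. ... [time_local] "request" status bytes ...
LINE_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "[^"]*" (?P<status>\d{3}) ')

def rate_of_status(lines, status=429):
    """Return occurrences of `status` per second over the time span covered by `lines`."""
    first = last = None
    hits = 0
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that don't look like access-log entries
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        if first is None:
            first = ts
        last = ts
        if int(m.group("status")) == status:
            hits += 1
    if first is None:
        return 0.0
    # Treat very short windows as one second to avoid dividing by zero.
    span = max((last - first).total_seconds(), 1.0)
    return hits / span
```

Feeding it the lines of an access log (for example `rate_of_status(open(path))`, with `path` pointing at wherever your log lives) gives a rough average over the whole file, not a peak rate.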

I’ve got 5 unicorn workers on a 2-core/4-thread CPU with 16 GB of RAM. Postgres is on a separate host. CPUs remain >50% idle.

I removed the rate limiting template and raised unicorn workers to 5 at about 8:20.

That is completely normal. message-bus will back off with 429 when your unicorns are under heavy load and queuing a little bit.
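To illustrate the general idea of backing off on 429, here is a small sketch of a polling client that waits longer after each 429 before retrying. This is not message-bus's actual implementation; the function name, the delays, and the jitter are all assumptions chosen for illustration.

```python
import random

def poll_with_backoff(poll, sleep, base=1.0, cap=60.0, max_attempts=6):
    """Call poll() until it returns a status other than 429 or attempts run out.

    `poll` is any callable returning an HTTP status code; `sleep` is injected
    (e.g. time.sleep) so the waiting behaviour is easy to test.
    Returns (last_status, attempts_used).
    """
    delay = base
    for attempt in range(1, max_attempts + 1):
        status = poll()
        if status != 429 or attempt == max_attempts:
            return status, attempt
        # Jitter keeps many clients from retrying in lockstep.
        sleep(delay + random.uniform(0, delay / 2))
        delay = min(delay * 2, cap)  # double the wait, up to the cap
```

The point is that a 429 here is a signal to slow down, not a failure: the client simply retries later, which is why a low rate of them is harmless.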

4 cores with 16 GB of RAM is a really weird ratio if the node is not running the database. 8/8 (cores/GB) would be better, for example.


Great! Thanks. CPU is still getting slammed with image processing, which should be done in a day or two, I’d hope.

True enough. But the bare metal has 2 cores/4 threads. It’s easy to add RAM, not cores (I’ve got another one at home with 32GB!). I split database and web across two machines to get more CPU. I have half a dozen other databases from low-traffic sites on the same database server (web on a different host). Would you think it better to just run the DB and web on the same machine? I’d lose some CPU but improve latency, I guess.

If you have load-balancer capability here, then you might try putting web workers on both machines for your high-load site, with fewer on the one with the database, maybe 5+2?
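As a rough sketch of what a 5+2 split means for request distribution, the snippet below rotates through two backends in proportion to their worker counts. The host names are hypothetical, and in practice a load balancer would handle this through per-server weights instead of application code.

```python
from collections import Counter
from itertools import cycle, islice

def weighted_rotation(weights):
    """Return an endless iterator where each name appears `weight` times per round."""
    expanded = [name for name, weight in weights.items() for _ in range(weight)]
    return cycle(expanded)

# Hypothetical host names: 5 workers on the web host, 2 on the database host.
backends = weighted_rotation({"web-host": 5, "db-host": 2})
first_round = Counter(islice(backends, 7))
```

Out of every 7 requests, 5 go to the web host and 2 to the database host, so the machine that also runs Postgres carries a smaller share of the web load.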

If solving the problem with money is an option, just get another host with a better CPU:RAM ratio.

Well, these machines, which I got for free, are getting a little long in the tooth, so I’m starting to come to terms with that. The single-CPU performance still seems better than a DO droplet, though. If solving the problem with money in the short term were an option, then I’d likely not have this particular client who wants enterprise performance at business prices. :wink:

But I also see that I had the number of unicorns hard-coded somewhere else in the chain, so I’m still running only 3.

Sadly, my current configuration only talks to docker on the one host. I should spend a bit more time and see if I can stick a couple of unicorns on the other machine too, though. Probably about time to look at HAproxy again, but I’ve got another project that I really want to get launched first.

Thanks very much for your insight.

EDIT: And when I finally moved to 5 unicorns rather than 3, the performance graphs look about the same (but maybe a tiny bit slower?), but the 429 errors dropped significantly. It looks like once the image processing is done, this is going to work just fine. :relieved:


And, just a day later, the 429 errors are down to almost zero, so Rafael’s “just don’t worry” was brilliant advice! Thanks again, Kane and Rafael. I can’t overstate how much I appreciate your help.
