'768 worker_connections are not enough' Error

bartv · June 2, 2021, 9:13am

Hey!

Since rebuilding today we’re experiencing a high number of server errors. It seems to be an nginx connection issue; in nginx/error.log I sometimes see bursts of 768 worker_connections are not enough messages like this one:

2021/06/02 10:42:21 [alert] 1143#1143: *28468 1768 worker_connections are not enough while connecting to upstream, client: (IP removed), server: _, request: "POST /message-bus/8fc08436f86f47479cf0dad3deb5c4dc/poll?dlp=t HTTP/1.1", upstream: "http://127.0.0.1:3000/message-bus/8fc08436f86f47479cf0dad3deb5c4dc/poll?dlp=t", host: "blenderartists.org", referrer: "https://blenderartists.org/t/convert-multiple-objects-to-single-mesh-with-vertex-grouping/489173/2"

Any ideas how we can remedy this? We have plenty of CPU/memory available - could we increase the number of ‘worker connections’?

bartv · June 2, 2021, 10:43am

Update: I have increased my worker connections for the time being, but I still get these errors (less frequently, and now for the higher number of worker connections). I’m really curious whether anything changed recently that might cause this, or how I could track this down better.

## Any custom commands to run after building
run:
  - exec: echo "Beginning of custom commands"

  - replace:
      filename: "/etc/nginx/letsencrypt.conf"
      from: "worker_connections 768"
      to: "worker_connections 1768"

DrewH · June 2, 2021, 2:51pm

Interesting that this happened after a rebuild. Have you recently performed any bulk actions? I’d check the Sidekiq logs and see if there are a large number of jobs there as well.

bartv · June 2, 2021, 3:25pm

I did have some bulk actions recently as we switched to the Thumbnail Preview TC, but there’s nothing in my Sidekiq queue, so I can definitely rule that out.

Falco · June 2, 2021, 3:28pm

We bumped the nginx version two days ago, so let’s keep an eye on it. Do you get over 500 concurrent visitors on your site?

Also your whole site is behind Cloudflare so stuff may be different because of it.

bartv · June 2, 2021, 3:31pm

I have no idea - we might? Any ideas how I can check that?

Correct. But I have disabled any acceleration and am basically only using it to cache images and avatars. It’s never been an issue until today…

Falco · June 2, 2021, 3:48pm

Haha, usually people use Google Analytics or something like that to know such info. The Discourse dashboard has daily pageviews and user visits that can be used to approximate that too.

Not true, your whole site is served via Cloudflare:

curl -I https://blenderartists.org/
HTTP/2 200 
cf-cache-status: DYNAMIC
cf-request-id: 0a6ef945b3000002fe272b2000000001
server: cloudflare
cf-ray: 6591c4b5ec5902fe-MIA
alt-svc: h3-27=":443"; ma=86400, h3-28=":443"; ma=86400, h3-29=":443"; ma=86400, h3=":443"; ma=86400

But that may be completely unrelated, as your nginx is complaining about upstream connections instead of downstream ones, which means it’s running out of connections between nginx ⟷ unicorn.

Since we keep an open connection for each visitor due to message_bus (live updates service), this is kinda expected if your site is somewhat popular.

Bumping the worker_processes and worker_connections is safe and sounds like it makes sense in your case. We default worker_processes to your number of CPU cores. How many CPU cores do you have?

bartv · June 2, 2021, 3:55pm

True. We dropped that a long time ago… We have about 250k pageviews/day (including bots), so 500 doesn’t seem too unusual. The user visits only tracks logged-in visits, right?

Right - we do have to pass our requests through CF but we don’t let them touch our javascript etc.

We have 12 cores, 64GB. Typical load is about 2, and we use 50% of our RAM.
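As a rough sanity check on the "500 concurrent visitors" question, Little's law (average concurrency = arrival rate × average time in the system) can turn daily pageviews into an estimate. The sketch below uses the 250k pageviews/day figure from this thread; the 180-second average time per page is purely an assumed value for illustration, not something measured on this site, and the function name is made up.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimated_concurrent_pages(pageviews_per_day: float,
                               avg_seconds_on_page: float) -> float:
    """Little's law: average concurrency = arrival rate * average time spent."""
    arrival_rate = pageviews_per_day / SECONDS_PER_DAY  # pageviews per second
    return arrival_rate * avg_seconds_on_page


# 250k/day comes from the thread; 180 s per page is an ASSUMPTION.
estimate = estimated_concurrent_pages(250_000, 180)
print(round(estimate))  # 521
```

This averages over the whole day, so peak hours would sit higher, while bots that never hold a page open would pull the real figure lower.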

Falco · June 2, 2021, 4:10pm

Damn that is so weird!

The formula for connections is worker_processes * worker_connections which should be 12 * 768, which would be (click clack) 9216. But your logs say 1768…
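To make the arithmetic concrete, here is a minimal sketch of that formula. Two caveats come from nginx's documentation and behaviour rather than from this thread: as far as I can tell, the number printed in the alert is the per-worker worker_connections value, not the server-wide total, and worker_connections counts connections to proxied servers as well as to clients, so a proxied long-poll request occupies two slots. The function names are hypothetical.

```python
def total_connection_slots(worker_processes: int, worker_connections: int) -> int:
    """Server-wide ceiling: every worker gets its own worker_connections slots."""
    return worker_processes * worker_connections


def proxied_request_capacity(worker_processes: int, worker_connections: int) -> int:
    """ASSUMPTION: every request is proxied, so it holds one client
    connection plus one upstream connection (two slots)."""
    return total_connection_slots(worker_processes, worker_connections) // 2


print(total_connection_slots(12, 768))    # 9216
print(proxied_request_capacity(12, 768))  # 4608
```

These are ceilings for the whole server; a single worker can still hit its own limit if connections are not spread evenly across workers.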

Try this in your app.yml:

## Any custom commands to run after building
run:
  - exec: echo "Beginning of custom commands"

  - replace:
      filename: "/etc/nginx/nginx.conf"
      from: "worker_connections 768"
      to: "worker_connections 2000"
  - replace:
      filename: "/etc/nginx/nginx.conf"
      from: "worker_processes auto"
      to: "worker_processes 10"

Be aware that your block on post 2 is acting on the wrong file!

bartv · June 2, 2021, 5:08pm

I pasted the wrong code - I tried the letsencrypt template first, but ended up changing the nginx.conf to 1768 worker connections.

I’ll give your values a try - I’ll be back to report how it goes.

bartv · June 2, 2021, 5:41pm

Still getting them, I’m afraid:

2021/06/02 17:40:03 [alert] 2102#2102: *262491 2000 worker_connections are not enough while connecting to upstream, client: <ip removed>, server: _, request: "POST /message-bus/0e453fae0c604c29a876e6ede05b7341/poll?dlp=t HTTP/1.1", upstream: "http://127.0.0.1:3000/message-bus/0e453fae0c604c29a876e6ede05b7341/poll?dlp=t", host: "blenderartists.org", referrer: "https://blenderartists.org/t/weight-paint-not-painting/551282"
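For tracking how often the alert fires and which limit it reports, a small script along these lines might help. It is only a sketch: the regular expression is based on the two log lines quoted in this thread, the sample lines are shortened, and the third and fourth samples are invented for illustration.

```python
import re
from collections import Counter

# Matches the timestamp (to the minute) and the limit reported in the alert.
ALERT_RE = re.compile(
    r"^(?P<minute>\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[alert\] .*?"
    r"\s(?P<limit>\d+) worker_connections are not enough"
)


def tally_alerts(lines):
    """Count alerts per (minute, reported worker_connections limit)."""
    counts = Counter()
    for line in lines:
        match = ALERT_RE.search(line)
        if match:
            counts[(match["minute"], int(match["limit"]))] += 1
    return counts


# Shortened samples; the last two lines are made up for this example.
sample_lines = [
    "2021/06/02 10:42:21 [alert] 1143#1143: *28468 1768 worker_connections are not enough while connecting to upstream",
    "2021/06/02 17:40:03 [alert] 2102#2102: *262491 2000 worker_connections are not enough while connecting to upstream",
    "2021/06/02 17:40:41 [alert] 2102#2102: *262777 2000 worker_connections are not enough while connecting to upstream",
    "2021/06/02 17:41:00 [error] 2102#2102: *262800 some unrelated error",
]

for (minute, limit), count in sorted(tally_alerts(sample_lines).items()):
    print(minute, limit, count)
```

On a real install you would pass an open file handle for your nginx error.log instead of sample_lines; bursts that line up with traffic peaks would point at plain capacity rather than a regression.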

bartv · June 3, 2021, 1:30pm

I have bumped worker_connections to 4000 and it’s looking good so far.

Falco · February 18, 2022, 5:12pm

We made it easier to override now:

pfaffman · February 18, 2022, 7:48pm

Cool! So we’d do something like

params:
  nginx_worker_connections: 4000

In app.yml/web_only.yml?

Falco · February 18, 2022, 7:51pm

Exactly. We also bumped the default to 4k in the same patch, so admins may want to carefully evaluate if they still need to bump it.

pfaffman · February 18, 2022, 7:54pm

On one site I was also bumping worker processes to 2X CPUs. Should I remove that too?

system · March 20, 2022, 7:54pm

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
