429 too many connections issue with NGINX in front of NGINX

Can you provide some details about your network?
Are there any firewalls or security solutions in front of Docker that may be interfering?

250 Mbps public bandwidth, a standard firewall for ports, and no other limits. I checked from the server side and everything looks OK; network load was really low, and fewer than 30 users were logged in. It looks like an internal application issue, since the errors were served by the Discourse engine itself.

Maybe you need to disable rate limiting.
Take some cues from here if it helps:
https://meta.discourse.org/t/white-screens-under-higher-load-but-server-is-not-stressed/25855/

EDIT:
This particular post deals with increasing the limits.
https://meta.discourse.org/t/white-screens-under-higher-load-but-server-is-not-stressed/25855/17?u=itsbhanusharma
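
For context, those limits come from the web.ratelimited.template.yml template, which exposes them as params in app.yml. Roughly (from memory, so treat the names and defaults below as assumptions and check the template itself), the NGINX it generates inside the container looks like:

# http context: per-client-IP request and connection buckets
# (zone names and values are assumptions based on the template)
limit_req_zone $binary_remote_addr zone=flood:10m rate=12r/s;
limit_req_zone $binary_remote_addr zone=bot:10m rate=200r/m;
limit_conn_zone $binary_remote_addr zone=connperip:10m;
limit_req_status 429;
limit_conn_status 429;

# server/location context: enforce the buckets
limit_req zone=flood burst=12 nodelay;
limit_req zone=bot burst=100 nodelay;
limit_conn connperip 20;

Note that everything keys off $binary_remote_addr: if a front proxy hides the real client IP, all users share one bucket and these limits trip almost immediately.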

1 Like

I just received this [screenshot of a 429 error page] when I was trying to answer the post…

Are you using an nginx/apache proxy in front of Discourse? If so, are you properly forwarding the client’s IP address to Discourse?

3 Likes

Let me try that.

Yes, the only redirection is in app.yml:

  - "8443:443" # https

If you are using a proxy and getting 429 errors after only a small set of people join, you probably are not forwarding the client’s IP properly to Discourse, and it is seeing everyone as the same server IP, which is why you are hitting the rate limits.

Have you read …
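
To illustrate the forwarding point above, a minimal sketch of what the outer NGINX should send (the upstream name here is illustrative):

# Outer proxy: pass the real client address through to Discourse
location / {
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_pass http://discourse;   # illustrative upstream name
}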

5 Likes

I also got a 429 today with v2.0.0.beta1 +9, with an NGINX configured in front and no change in configuration. I’m using:

templates:
  - "templates/web.template.yml"
  - "templates/web.ratelimited.template.yml"
  - "templates/web.socketed.template.yml"

I never had this before, and the current usage of the instance is not that high. I rather smell a bug in the rate limiting.

I smell a bug in your forwarding nginx config.

6 Likes

OK, so this is my site’s NGINX config:

upstream motomirko-prod {
  server 127.0.0.1:8443;
}

server {
  server_name motomirko.pl;
  listen 443 ssl http2;

  include conf.d/ssl;
  ssl_certificate           /var/discourse/shared/standalone/ssl/motomirko.pl.cer;
  ssl_certificate_key       /var/discourse/shared/standalone/ssl/motomirko.pl.key;

  proxy_set_header Host $host;
  proxy_set_header X-Real-IP $remote_addr;
  proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

  # add HSTS for security reasons
  add_header Strict-Transport-Security "max-age=31536000" always;

  location ~ ^/chat/topic/[0-9]+/[0-9]+ {
    #error_log /var/log/nginx/rewrite.log notice;
    #rewrite_log on;
    rewrite "^/chat/topic/([0-9]+)/([0-9]+)$" "/chat/offtop/$1/$2" redirect;
  }


  location / {
    proxy_pass https://motomirko-prod;
  }

  error_page 500 502 503 504 /error.html;
  location = /error.html {
    root /var/www/error/;
    internal;
  }
}

Can you please look at it and tell me what I can fix?

Well, the configuration hasn’t changed in a while, except for the http2 timeout option I added after hitting the 429… Here you go. It spans 5 different files:

  1. /etc/nginx/conf.d/discourse.conf:
#
## discourse upstream
#

upstream discourse {
        server unix:/var/discourse/shared/web/nginx.http.sock;
}
  2. /etc/nginx/le.conf:

# LE configuration for 80 and 443

location /.well-known/acme-challenge {
        alias /srv/www/.well-known/acme-challenge;
}

# Add some more security headers

add_header X-Content-Type-Options nosniff;
#add_header X-Frame-Options SAMEORIGIN;
#add_header X-XSS-Protection "1; mode=block";
  3. /etc/nginx/le-ssl.conf:
# SSL Configuration
#
# In /etc/nginx/sites-available/ssl.example.org:
#
# Replace 'ssl.example.org' with your secure domain
# Add the resulting lines to your server configuration:
#
# include              le-ssl.conf
# ssl_certificate      /etc/letsencrypt/live/ssl.example.org/fullchain.pem;
# ssl_certificate_key  /etc/letsencrypt/live/ssl.example.org/privkey.pem;

ssl on;

ssl_ciphers 'ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-DSS-AES128-GCM-SHA256:kEDH+AESGCM:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA:ECDHE-ECDSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA256:DHE-RSA-AES256-SHA256:DHE-DSS-AES256-SHA:DHE-RSA-AES256-SHA:AES128-GCM-SHA256:AES256-GCM-SHA384:AES128-SHA256:AES256-SHA256:AES128-SHA:AES256-SHA:AES:CAMELLIA:!DES-CBC3-SHA:!aNULL:!eNULL:!EXPORT:!DES:!RC4:!MD5:!PSK:!aECDH:!EDH-DSS-DES-CBC3-SHA:!EDH-RSA-DES-CBC3-SHA:!KRB5-DES-CBC3-SHA';

ssl_prefer_server_ciphers on;

ssl_dhparam /etc/ssl/dhparams.pem;

ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

ssl_session_cache shared:SSL:10m;

ssl_stapling on;
ssl_stapling_verify on;

add_header Strict-Transport-Security 'max-age=63072000';
  4. /etc/nginx/proxy_params:
proxy_set_header Host $http_host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
  5. /etc/nginx/sites-enabled/ps.zoethical.com:
## ps.zoethical.com
#

server {
        listen       80;
        listen       [::]:80;
        server_name  ps.zoethical.com;

        include le.conf;

        return 301 https://$server_name$request_uri;
}

server {
        listen       443 ssl http2;
        listen       [::]:443 ssl http2;
        server_name  ps.zoethical.com;

        include      le.conf;
        include      le-ssl.conf;

        ssl_certificate      /etc/letsencrypt/live/zoethical.com/fullchain.pem;
        ssl_certificate_key  /etc/letsencrypt/live/zoethical.com/privkey.pem;

        root         /srv/www/zoethical.com/ps;
        index        index.html;

        client_max_body_size 0;
        http2_idle_timeout   5m;

        location /errorpages/ {
                alias /srv/www/zoethical.com/errorpages/;
        }

        location / {
                proxy_pass         http://discourse;
                proxy_http_version 1.1;
                proxy_redirect     off;
                proxy_set_header   Upgrade $http_upgrade;
                proxy_set_header   Connection "upgrade";
                include            proxy_params;

                error_page 502 =502 /errorpages/discourse_offline.html;
        }
}

… what?

Not that I think that’s causing this, but isn’t that asking to treat everything like a websocket connection?

4 Likes

Hmm, indeed there’s no reason to do that at all, since Discourse does not use websockets. I can’t remember why I put that there; it might have come from an old NGINX + Discourse guide, in which case it’s indeed unnecessary. It’s always good to have other people look at your configuration files!
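
For the record, if a proxied app did need websockets, the standard pattern from the NGINX docs upgrades conditionally instead of forcing Connection: "upgrade" on every request:

# http context: map the client's Upgrade header to a Connection value
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

# then, inside the proxying location block:
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection $connection_upgrade;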

But then, is there nothing like a configuration error here that could lead to a 429?

Not in the bits of the config you’ve shared so far, no. But where’s the rest?

You need to use the set_real_ip_from / real_ip_header directives so that the internal NGINX consumes the client IP from the header your forwarding NGINX sends it.
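
A sketch of what that looks like in the internal NGINX; the trusted address is an assumption (use your outer proxy’s actual address, e.g. the Docker bridge gateway):

# Replace the connection's source address with the forwarded client IP,
# but only trust the header when the request comes from the outer proxy.
set_real_ip_from 172.17.0.1;      # assumed outer-proxy / Docker gateway address
real_ip_header   X-Forwarded-For;
real_ip_recursive on;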

To be honest, I would just recommend dropping the NGINX rate limiting template and doing the rate limiting in the app instead.

When you put NGINX in front of NGINX you are opting into pain.

Fix your configuration, or stop doing double-NGINX to reduce your configuration’s complexity.

The biggest problem is with Discourse updates. Without the second NGINX showing a splash screen (“Maintenance time, we’ll be back soon”), there’s just an ugly 404.

I’m not sure what ‘rest’ you’re referring to, Matt. I removed the offending lines and upgraded the security headers a bit. I can show the updated configuration if you like.

Logs in the web container clearly show that the remote IP is taken into account. I reviewed my logs more thoroughly and realized that since last November I have had very few (40) rate-limiting events, most of them (38) coming from /mini-profiler-resources/results.

@sam I wouldn’t recommend dropping the rate limit entirely, unless you’re already behind a proxy doing it for you.
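
If you are behind a proxy and want it to do the limiting, a minimal sketch of what that could look like at the outer proxy (zone name, rate, and burst here are illustrative, not Discourse’s values):

# http context
limit_req_zone $binary_remote_addr zone=discourse_req:10m rate=12r/s;

# inside the server block that proxies to Discourse
location / {
    limit_req zone=discourse_req burst=20 nodelay;
    limit_req_status 429;
    proxy_pass http://discourse;
}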

Hmmm, so disable mini profiler?

I’m not advocating removing all rate limits, just handling rate limiting in the app if your config is too complex.

1 Like

Guys, I’m getting all sorts of instability that has appeared only recently.

Intermittent:

  • 429 errors on posting
  • “Oops! That page doesn’t exist or is private.” when attempting to access Admin
  • “Error while trying to load …”

As far as I’m aware, both affected sites have vanilla Docker setups with no additional proxies.

Any ideas?

EDIT: A Discourse upgrade to v2.0.0.beta2 +166 seems to have fixed it… (I wasn’t offered an upgrade, but went to /admin/upgrade manually.)

So, definitely something odd with the early v2.0.0 beta?

@Pawel_Kosiorek is this resolved for you with an upgrade?

1 Like