Nginx, nginx, and docker


(Lee_Ars) #1

Have been banging on converting my install to Docker all day, and I’m nearly done. However, I’m at kind of a configuration quandary with how to best arrange the layer cake of web servers.

Givens for this install:

  1. The (real) server runs varnish + nginx with multiple production web sites. Varnish is on port 80 reverse-proxying to nginx, and nginx itself listens on 443 for HTTPS traffic (since varnish doesn’t do that—well, that’s not quite true, but that’s a whole other book).

  2. The server has a single public static IP, and nginx decides which web site to serve based wholly on the HTTP request hostname (in other words, each nginx virtual host definition has its own unique server_name parameter).

  3. Varnish must remain listening on port 80, and nginx must remain listening on port 443. Yes, I know Discourse’s preferred option is to use a CDN instead of Varnish, but varnish is required for other stuff and even bypassing it entirely with pipe in its vcl file still has it playing a reverse-proxy role. So, it’s in the mix even if it’s “off.”

With those known, how best to pass traffic to Docker?

There appear to be two options:

A) Leave nginx active inside of docker, with a multi-step reverse proxy. The flow would be requester -> varnish -> nginx (real) -> nginx (docker) -> unicorn for HTTP and requester -> nginx (real) -> nginx (docker) -> unicorn for HTTPS.

B) Don’t load nginx inside of docker—only run unicorn and open up port 3000 on docker. The flow then would be requester -> varnish -> nginx (real) -> unicorn for HTTP and requester -> nginx (real) -> unicorn for HTTPS.

Pros of A are that it’s simpler to install, configure, and maintain (and, as @sam has noted, it lets the Discourse devs fold in updated nginx configurations as part of the normal upgrade cycle).

Cons of A are the multiple reverse proxies. I’m not terribly concerned about performance, but I am a little concerned about having that many layers in the stack and having all the client request metadata survive the trip through. I don’t know if it’s bad or not, and that’s a little scary.

Pros of B are that it makes client-side administration easier (for me, at least), and more importantly hacks a layer out of the cake.

Cons of B are that it gets…complicated. All of the locations referenced point at paths inside the docker container, which…er, actually, come to think of it, this isn’t even going to work, is it? Because public and everything in it is inaccessible inside the docker container, right?

Could use some advice here. How are you guys typically rolling out docker in production? Do you just do the double-reverse-proxy thing? I mean, I can actually use varnish to push HTTP traffic bound for the right hostname past the production nginx and directly to docker nginx, but it won’t help for HTTPS.


(Lee_Ars) #2

I went ahead and pulled the trigger, tentatively, and went live with method A. Everyone looks to Discourse like they’re all posting from the web server’s real LAN IP address, so I need to look up some config examples of double-reverse-proxying with nginx and see how to properly preserve request IP addresses all the way down the chain.

On the plus side, HTTPS works fine—nginx is a great SSL terminator, and the production instance of nginx is happily doing just that.

I left @sam’s internal nginx config alone, and just slapped this together for the public-facing nginx:

#fingers crossed...
server {
	server_name discourse.bigdinosaur.org;
	listen 8881;

	location / {
		access_log off;
		proxy_pass http://discourse.bigdinosaur.org:8089;
		proxy_set_header X-Real-IP $remote_addr;
		proxy_set_header Host $host;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
	}
}

#HTTPS server
server {
	server_name discourse.bigdinosaur.org;
	listen 443 ssl spdy;

	ssl on;
	ssl_certificate /path/to/my/cert;
	ssl_certificate_key /path/to/my/key;
	ssl_protocols TLSv1.1 TLSv1.2;
	ssl_ciphers ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AES:RSA+3DES:!ADH:!AECDH:!MD5:!DSS:!RC4;
	ssl_prefer_server_ciphers on;
	ssl_ecdh_curve secp521r1;
	ssl_dhparam /etc/ssl/private/dhparam.pem;

	location / {
		access_log off;
		proxy_pass http://discourse.bigdinosaur.org:8089;
		proxy_set_header X-Real-IP $remote_addr;
		proxy_set_header Host $host;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
	}
}

(HTTP nginx is listening on 8881 because, as mentioned in the OP, varnish is on port 80; docker nginx is listening on 8089.)
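I still need to confirm how the docker-side nginx should treat those forwarded headers; a minimal sketch of what I have in mind for the container’s nginx, using the real_ip module (the 172.17.0.1 bridge address is an assumption; substitute whatever the outer nginx actually shows up as from inside the container):

# sketch only: make the container's nginx trust the outer proxy's forwarded header
# (172.17.0.1 is an assumed docker bridge gateway address; substitute the real one)
set_real_ip_from 172.17.0.1;
real_ip_header X-Forwarded-For;
real_ip_recursive on;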

edited to add - times in the miniprofiler are considerably faster now, too. Not sure if that’s because the docker package is that much better, or if my original install was just getting crufty, or both. A shift-refresh on a 150-ish post thread yields ~200ms, and a shift-refresh on Latest gets ~120ms.


(Ben T) #3

Try placing the proxy headers above the proxy pass line. It may just be passing off and ending the evaluation there.


(Lee_Ars) #4

I didn’t think nginx evaluated location stanza items in any particular order, but you might be right. Will give that a shot.

edit - No joy. I’ll poke some more tomorrow—bed time now :slight_smile:


(Ben T) #5

This is the simple passing configuration I’m using. It does not feature any ssl tricks, but I’m seeing IP addresses passed correctly. I only see two differences: that I’m passing to an upstream and:

proxy_set_header Host $host;
vs
proxy_set_header  Host $http_host;

upstream discourse {
  server localhost:8020;
}

server {
  listen 80;
  gzip on;
  gzip_min_length 1000;
  gzip_types application/json text/css application/x-javascript;

  server_name forums.cityfellas.com;

  sendfile on;

  keepalive_timeout 65;
  client_max_body_size 2m;

  location / {
    #add_header "Access-Control-Allow-Origin" "<domain here>";
    proxy_set_header  X-Real-IP  $remote_addr;
    proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header  X-Forwarded-Proto $scheme;
    proxy_set_header  Host $http_host;
    proxy_pass http://discourse;
  }

}

I wanted to take this chance to include a more advanced config that explains my thinking behind the upstream block. It allows for load balancing across multiple docker installs, generating custom errors while my docker instance is upgrading, or serving assets while the back end is missing.

upstream discourse {
  server localhost:8020;
  server localhost:5678 backup;
}

server {
 listen 5678;
 location / {
   root /usr/share/nginx/html;
   # <or a path to HTML files to use... or maybe more?>
  }
}
... see above ...

(Lee_Ars) #6

Based on my read of the docs, $host vs $http_host shouldn’t make much of a difference, but who knows? Will try and see.

A predefined upstream block also shouldn’t make a difference, because that’s supposed to be used to define a group of upstream servers for load balancing (though nothing’s wrong with using it to contain a single host).

Still, it’s probably best practice to shift from a direct proxy_pass call to an upstream variable—it’s certainly less cluttered visually and abstracts definition from function.
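For what it’s worth, applied to my config above, that refactor would look roughly like this (the discourse_docker name is just illustrative; hostnames and ports are the same as before):

#sketch: same endpoints as before, just abstracted behind an upstream block
upstream discourse_docker {
	server discourse.bigdinosaur.org:8089;
}

server {
	server_name discourse.bigdinosaur.org;
	listen 8881;

	location / {
		access_log off;
		proxy_set_header X-Real-IP $remote_addr;
		proxy_set_header Host $host;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
		proxy_pass http://discourse_docker;
	}
}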

Edited to add—

Hm, okay, I think we may be good. Docker nginx’s log file still shows nothing but the physical server’s IP address, but Discourse’s production.log shows requests from Internet IP addresses…except for my requests, which all show as coming from 127.0.0.1. Which is wrong, because I’m not accessing the thing from localhost—I should be showing up with my LAN IP address. But, hey, everyone else appears to be showing up correctly.

Will observe in production over the next few days!


(Sam Saffron) #7

Awesome! I did not have a chance to respond to this over the weekend, but I think it’s a good approach.

It allows us to keep the docker side of things tidy by moving SSL termination out of the container. For the record, we terminate SSL on haproxy.

Regarding IP Addresses, this little snippet should help explain what is going on:

https://github.com/rails/docrails/blob/master/actionpack/lib/action_dispatch/middleware/remote_ip.rb#L132-L152

There have been lots of questions about SSL in light of the new docker stuff; is there any chance you could write up a little howto here? :slight_smile: I support this approach and think it is very clean. Perf will not be a concern.

On the topic of SSL this is a VERY interesting article by Ilya Grigorik Optimizing NGINX TLS Time To First Byte (TTTFB) - igvita.com


(Sam Saffron) #8

I was actually thinking about @ilya1’s article, and it may end up “full circling” us back to Docker, because it would “possibly” be the only sane way to get an optimised nginx SSL config that we could widely deploy. Lots of steps are involved (patching nginx etc. etc.)


(Lee_Ars) #9

Apparently @ilya1 actually submitted a pull to nginx, and as of 1.5.9 there’s an ssl_buffer_size option. And an interesting discussion on potentially making the buffer dynamic.

I’m on 1.5.10 (because of the hawt, hawt SPDY 3.1 support!), so I’ll crank the ssl_buffer_size down to 4k. It sounds like there might be a substantial upside for time-to-first-byte performance.

(a moment later) OK, change made to 4k—shift-refresh the crap out of the site (or out of my static homepage or blog) to see the difference. If there was any. Can’t really tell, but the web server’s in my closet and I hit it over the LAN so everything’s always fast for me :smile:
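For anyone following along, the change is just one directive in the HTTPS server block (a sketch; it assumes nginx 1.5.9 or newer):

# default is 16k; smaller records get the first bytes decrypted and rendered sooner
ssl_buffer_size 4k;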

I’ll see what I can do :smiley: However, if we’re comfortable with double-reverse-proxying nginx, then a single canned virtual host file for the top layer of nginx would probably be best. Nginx is a hell of a good ssl terminating proxy if you already have it deployed—in fact, in this config, it’s “free”.

In instances where your Discourse container is going to be directly exposed to the web and you don’t need haproxy for load balancing, maybe a fast and bare-bones terminator like stud is the way to go (and it’s in the Ubuntu repos w/a ppa). I swear there’s another ssl terminator I’ve looked at semi-recently, but the name is escaping me. It might have been stud, but I can’t remember.

Lemme look at this later tonight.

edit - you might consider flipping your in-container nginx from stable to mainline, too—I wouldn’t think there’d be any downsides. Nginx’s mainline branch is definitely prod-ready, and you’ll get a number of performance increases & fixes.

Maybe you could have the option to pull it in-container—the sticky part would be the cert & key, which would have to be put into the shared directory for the container. If I’m understanding docker correctly, you’d also have to give your docker-running account read access to the keyfile, which is definitely not a best practices kind of thing. Still, other than providing the user a place to plug in cert & key, it seems doable. And I’ve been saying all along that terminating SSL at the nginx level makes more sense than trying to do it at the app level (assuming you can make your app appropriately protocol-agnostic).


(Ilya Grigorik) #10

FWIW, here’s the config I’m using / recommend:
https://github.com/igrigorik/istlsfastyet.com/blob/master/nginx.conf

For additional details, see: https://istlsfastyet.com/


(Sam Saffron) #11

@Lee_Ars

I created a simple template in case you want SSL inside the container:


(Sam Saffron) #13

@igrigorik I recently got this PR that heavily amends your recommended template.

What are your thoughts on it?


(Ilya Grigorik) #14

I would suggest not trying to reinvent the security wheel… If in doubt, defer to Mozilla’s sample:
https://wiki.mozilla.org/Security/Server_Side_TLS#Nginx

  • Use Mozilla’s cipher list - just copy / paste it, and keep it updated.
  • Adjust the record size (ssl_buffer_size 4k)
  • Please enable TLS tickets and bump their lifetime to 12-24 hrs.
  • Set up a cron to HUP the nginx process every 12-24 hrs to regenerate the random key used by tickets.

More @ Making HTTPS Fast(er) - nginx.conf - Google Slides
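Rolled into nginx directives, that advice would look roughly like the following sketch (the session-cache size is an assumption, and the cipher string should be copied from the Mozilla wiki page above rather than from here):

# sketch of the points above; not a drop-in config
# ssl_ciphers <paste Mozilla's current cipher list here>;
ssl_prefer_server_ciphers on;

# smaller TLS records for better time-to-first-byte
ssl_buffer_size 4k;

# session resumption via tickets, with a long lifetime
ssl_session_tickets on;
ssl_session_cache shared:SSL:10m;   # cache size is an assumption
ssl_session_timeout 24h;

# plus a cron job that sends the nginx master a HUP (e.g. nginx -s reload)
# every 12-24 hrs so the random ticket key gets regenerated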


(Tsu) #15

I assume this advice is based on experience with very old OpenSSL versions. More recent ones (1.0.1 and later) order ciphers by strength.

Copying a list with fixed ciphers will result in:

  • Outdated ciphers (like RC4) still being used despite having been demoted, if you forget to amend the list.
  • A change of the TLS implementation that introduces new handshakes and ciphers will result in errors (“cipher not found”), degraded security (when the first ciphers get silently dropped and the trailing fallbacks are used), and omission of new ciphers (CHACHA20, NORX, AEGIS…).

Using classes (I don’t use the term “cipher groups” here so as not to confuse anyone who has used BoringSSL) results in the TLS implementation (= its authors) selecting the right ones. For example, ECDH+HIGH would include CHACHA20 as well as AES.


I recommend going with the classes as proposed, just using EECDH+… and skipping RC4 as well as “!aNULL:!eNULL” because that’s already ruled out by HIGH and EECDH+…:EDH+….

The curve Nginx selects by default (P256) is suitable for up to AES128, hence demoting (or even skipping) AES256 is reasonable. As to -3DES:-CAMELLIA: I guess both will be used primarily by bots and crawlers, so removing them decreases any remaining attack surface.

Without setting ssl_dhparam, Nginx will use 1024-bit DH parameters, which is below the recommended threshold of 2048 bits (ENISA: 3072 bits!).
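A minimal sketch of addressing that, reusing the path already shown earlier in the thread:

# generate once, e.g.: openssl dhparam -out /etc/ssl/private/dhparam.pem 2048
# then point nginx at the result:
ssl_dhparam /etc/ssl/private/dhparam.pem;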

This is not related to the aforementioned changeset.

An IP packet with TCP and TLS has a payload of about 1360–1400 bytes. The initial congestion window (IW) on most kernels in use is 10. So I would recommend going with a buffer size (the larger, the more payload!) that equals 2 or 4 packets:

  1. The MTU for DUL networks in most countries in the EU is 1480 (not the 1500 that most people living in LANs expect). (Interestingly, FDDI has 4470, although 4352 is the max according to standards.)
    That is: 1480 - 40 for IPv6 - 20 for TCP - 8 for TCP options (window scaling and SACK; you could enable more, for up to 20 bytes) = 1412 PDU (1432 with IPv4).
  2. A TLS record will eat 5 bytes for the header and 32 bytes for the MAC (SHA256 with TLSv1.2; SHA384 would need 48 bytes).
  3. Padding is applied unless you send data in multiples of the cipher’s block size; it’s 16 bytes for AES.
  4. Padding is applied … hash’s blocksize.
  5. Payload for N packets: ⌊(N × 1412 − 37) / 16⌋ × 16, where 37 = 5 (record header) + 32 (MAC).
  6. Notice: 1 → 1360, 2 → 2784, 3 → 4192 (≠ 4096), 4 → 5600
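Expressed as the directive, aligning the record buffer to whole packets would mean something like this (a sketch using the numbers above instead of a round 4k):

# two packets' worth of payload per TLS record, per the arithmetic above; 5600 would be four packets
ssl_buffer_size 2784;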

(Tsu) #16

By the way, please be aware that the way Docker sets up containers limits nginx’s performance (think: to something on the order of 7,000 connections/s).

For example, net.core.somaxconn = 128 is fixed, and you cannot use F_NOCACHE (aio → directio; to not thrash the VFS cache with a few large downloads) and others, unless you start the container with --privileged. :smile:
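For context, the somaxconn part bites wherever nginx asks for a larger listen backlog than the kernel allows; a sketch (the 1024 figure is arbitrary):

# inside a server block: nginx requests this backlog, but listen(2) silently
# truncates it to net.core.somaxconn (128 in an unprivileged container)
listen 80 backlog=1024;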


(Sam Saffron) #17

Can you put through a PR with your recommendations here?

Regarding somaxconn, I am not sure the general population of Discourse users is ever going to hit the backlog there.


(Kane York) #18

@tsu You’ll want to edit these files: discourse_docker/web.template.yml at master · discourse/discourse_docker · GitHub

Also, could you include the “SIGHUP to nginx” that @igrigorik mentioned?


(Tsu) #19

I’ve made non-trivial contributions to Nginx in the past which have been accepted. Much to my chagrin, the lines marking me as their author were stripped and replaced by © Sysoev. Therefore I no longer contribute anything back to the “mainstream” and instead publish patches separately.

I really would, but the lack of a CONTRIBUTORS and/or AUTHORS file is holding me back. Please see how, for example, Golang deals with that.

(I didn’t notice any running cronjob in @sam’s container, either.)

A single hotlinked image will do that. Or being featured at Heise.de (“slashdotting”).

Please merge #99 and I will go from there.

For your OpenSSL the cipher string would be: EECDH+HIGH:EDH+HIGH+TLSv1.2:-AES256:-3DES:-CAMELLIA:!RC4:!MD5
or shorter, because RC4 and MD5 are not included in HIGH anymore: EECDH+HIGH:EDH+HIGH+TLSv1.2:-AES256:-3DES
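In nginx terms, the shorter string would be:

ssl_ciphers EECDH+HIGH:EDH+HIGH+TLSv1.2:-AES256:-3DES;
ssl_prefer_server_ciphers on;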

Something better will come up once we have reviewed Nginx & OpenSSL in @sam’s container. Stay with me (je .nginx_review):

Humbly, no. Sorry

That data structure needs to be replaced by a proper LRU or LFU cache, with an additional time-based expiry (see how Redis does this). Nginx already brings almost everything we need for this, and it is easy enough to implement that undergrad CS students are tasked with it. :wink: (I will provide a patch to Nginx for you if you like.)

EDIT: I’ve obviously mixed session tickets and session IDs. If it were up to me I would drop session IDs (which would need the LxU cache) like a hot potato because beginning with Windows 8 even SChannel supports tickets. And then use Nginx’ built-in session ticket rotation, only without the files. /EDIT

.nginx_review:

crypto, ssl, z, and the lot needed for glibc and Linux, I get that. But I don’t understand why Discourse needs all of this in Nginx, and please don’t say “Ubuntu pulled that in”:

$ LD_TRACE_LOADED_OBJECTS=1 /lib64/ld-linux-x86-64.so.2 $(which nginx) | sed -e 's:=>.*::g' -e 's: (.*::g' | tr --delete "\t" | grep -v -e 'ld-linux' | sort | awk '{ printf "%-20s", $0 } (NR % 4 == 0) { print "" } END { print "\n" }'
libaudit.so.1       libcrypto.so.1.0.0  libcrypt.so.1       libc.so.6
libdl.so.2          libexpat.so.1       libexslt.so.0       libfontconfig.so.1
libfreetype.so.6    libgcrypt.so.11     libgd.so.3          libGeoIP.so.1
libgpg-error.so.0   libjbig.so.0        libjpeg.so.8        liblzma.so.5
libm.so.6           libpam.so.0         libpcre.so.3        libpng12.so.0
libpthread.so.0     libssl.so.1.0.0     libtiff.so.5        libvpx.so.1
libX11.so.6         libXau.so.6         libxcb.so.1         libXdmcp.so.6
libxml2.so.2        libXpm.so.4         libxslt.so.1        libz.so.1
linux-vdso.so.1

If you accept that we need to replace that Nginx I will happily open a ticket for us to discuss how to move that forward (or just email me). :five:


(Sam Saffron) #20

Nahh, NGINX pulled it in :slight_smile: Totally fine to change the container Dockerfile to build from source something with a much lower surface area.

Fine to add a CONTRIBUTORS file.

Will look at hand-merging the SSL change. Keep in mind that at Discourse we do all SSL termination at HAProxy; the SSL template is purely a service to the community, and we need someone who uses and loves this to help with maintenance on this piece.


(Discourse.PRO) #22

See my solution: How do you set net.core.somaxconn in the docker container?