Troubleshooting "BAD CSRF" error on initial site setup?

Standing up a new Discourse forum as a separate docker container on a dedicated physical server that is already running an existing Discourse forum (which is humming along with no errors).

Bootstrapping works without errors and the initial “You’ve successfully installed Discourse!” message appears without issue, and I can enter my desired username & PW without any error. However, on submitting the form, rather than sending the initial registration email, Discourse displays a white page with ["BAD CSRF"] in the upper left corner.

I’m honestly not even sure where to start troubleshooting this. Searching here on meta didn’t really yield any results that appear relevant.

Suggestions on where to start looking?

It’s likely a problem with whatever is doing https. How’s that configured?

The web server hosts 4 domains and seven separate web sites, so universal SSL termination is done by HAProxy so I can keep the layers separated and provide caching even to SSL content.

The stack is HAProxy → Varnish (cache) → nginx reverse proxy → Discourse.

Worth noting that I had no issues setting up the first Discourse forum under this same configuration.

Edited to add - client connections are over https, but I’m proxying from nginx to the docker container’s HTTP port, not HTTPS (again, doing what works for the first Discourse instance). I can try changing that to the HTTPS port to see what happens, though, if that’ll help.

edit^2 - no, that didn’t help.

Looking through the Discourse production.log and this is what I see:

Started POST "/finish-installation/register" for 2601:2c4:c700:745f:216:3eff:0:11 at 2018-09-12 19:21:26 +0000
Processing by FinishInstallationController#register as HTML
  Parameters: {"utf8"=>"✓", "authenticity_token"=>"[redacted]", "email"=>"redacted", "username"=>"redacted", "password"=>"[FILTERED]", "commit"=>"Register"}
Can't verify CSRF token authenticity.
  Rendering text template
  Rendered text template (0.0ms)
Filter chain halted as :verify_authenticity_token rendered or redirected
Completed 403 Forbidden in 2ms (Views: 0.3ms | ActiveRecord: 0.0ms)

Still looking through the other threads on meta where Can't verify CSRF token authenticity has come up.
I also see the 403 response in Chrome’s console:

Request URL: https://(forum url)/finish-installation/register
Request Method: POST
Status Code: 403 
Remote Address: [2607:fad0:3524:1::8]:443
Referrer Policy: strict-origin-when-cross-origin

This happens when SSL is badly configured. Most of the times a header is missing from the reverse proxy config.

2 Likes

^ this - the host header has to remain intact throughout, otherwise encryption can’t be established. Is it all IP/port based behind HAProxy?

Out of curiosity why are you using HAProxy in front of Varnish and then nginx behind?

Right, and my first guess was to make sure X-Forwarded-Proto was being appended properly by the reverse proxy—and it is. That’s the annoying thing here—the configuration between the working forum and the new one is identical.

And when I say “identical” I mean literally using the exact same processes and config files :smiley: They’re both on the same server, so other than the nginx configuration file in sites-available they’re even using the exact same set of configuration files. Everything’s the exact same.

The nginx config is pretty short—hard for me to screw that up:

server {
	server_name [redacted];
	listen 8881;
	listen 8882 http2;

	sendfile on;

	location / {
		access_log off;
		proxy_set_header X-Real-IP $remote_addr;
		proxy_set_header Host $http_host;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
		proxy_set_header X-Forwarded-Proto https;
		proxy_redirect off;
		proxy_pass http://localhost:7996;
	}
}

The only differences between it and the config for the working forum is the server_name directive and the port I’m passing to.

Is HAProxy using a SAN cert, or unique IPs and separate certificates?

The overall config goals were to 1) encrypt everything and 2) cache everything. These are obviously opposing goals, so the way I went about it was to stratify things: SSL termination first, then a cache layer, then a web server that both serves static stuff and also functions as a reverse proxy as needed (for wordpress, discourse, and a few other things).

I initially had a fair amount of trouble with the “nginx sandwich” approach (nginx → varnish → nginx)—getting two separate instances of nginx to work properly with Upstart on ubuntu 14.04 proved to be very difficult and required a lot of screwing around, so I ditched nginx as my ssl termination layer and went with haproxy instead. If I were redoing this now, I’d go with Hitch, but ripping out haproxy at this point would require some research on how to do the transition.

edit:

HAProxy is using separate LetsEncrypt certificates (maintained via acme.sh), one per host. This is done mainly because the number of sites being hosted has changed over time and changing/updating a single SAN certificate proved to be kind of a pain in the ass. Additionally, I have a couple of tenant sites that would prefer to keep their SSL configs as separate from mine as possible.

Fair enough, they’re just separate sites though in the same instance of NGINX, it’s quite a common setup. Due to the nature of the app Discourse doesn’t really respond well or need external caching, HAProxy is only going to give you some port-redirection-fu there, which Nginx also covers.

Did you enable force_https in the second site?

Is there an easy way to do that via config file editing? I can’t log into the new site yet—I can’t get past the initial admin user registration step.

Yeah, I’m aware of Discourse’s cache behavior—this config has been live for a number of years. Discourse is not the only tenant application on the box, though, so its requirements get added into the mix along with everything else’s, and pretty much everything else on the box is very cache-friendly.

I honestly hadn’t thought about doing this all with a single instance of nginx. That’s definitely an interesting suggestion, though I’d need to sit down and whiteboard out the flow. Initial connections on port 443 (or 80 redirected to 443), proxying to varnish, proxying from there to nginx on a different port, I suppose, though I’m wary to rely on a single application for all three layers here. Feels like isolating errors and fixing them becomes considerably more complex.

(I’m aware nginx has serviceable cache, but it lacks varnish’s rich purge/ban functionality and makes manual object invalidation into a giant pain.)

This should do it:

./launcher enter app
cd /var/www/discourse
rails c
SiteSetting.force_https = true
3 Likes

No joy - set to true:

[1] pry(main)> SiteSetting.force_https
=> true
[2] pry(main)> 

Stopped & restarted the docker container just to be sure (not sure if that’s necessary or not but figured it couldn’t hurt), but still receiving the same error.

So your setup is:

HAProxy → Varnish (cache) → nginx reverse proxy → Docker

And SSL termination happens at HAProxy, right? Is the HAProxy config the same for both sites? With same header injection?

Exact same—traffic for both sites is going through the same haproxy frontend and same backend. Not doing any header injection with haproxy—in fact, I’m using HAproxy in TCP mode so that I can pass traffic via proxy-protocol-v2 to varnish, which lets me offer full HTTP/2 from the nginx reverse proxy at the bottom of the stack.

I do response header injection via varnish (hsts, referrer-policy, x-frame-options, x-content-type-options, a few others), and request header injection (like x-forwarded-protocol) with nginx.

edit - I totally understand the limits of free support—if this is going to turn complicated, I’ll bang on it a while on my own this weekend when I’ve got some spare time (and more importantly, some latitude to break things a bit). My hope was that this was going to be something really simple—and it still might be!—but I don’t want to be a pain.

1 Like

For anyone else banging their head facing this issue. Please read below.

I was having similar issue. In my case we are behind Cloudflare, then Nginx and the setup was for a forum on subfolder.

Ultimately following combination worked.

  • Disabling caching for the subfolder on Cloudflare
  • Following Nginx block
    location /folder {
        proxy_ssl_server_name       on;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_pass      http://localhost:1357/folder;
    }
2 Likes

Glad you got it! Was the test site also behind cloudflare?

1 Like

The other site was also on cloudflare but didn’t have caching enabled. The new site root has aggressive caching enabled via page rules, which would also apply to subfolders, hence the issue.

Browser Cache TTL: a month, Always Online: On, Cache Level: Cache Everything, Edge Cache TTL: 2419200 seconds

I also think following header is also critical.

        proxy_set_header X-Forwarded-Proto https;
3 Likes