Troubleshooting a "BAD CSRF" error during initial site setup

I've spun up a new Discourse forum as a separate Docker container on a dedicated physical server that already hosts an existing Discourse forum (which has been running smoothly with no errors).

The bootstrap completed without errors, the initial "Congratulations, you installed Discourse!" message displayed fine, and I was able to enter my desired username and password without any problems. But when I submit the form, instead of the initial registration email being sent, I get a white page with ["BAD CSRF"] in the top-left corner.

I have no idea where to even start troubleshooting. Searching Meta turned up no relevant results.

Any suggestions on where to begin?

It’s likely a problem with whatever is doing https. How’s that configured?

The web server hosts four domains and seven separate web sites, so universal SSL termination is done by HAProxy so I can keep the layers separated and provide caching even to SSL content.

The stack is HAProxy → Varnish (cache) → nginx reverse proxy → Discourse.

Worth noting that I had no issues setting up the first Discourse forum under this same configuration.

Edited to add - client connections are over https, but I’m proxying from nginx to the docker container’s HTTP port, not HTTPS (again, doing what works for the first Discourse instance). I can try changing that to the HTTPS port to see what happens, though, if that’ll help.

edit^2 - no, that didn’t help.

Looking through the Discourse production.log and this is what I see:

Started POST "/finish-installation/register" for 2601:2c4:c700:745f:216:3eff:0:11 at 2018-09-12 19:21:26 +0000
Processing by FinishInstallationController#register as HTML
  Parameters: {"utf8"=>"✓", "authenticity_token"=>"[redacted]", "email"=>"redacted", "username"=>"redacted", "password"=>"[FILTERED]", "commit"=>"Register"}
Can't verify CSRF token authenticity.
  Rendering text template
  Rendered text template (0.0ms)
Filter chain halted as :verify_authenticity_token rendered or redirected
Completed 403 Forbidden in 2ms (Views: 0.3ms | ActiveRecord: 0.0ms)

Still looking through the other threads on meta where Can't verify CSRF token authenticity has come up.
I also see the 403 response in Chrome’s console:

Request URL: https://(forum url)/finish-installation/register
Request Method: POST
Status Code: 403 
Remote Address: [2607:fad0:3524:1::8]:443
Referrer Policy: strict-origin-when-cross-origin

This happens when SSL is badly configured. Most of the time a header is missing from the reverse proxy config.
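For reference, a minimal sketch of the headers Discourse generally needs from an upstream nginx when TLS is terminated in front of it (the upstream port is a placeholder, not any poster's actual config):

```nginx
# Sketch only: headers Rails needs to see when https terminates upstream.
location / {
	proxy_set_header Host $http_host;
	proxy_set_header X-Real-IP $remote_addr;
	proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
	# Without this, Rails sees the request as plain http, and the
	# CSRF/origin check can fail behind an https front end.
	proxy_set_header X-Forwarded-Proto https;
	proxy_pass http://localhost:8080;  # placeholder container port
}
```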


^ this - the host header has to remain intact throughout, otherwise encryption can’t be established. Is it all IP/port based behind HAProxy?

Out of curiosity why are you using HAProxy in front of Varnish and then nginx behind?

Right, and my first guess was to make sure X-Forwarded-Proto was being appended properly by the reverse proxy—and it is. That’s the annoying thing here—the configuration between the working forum and the new one is identical.

And when I say “identical” I mean literally using the exact same processes and config files :smiley: They’re both on the same server, so other than the nginx configuration file in sites-available they’re even using the exact same set of configuration files. Everything’s the exact same.

The nginx config is pretty short—hard for me to screw that up:

server {
	server_name [redacted];
	listen 8881;
	listen 8882 http2;

	sendfile on;

	location / {
		access_log off;
		proxy_set_header X-Real-IP $remote_addr;
		proxy_set_header Host $http_host;
		proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
		proxy_set_header X-Forwarded-Proto https;
		proxy_redirect off;
		proxy_pass http://localhost:7996;
	}
}

The only differences between it and the config for the working forum is the server_name directive and the port I’m passing to.

Is HAProxy using a SAN cert, or unique IPs and separate certificates?

The overall config goals were to 1) encrypt everything and 2) cache everything. These are obviously opposing goals, so the way I went about it was to stratify things: SSL termination first, then a cache layer, then a web server that both serves static stuff and also functions as a reverse proxy as needed (for wordpress, discourse, and a few other things).

I initially had a fair amount of trouble with the “nginx sandwich” approach (nginx → varnish → nginx)—getting two separate instances of nginx to work properly with Upstart on ubuntu 14.04 proved to be very difficult and required a lot of screwing around, so I ditched nginx as my ssl termination layer and went with haproxy instead. If I were redoing this now, I’d go with Hitch, but ripping out haproxy at this point would require some research on how to do the transition.

edit:

HAProxy is using separate LetsEncrypt certificates (maintained via acme.sh), one per host. This is done mainly because the number of sites being hosted has changed over time and changing/updating a single SAN certificate proved to be kind of a pain in the ass. Additionally, I have a couple of tenant sites that would prefer to keep their SSL configs as separate from mine as possible.

Fair enough, they’re just separate sites though in the same instance of NGINX, it’s quite a common setup. Due to the nature of the app Discourse doesn’t really respond well or need external caching, HAProxy is only going to give you some port-redirection-fu there, which Nginx also covers.

Did you enable force_https in the second site?

Is there an easy way to do that via config file editing? I can’t log into the new site yet—I can’t get past the initial admin user registration step.

Yeah, I’m aware of Discourse’s cache behavior—this config has been live for a number of years. Discourse is not the only tenant application on the box, though, so its requirements get added into the mix along with everything else’s, and pretty much everything else on the box is very cache-friendly.

I honestly hadn’t thought about doing this all with a single instance of nginx. That’s definitely an interesting suggestion, though I’d need to sit down and whiteboard out the flow. Initial connections on port 443 (or 80 redirected to 443), proxying to varnish, proxying from there to nginx on a different port, I suppose, though I’m wary to rely on a single application for all three layers here. Feels like isolating errors and fixing them becomes considerably more complex.

(I’m aware nginx has serviceable cache, but it lacks varnish’s rich purge/ban functionality and makes manual object invalidation into a giant pain.)

This should do it:

./launcher enter app
cd /var/www/discourse
rails c
SiteSetting.force_https = true

No joy - set to true:

[1] pry(main)> SiteSetting.force_https
=> true
[2] pry(main)> 

Stopped & restarted the docker container just to be sure (not sure if that’s necessary or not but figured it couldn’t hurt), but still receiving the same error.

So your setup is:

HAProxy → Varnish (cache) → nginx reverse proxy → Docker

And SSL termination happens at HAProxy, right? Is the HAProxy config the same for both sites? With same header injection?

Exact same—traffic for both sites is going through the same haproxy frontend and same backend. Not doing any header injection with haproxy—in fact, I’m using HAproxy in TCP mode so that I can pass traffic via proxy-protocol-v2 to varnish, which lets me offer full HTTP/2 from the nginx reverse proxy at the bottom of the stack.
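For readers unfamiliar with that layout, a rough sketch of a TCP-mode HAProxy frontend handing PROXY protocol v2 to Varnish (names, paths, and ports here are assumptions, not the actual config):

```
# haproxy.cfg sketch: terminate TLS, then relay the raw stream to
# Varnish with PROXY protocol v2 so the client IP survives the hop.
frontend fe_https
    mode tcp
    bind :443 ssl crt /etc/haproxy/certs/
    default_backend be_varnish

backend be_varnish
    mode tcp
    # send-proxy-v2 prepends a PROXY protocol v2 header to each connection
    server varnish 127.0.0.1:6081 send-proxy-v2
```

Varnish has to be told to expect it, e.g. with a listen address like `-a :6081,PROXY`.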

I do response header injection via varnish (hsts, referrer-policy, x-frame-options, x-content-type-options, a few others), and request header injection (like x-forwarded-protocol) with nginx.
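As a sketch, that kind of response header injection in Varnish might look like the following VCL (the exact header values are illustrative assumptions):

```
# default.vcl sketch: add security headers to responses on the way out
sub vcl_deliver {
    set resp.http.Strict-Transport-Security = "max-age=31536000";
    set resp.http.Referrer-Policy = "strict-origin-when-cross-origin";
    set resp.http.X-Frame-Options = "SAMEORIGIN";
    set resp.http.X-Content-Type-Options = "nosniff";
}
```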

edit - I totally understand the limits of free support—if this is going to turn complicated, I’ll bang on it a while on my own this weekend when I’ve got some spare time (and more importantly, some latitude to break things a bit). My hope was that this was going to be something really simple—and it still might be!—but I don’t want to be a pain.


For anyone else scratching their head over this problem, read on.

I was running into a similar issue. In my case, Nginx sat behind Cloudflare, with the forum set up in a subfolder.

In the end, the following combination solved it:

  • Disabled caching for the subfolder in Cloudflare
  • Added the following Nginx block:
    location /folder {
        proxy_ssl_server_name       on;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_pass      http://localhost:1357/folder;
    }

Glad you got it resolved! Was the test site also behind Cloudflare?


The other site was also using Cloudflare, but caching wasn't enabled there. The root of the new site had aggressive caching enabled via a Page Rule, which also applied to the subfolder, and that's what caused this problem.

Browser Cache TTL: 1 month, Always Online: On, Cache Level: Cache Everything, Edge Cache TTL: 2419200 seconds

Also, the following header is important:

        proxy_set_header X-Forwarded-Proto https;