Varnish and Discourse


(Sam Nazarko) #1

Hi

I recently posted about experiencing slow responses with Discourse despite having adequate hardware and network resources. I really love Discourse and how much it helps our users, so I thought I’d update @codinghorror and @sam to let them know about the issues.

One of our guys (@marktheis) managed to pinpoint the issue: Discourse is apparently getting 500s from ‘message-bus’. My webdev knowledge is quite limited, but I did some digging. It seems Discourse implements its own publish/subscribe mechanism rather than using WebSockets. I can understand that WebSockets are problematic – my patches to Varnish fixed an issue we had with them (we use Libreboard internally) – but it never occurred to me that Discourse would have its own implementation.

Unfortunately, Varnish 4.x is not respecting the fact that POST is not idempotent (at least it is not respecting that here). By default, it will reorder and cache any POSTs… The reason Discourse was working at all seems to be that your mechanism implements a retry method. This consumes a very large number of sockets for us – I can see it from the kernel in dmesg – but heck, it worked, and that is probably an improvement over WebSockets.

With that said – my understanding is that Sam’s custom implementation exists because WebSockets’ ‘edge’ cases often aren’t edge cases at all. I think running Discourse behind a load balancer isn’t an edge case either, and running it under Varnish certainly isn’t. Varnish 4.0 is the default in Jessie and Ubuntu 15.04. If you are going to use Varnish, here is what I did to fix things:

# In vcl_recv: send Discourse traffic straight to the backend, uncached
if (req.http.host ~ "discourse.osmc.tv") {
        set req.backend_hint = discourse;
        return (pass);
}

I’ll be refining it in the future so we only pass message-bus requests and let the rest be cached. But I must ask: is this forum (meta) running under a load balancer? Can you describe your stack for this? I’d like to be on a stack that is not so susceptible to breakage!
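Something along these lines might work – a sketch only, assuming message-bus traffic lives under the standard /message-bus/ path prefix (check your install before relying on it):

# In vcl_recv: only the long-polling endpoint must never be cached;
# everything else falls through to the normal caching rules.
if (req.http.host ~ "discourse.osmc.tv") {
        set req.backend_hint = discourse;
        if (req.url ~ "^/message-bus/") {
                return (pass);
        }
}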

Cheers, and apologies guys – Discourse remains as robust as ever, but I seem to find ways to break it (be it with Pound or with Varnish!).

Sam


(Sam Saffron) #2

We use haproxy, which handles long-lived connections fine (and is really more of a proxy than a cache).

For https://community.fastly.com/ we use Varnish directly, but we bypass it for all message-bus traffic by setting the site setting “long polling base url” to skip Varnish altogether.
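For example (hostname entirely hypothetical – substitute one that resolves straight to the app servers rather than to Varnish):

long polling base url: https://origin.example.com/

The browser then issues its message-bus polls against that host directly, so the long-lived requests never pass through the cache.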