Cache headers confuse proxies


(Łukasz Piestrzeniewicz) #1

I’m trying to deploy Discourse on Shelly Cloud (shellycloud.com). By default, Shelly provides Varnish as a caching reverse proxy. Other projects that I host there use the same setup and work flawlessly.

Discourse is problematic. Most notably, the categories and list controllers send Cache-Control headers when accessed anonymously. This ‘poisons’ the cache:

  • An anonymous user accesses the Discourse front page.
  • The list controller sees that no user is logged in and sends the response with Cache-Control: max-age=60, public.
  • Varnish helpfully keeps this response in its cache. The cache is now ‘poisoned’.
  • The user logs in and is redirected to the front page.
  • Varnish sees that the page is cached and serves it from memory. It never goes to Discourse, as it was instructed not to.
  • The user sees the front page intended for anonymous users.
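The sequence above can be reproduced with a toy model. This is a minimal sketch, assuming a shared cache that looks responses up by URL only and ignores the Cookie header (as the proxy in this scenario evidently does), with expiry ignored for brevity. The function and class names, and the header sent to logged-in users, are hypothetical stand-ins rather than Varnish or Discourse internals.

```python
# Toy model of a shared cache keyed on the URL only (Cookie header ignored,
# expiry ignored for brevity). All names here are hypothetical.

def front_page(cookies):
    """Stand-in for the list controller."""
    if "_forum_session" in cookies:
        # Hypothetical header for logged-in users; the point is it is not public.
        return {"body": "front page for logged-in user",
                "cache_control": "private, no-cache"}
    return {"body": "front page for anonymous user",
            "cache_control": "max-age=60, public"}


class UrlKeyedCache:
    def __init__(self, upstream):
        self.upstream = upstream
        self.store = {}          # url -> stored response
        self.upstream_calls = 0  # how often a request actually reached the app

    def get(self, url, cookies):
        if url in self.store:
            # Cache hit: the app is never consulted, whoever is logged in.
            return self.store[url]
        self.upstream_calls += 1
        response = self.upstream(cookies)
        if "public" in response["cache_control"]:
            self.store[url] = response
        return response


cache = UrlKeyedCache(front_page)
anon = cache.get("/", cookies={})                             # anonymous visit
user = cache.get("/", cookies={"_forum_session": "abc123"})   # after logging in
print(user["body"])          # front page for anonymous user
print(cache.upstream_calls)  # 1
```

The logged-in request is answered from the entry stored for the anonymous one, and Discourse is only reached once.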

In my opinion, Discourse (or any other app, for that matter) should not have two different caching policies for a single URL. A URL should be either cacheable or not. Proxies have no way of knowing that the upstream has this kind of policy.

Sending a Vary: Cookie header is not a solution either. Although it will instruct the proxy to serve the cached version to users without cookies (i.e. anonymous users), it will also tell it to cache a separate version of the page for every distinct cookie value. Since each logged-in user has different cookies, this will quickly fill the cache. The proxy will then have to evict other responses to keep hundreds of copies of the home page.
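The cache growth described here can be shown with a toy cache key. This is a sketch, assuming every variant gets stored (the scenario the paragraph above describes); the session cookie values are made up.

```python
# Toy model of a cache that honours `Vary: Cookie`: the Cookie value becomes
# part of the lookup key. Assumes every variant is stored; values are made up.

def cache_key(url, request_headers, vary=("Cookie",)):
    # URL plus the value of each header listed in Vary ("" when absent)
    return (url,) + tuple(request_headers.get(name, "") for name in vary)


store = {}

# 1000 anonymous requests carry no Cookie header, so they share one entry.
for _ in range(1000):
    store.setdefault(cache_key("/", {}), "anonymous front page")

# 500 logged-in users each send a unique session cookie: one entry apiece.
for user_id in range(500):
    headers = {"Cookie": f"_forum_session=session-{user_id}"}
    store.setdefault(cache_key("/", headers), f"front page for user {user_id}")

print(len(store))  # 501 cached copies of "/"
```

Anonymous traffic collapses into a single entry, while each logged-in user adds one more copy of the same URL that no one else can ever hit.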

Maybe Discourse could use two namespaces: one for anonymous users, which can send caching headers, and another one for logged-in users. Responses that don’t depend on the user being logged in can be made cacheable, while user-specific responses should not be.
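A namespace split like this means the caching policy becomes a function of the path alone. The sketch below assumes a hypothetical "/my/" prefix for user-specific responses and illustrative header values. Nothing in it looks at cookies, so every client receives the same headers for a given URL.

```python
# Sketch of a fixed, per-URL caching policy: cacheability depends only on the
# path, never on whether the request carries a session cookie.
# The "/my/" prefix is a hypothetical namespace for user-specific responses.

USER_NAMESPACE = "/my/"


def cache_control_for(path):
    if path.startswith(USER_NAMESPACE):
        return "private, no-store"   # user-specific: never stored by proxies
    return "public, max-age=60"      # same for everyone: safe to share


for path in ("/", "/categories", "/my/notifications"):
    print(path, "->", cache_control_for(path))
```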


(Sam Saffron) #2

Having this page served to anonymous users as

http://meta.discourse.org/t/cache-headers-confuse-proxies/8815

and to logged-in users as

http://meta.discourse.org/tl/cache-headers-confuse-proxies/8815

is a complete non-starter. We need differing cache policies depending on the login state. We are open to creative solutions, but we cannot change URLs. For example, @supermathie and @eviltrout wrangled this into working with nginx.


(Łukasz Piestrzeniewicz) #3

Another solution is to serve (and cache!) identical content to anonymous and logged-in users.

As far as I can see, those two views differ only in a few details:

  • the hidden login form for anonymous users (could easily be sent to logged-in users as well)
  • currentUser information (could be fetched in a separate request from a non-cached URL)
  • the CSRF token (must not be cached under any circumstances; requires a separate request)
  • Google Analytics tracking requests

Maybe the HTML should not include this information and should instead link to a non-cacheable /user-info.js. As this would be loaded at roughly the same time as the HTML, no redraw should occur.
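The split proposed here can be sketched as two handlers. This is a minimal illustration, not Discourse code: the handler names, the `window.userInfo` payload shape and the header values are assumptions, and the random token is only a stand-in (a real app would derive its CSRF token from the session). The page handler takes no user at all, so its output is identical for everyone and can be cached publicly.

```python
import json
import secrets

# Hypothetical handlers illustrating the split: the page is identical for every
# visitor, and per-user data comes from a separate, never-cached endpoint.


def page_handler():
    body = '<html>...<script src="/user-info.js"></script>...</html>'
    return {"Cache-Control": "public, max-age=60"}, body


def user_info_handler(current_user):
    payload = {
        "currentUser": current_user,              # None for anonymous visitors
        "csrfToken": secrets.token_urlsafe(16),   # stand-in; must never be cached
    }
    body = "window.userInfo = " + json.dumps(payload) + ";"
    return {"Cache-Control": "private, no-store"}, body


page_headers, page_body = page_handler()
info_headers, info_body = user_info_handler({"username": "alice"})
print(page_headers["Cache-Control"], "|", info_headers["Cache-Control"])
```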

One more smart idea (I’m not too fond of those, actually) is to have a per-user CSS file that, for example, displays the avatar. But this might be overkill.


(Robin Ward) #4

Isn’t it possible to write a Varnish script to get around this? As Sam mentioned, we got this working quite well in nginx like this:

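```nginx
# Mark the request as "identified" if it carries a Discourse session cookie
set $identified "";
if ($http_cookie ~ "_forum_session") {
    set $identified "1";
}
add_header X-Discourse-Identified $identified;

proxy_set_header  X-Real-IP  $remote_addr;
proxy_set_header  X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header Host $http_host;
# Don't let a Set-Cookie response header prevent caching
proxy_ignore_headers Set-Cookie;
# Identified requests are neither served from nor stored in the cache
proxy_cache_bypass $identified;
proxy_no_cache $identified;
# Zone defined elsewhere with proxy_cache_path
proxy_cache staticfilecache;
# Cache key built from the URL only, not the cookie
proxy_cache_key "$scheme$http_host$proxy_host$request_uri";
```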

The idea here is to disable caching if the user has a _forum_session cookie. We also build the cache key from the URL only, not the cookie. Isn’t something like this possible in Varnish?
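The decisions the nginx snippet makes can be modelled in a few lines. This is a Python sketch of the logic, not nginx itself; the `proxy_host` value "discourse" is a hypothetical upstream name. It follows the documented rule that `proxy_cache_bypass` and `proxy_no_cache` skip the cache when their value is non-empty and not "0".

```python
import re

# Python model of the nginx snippet's decisions (a sketch, not nginx itself).
SESSION_COOKIE = re.compile("_forum_session")


def proxy_decision(scheme, http_host, proxy_host, request_uri, cookie_header):
    # set $identified "1" when the Cookie header mentions _forum_session
    identified = "1" if SESSION_COOKIE.search(cookie_header or "") else ""
    # proxy_cache_key: built from URL parts only; the cookie is never included
    key = f"{scheme}{http_host}{proxy_host}{request_uri}"
    # proxy_cache_bypass / proxy_no_cache skip the cache when the value is
    # non-empty and not "0"
    use_cache = identified in ("", "0")
    return {"key": key, "read_from_cache": use_cache, "store_in_cache": use_cache}


anon = proxy_decision("https", "meta.discourse.org", "discourse", "/latest", None)
user = proxy_decision("https", "meta.discourse.org", "discourse", "/latest",
                      "_forum_session=abc123; _t=xyz")
print(anon)
print(user)
```

Both requests map to the same key, but only the anonymous one may read or populate the cache, so the poisoning sequence from the first post cannot happen.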


(Łukasz Piestrzeniewicz) #5

Sadly, in this environment the user has no control over the Varnish or nginx configuration. It also feels like a kludgy workaround to me. I would prefer to actually fix Discourse to work properly instead.

My working plan right now is to:

  • disable the caching that is set by hand in individual controllers
  • add caching headers to all requests in an after_filter
  • actually send those caching headers only if current_user was not used in the current request
  • rewrite heavy actions to be independent of current_user, and move user-dependent parts into separate requests.
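The core of this plan (record whether an action touched current_user, then decide the headers afterwards) can be sketched outside Rails. Everything here is a hypothetical stand-in: `RequestContext` plays the role of the controller, `after_filter` the Rails filter, and the header values are illustrative.

```python
# Sketch of the plan: remember whether the action read current_user and only
# emit public caching headers when it did not. All names are hypothetical
# stand-ins for the Rails pieces.


class RequestContext:
    def __init__(self, session):
        self._session = session
        self.user_accessed = False
        self.headers = {}

    @property
    def current_user(self):
        self.user_accessed = True   # the output may now be user-specific
        return self._session.get("user")


def after_filter(ctx):
    if ctx.user_accessed:
        ctx.headers["Cache-Control"] = "private, no-cache"
    else:
        ctx.headers["Cache-Control"] = "public, max-age=60"


def categories_action(ctx):
    return "category list"          # independent of who is asking


def notifications_action(ctx):
    return f"notifications for {ctx.current_user}"


for action in (categories_action, notifications_action):
    ctx = RequestContext(session={"user": "alice"})
    action(ctx)
    after_filter(ctx)
    print(action.__name__, "->", ctx.headers["Cache-Control"])
```

Because the decision is made per request from what the action actually did, user-independent pages stay cacheable even for logged-in visitors, which is why the last step (moving user-dependent parts into separate requests) matters.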

The changes are available on GitHub (https://github.com/shellycloud/discourse) and are deployed at http://discourse-proxy.shellyapp.com/.


(Lee_Ars) #6

FWIW, I’ve had success with Discourse and Varnish by keeping things extremely simple: I use Varnish only to cache static assets and leave the rest of the forum content untouched. My Discourse statement in sub vcl_recv is very short:

```vcl
# Cache only the static assets in Discourse's "assets" dir and pass everything else
if (req.http.host ~ "discourse.bigdinosaur.org") {
    if (!(req.url ~ "^/assets/")) {
        return (pass);
    }
}
```

This avoids having to screw with cookies or alter the web server configuration, though it leaves plenty of cacheable things on the table (js files, for example, though including those should be pretty easy). Using Varnish with dynamic-by-nature applications like web forums that don’t have native support for caching solutions is always tricky, as you obviously already know!
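The effect of that vcl_recv rule can be checked with a small model. This is a Python sketch of the decision, not VCL: "pass" means the request goes straight to the backend, and "default" stands for falling through to Varnish's built-in vcl_recv. As in the VCL, the `~` operator is an unanchored regex match, so the dots in the host pattern match any character.

```python
import re

# Python model of the vcl_recv rule above (a sketch, not VCL itself).
HOST_RE = re.compile("discourse.bigdinosaur.org")
ASSETS_RE = re.compile("^/assets/")


def vcl_recv(host, url):
    if HOST_RE.search(host) and not ASSETS_RE.search(url):
        return "pass"       # forum pages always go to the backend
    return "default"        # assets (and other sites) use normal handling


for host, url in [("discourse.bigdinosaur.org", "/latest"),
                  ("discourse.bigdinosaur.org", "/assets/application.js"),
                  ("www.example.com", "/latest")]:
    print(host, url, "->", vcl_recv(host, url))
```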

However, on re-reading before I hit “reply,” I see your note that the user has no control over Varnish or Nginx. That obviously complicates things.

Definitely interested in whatever you come up with to make Discourse more Varnish-friendly. I don’t have very much experience in scaling Ruby-based web apps, and anything Varnish can do to lighten the load is a good thing. However, Discourse has enough moving dynamic parts that I’m not sure the effort will be worth the reward.


(Jeff Atwood) #7

We have made a number of caching changes since then. Is this resolved now with the latest version?


(Łukasz Piestrzeniewicz) #8

In fact it is, thank you! Thanks to those changes, we were able to deploy Discourse on Shelly Cloud and provide free hosting for the Polish Ruby on Rails community. It’s available at https://forum.rubyonrails.pl/.


(Jeff Atwood) #9