ETag header support

I feel like ETags would have a large positive performance impact on page loads, since most of the HTML pages are not cached. They would spare the server from sending the page again if the client has already downloaded it.

Has there been any thought around this?
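For context, the server side of an ETag check usually comes down to comparing the client's If-None-Match header with the current tag. Here is a minimal sketch of that comparison. The function names are made up for illustration, and it is not Discourse code:

```python
from typing import Optional


def _opaque_tag(tag: str) -> str:
    """Drop an optional W/ prefix so tags compare weakly."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def if_none_match_hits(header: Optional[str], current_etag: str) -> bool:
    """True if the client's If-None-Match header matches the current ETag.

    Uses weak comparison (W/"abc" matches "abc"), as HTTP specifies for
    If-None-Match. The comma split is naive and assumes tags contain no commas.
    """
    if not header:
        return False
    if header.strip() == "*":
        return True
    wanted = _opaque_tag(current_etag)
    return any(_opaque_tag(tag) == wanted for tag in header.split(","))
```

When this returns True, the server can reply 304 Not Modified with an empty body instead of sending the page again.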

I might be wrong, but Discourse already relies heavily on client-side JS, so the client downloads very little data. Almost everything is loaded on the first visit and then cached. I don't really know how much ETags would improve that caching.

For example, the first load of my page is ~800 KB, and the second one is ~40 KB.


Discourse is already quite well designed and set up for caching.

Most site assets (JS, CSS) have unique URLs that are generated each time you update the site with a hash of the asset so they can have long cache times.

I think site uploads (images, avatars etc) also use unique URLs.
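Here is a rough sketch of the "unique URL per content hash" idea. It assumes a truncated SHA-256 digest in the filename and is not Discourse's actual build pipeline. Because the name changes whenever the bytes change, the file can be served with a very long cache lifetime:

```python
import hashlib
from pathlib import PurePosixPath


def fingerprinted_name(path: str, content: bytes, digest_len: int = 16) -> str:
    """Return e.g. 'assets/app-<digest>.js'; the digest changes with the content."""
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    return str(p.with_name(f"{p.stem}-{digest}{p.suffix}"))


# Safe to cache for a year: a new deploy with new content produces a new URL.
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"
```

Old URLs never need revalidating because their content never changes, which is why ETags add little for these assets.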

Most of the full pages you can see are dynamic and should not be aggressively cached. I guess it would be possible to use the kind of ETag caching that checks on every page load whether there are any new or edited posts. I'm not sure why the team decided not to do this.
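The "check for new or edited posts" variant could derive the ETag from a topic's state rather than from the rendered bytes. A minimal sketch follows. The `TopicState` fields and the `viewer_id` parameter are hypothetical, and a real key would need to include anything else the page depends on:

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TopicState:
    # Hypothetical fields for illustration; not Discourse's actual schema.
    topic_id: int
    highest_post_number: int  # grows when a new post is added
    last_edited_at: datetime  # moves forward when any post is edited


def topic_etag(state: TopicState, viewer_id: int) -> str:
    """Weak ETag that changes whenever the topic gains or edits a post.

    The viewer is part of the key because the rendered page differs per user.
    """
    key = f"{state.topic_id}:{state.highest_post_number}:{state.last_edited_at.isoformat()}:{viewer_id}"
    return 'W/"' + hashlib.sha256(key.encode("utf-8")).hexdigest()[:32] + '"'
```

The tag is weak (`W/`) because it identifies a version of the topic, not the exact bytes of the response. The upside is that the server can answer 304 without rendering the page at all.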


I should have clarified: the assets are indeed cached well. What I'm talking about is the HTML document (the first request).

Most of the full pages you can see are dynamic and should not be aggressively cached. I guess it would be possible to use the kind of ETag caching that checks on every page load whether there are any new or edited posts. I'm not sure why the team decided not to do this.

Yes, this is essentially what I'm talking about, but I don't think ETags are generated by hand like that. They can be based on the raw HTML being served, which lets the server tell the client, "hey, this is exactly what you saw before, just use that."
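An ETag based on the raw HTML could look roughly like the sketch below. `strong_etag` and `respond` are illustrative names, and the header handling is simplified:

```python
import hashlib
from typing import Dict, Optional, Tuple


def strong_etag(body: bytes) -> str:
    """Strong ETag derived from the exact bytes being served."""
    return '"' + hashlib.sha256(body).hexdigest()[:32] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET of an already-rendered page."""
    etag = strong_etag(body)
    # no-cache lets the browser store the page but forces revalidation each time.
    headers = {"ETag": etag, "Cache-Control": "no-cache"}
    # Simplified: exact match against a single tag or a comma-separated list.
    if if_none_match is not None and etag in (t.strip() for t in if_none_match.split(",")):
        # The client already has these exact bytes: send headers only.
        return 304, headers, b""
    return 200, headers, body
```

With this approach the server still renders the whole page to compute the hash. It saves bandwidth on a match, but not server work.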

The thing is that, as I understand it, this already happens on the client side via JS, so no HTML goes back and forth.


The HTML page loads JSON from the server, and that JSON request could use ETags. Currently it does not, though I'm not sure what the team's reasoning is.
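On the client side, ETags for those JSON requests would mean remembering each response together with its tag and sending the tag back next time. Here is a sketch with an injected, hypothetical `transport` function so it runs without a network. It does not reflect how Discourse's client actually fetches data:

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical transport: (url, request headers) -> (status, response headers, body).
Transport = Callable[[str, Dict[str, str]], Tuple[int, Dict[str, str], bytes]]


class ConditionalJsonClient:
    """Remembers each JSON response with its ETag and revalidates on the next GET."""

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self._cache: Dict[str, Tuple[str, Any]] = {}  # url -> (etag, parsed JSON)

    def get_json(self, url: str) -> Any:
        headers = {"Accept": "application/json"}
        cached = self._cache.get(url)
        if cached is not None:
            headers["If-None-Match"] = cached[0]
        status, resp_headers, body = self._transport(url, headers)
        if status == 304 and cached is not None:
            return cached[1]  # server says nothing changed: reuse the parsed copy
        if status != 200:
            raise RuntimeError(f"unexpected status {status} for {url}")
        data = json.loads(body)
        # Simplification: real header lookups should be case-insensitive.
        etag = resp_headers.get("ETag")
        if etag:
            self._cache[url] = (etag, data)
        return data
```

A URL such as `/t/1.json` would be requested in full once, and afterwards with `If-None-Match`. On a 304 the body is skipped, but the round trip to the server still happens.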


The first request to a page definitely returns rendered content before the client loads JSON from the server via XHR (which, you're right, also happens).

You can verify this by looking at the "Document" type request in the Network tab of Chrome DevTools. In my case, it already has the categories rendered.

Here’s an example of what’s rendered from the document request:

Your request is nonsensical because Discourse is a JavaScript app that does not retrieve HTML, all “pages” are built via executable JavaScript code in real time.


Your request is nonsensical because Discourse is a JavaScript app that does not retrieve HTML…

I totally respect your experience and expertise here, but I've run dozens of JavaScript-rendered web applications that use ETags on the root response (when the content can be reused).

all “pages” are built via executable JavaScript code in real time.

The screenshot I posted above is the HTML returned before any client-side code runs, so something on the backend (I'm assuming Rails) is certainly serving this route.

Every single Discourse community I've looked at (besides this one) initially returns a JavaScript-less version of the site with all of the content rendered, presumably for crawlers.

Apologies if I'm way off here, but I don't think I'm being "nonsensical"; I may just be wrong.


Only for crawler user agents, so this isn’t a useful observation.

Only for crawler user agents

That’s not what I see when I run this:

curl -H "User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36" https://community.midi.city/

That is not a crawler user agent and it’s returning the above payload.


Regardless, I take it the answer to my request for ETags is "no", so thanks for the feedback, and maybe it will be reconsidered at some point.

Correct, the answer is a hard, definitive no for both philosophical and technical reasons.

(Assets are a different issue, but using unique filenames with a GUID is a far superior approach, so ETags are kind of obsolete in general.)

Even for the API? I can understand that it's probably not worth it for the smaller requests, but topic views can be up to 20 KB, which would add up. Then again, not many people view topics repeatedly unless there are new posts…

That’s the point. For repeated views of the exact same content, if you are offline, we already render all the content from the browser cache without touching the server.

Upgrading that to also work while you are online involves cache invalidation, so naturally it's hard.


Oh, good to hear that works. I would’ve thought that the cache-control: no-cache, no-store header meant the API responses would never enter the browser’s cache.

They don’t. Well, it’s complicated. There are multiple caches in play :sweat_smile:

It doesn't enter the conventional browser cache everyone knows and loves. But browsers expose a Cache API to JavaScript, and we use it to cache responses so that previously read content can be navigated offline.
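The offline behaviour described above follows a "network first, fall back to the stored copy" pattern. Below is a Python simulation of that pattern. The class name and the injected `fetch` function are hypothetical, it is not the browser's Cache API, and it makes no claim about Discourse's actual service worker:

```python
from typing import Callable, Dict


class OfflineFallbackCache:
    """Network-first fetch that serves the last stored copy when offline."""

    def __init__(self, fetch: Callable[[str], bytes]) -> None:
        self._fetch = fetch
        self._store: Dict[str, bytes] = {}  # url -> last successfully fetched body

    def get(self, url: str) -> bytes:
        try:
            body = self._fetch(url)
        except OSError:
            # Offline or network failure (ConnectionError is an OSError subclass).
            if url in self._store:
                return self._store[url]
            raise
        self._store[url] = body  # refresh the stored copy on every successful fetch
        return body
```

This also shows where the invalidation problem comes in. While online, the sketch always goes to the network, because the stored copy alone can't tell it whether new posts have arrived.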
