Maximizing CDN cache hit rate for Discourse?


#1

Hi everyone,

I have the following CDN setup:

  • a pullzone for the JS
  • a pullzone for the logos and homepage graphics (which are frequently accessed)
  • a pullzone for the S3 assets (on Digital Ocean Spaces)

I use BunnyCDN. My forum traffic is about 85% North America, 5% UK/FR and 10% SE Asia.

For the first two pullzones, I use all 34 of the CDN’s edge servers around the globe. Those files are frequently accessed, so I can keep latency low while still keeping the cache hit rate relatively high (92% and 99.8% respectively). Ideally I’d like a higher hit rate for the JS, but the low-traffic data centers have a poor hit rate because the JS files are requested there so infrequently (a few times a month).

For the third pullzone (S3 assets), I use 10 edge servers around North America and Europe. User-generated S3 assets are sometimes accessed infrequently, so I’d rather not have a large number of edge servers dragging down the cache hit rate and, by extension, slowing access times. Ideally I’d like to have an edge server in SE Asia, but the CDN doesn’t let me cherry-pick which data centers to use. My cache hit rate is about 78% at BunnyCDN, which I thought was decent, because someone opening a full-sized original picture instead of the optimized version, or a search engine referral to an old, infrequently accessed discussion, will almost certainly drag the percentage down. I used to use CloudFront, where the hit rate was about 55%, but that could be because CloudFront has many more edge locations, or because my forum traffic is relatively small for them. (I moved away from CloudFront because of cost, since we are a hobbyist forum with minimal revenue.)

The question for the group: do you have any strategies or methods that keep the cache hit rate high? What kind of hit rate are you getting?

Any suggestions for tuning my setup to increase the percentage? Are there any budget CDNs that allow cherry-picking which edge server locations to use? Can I do that via edge rules? If so, I could pick just five to keep the hit rate high (US west coast, US east coast, US south, UK, Singapore), since that’s where my traffic is concentrated.

One idea is to keep optimized assets served from S3 and original assets served from DO Spaces, but the software doesn’t support that separation by default.

Any other thoughts?


(Jeff Atwood) #2

I see, so your idea is to reduce the number of geographical points to increase hit rate? Because if you have “one” location aka everything on one server, that’s a perfect hit rate. :wink:

I would imagine this comes down to knowing your specific audience, and where they are – so metrics would need to be gathered first about which geographical CDN points are being hit, then consolidate to the most used?


(Andrew Schleifer) #3

If you want a higher hit rate then your choices are 1) fewer PoPs or 2) longer retention. The first choice is going to make the experience worse for some clients, the second is going to cost more money (if it’s even available).

This is exactly what you need to find out. What are your misses and where are they coming from?
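
One rough way to answer that question is to tally hits and misses per PoP from whatever request logs the CDN exposes. The sketch below is only an illustration: it assumes the logs have already been reduced to `(pop, cache_status)` pairs with a status of `HIT` or `MISS`, which is a hypothetical format and not BunnyCDN’s actual log layout, and the sample data is made up.

```python
from collections import defaultdict


def hit_rates_by_pop(records):
    """Summarize cache behaviour per PoP.

    `records` is an iterable of (pop, cache_status) pairs. The pair format
    is an assumption for this sketch; anything other than "HIT" is
    counted as a miss.
    """
    counts = defaultdict(lambda: {"HIT": 0, "MISS": 0})
    for pop, status in records:
        key = "HIT" if status.upper() == "HIT" else "MISS"
        counts[pop][key] += 1

    total_misses = sum(c["MISS"] for c in counts.values())
    report = {}
    for pop, c in counts.items():
        requests = c["HIT"] + c["MISS"]
        report[pop] = {
            "requests": requests,
            "hit_rate": c["HIT"] / requests,
            # Share of all origin fetches that this PoP is responsible for.
            "miss_share": c["MISS"] / total_misses if total_misses else 0.0,
        }
    return report


# Made-up sample: a busy PoP and a quiet one.
sample = (
    [("NY", "HIT")] * 8 + [("NY", "MISS")] * 2
    + [("SG", "HIT")] * 1 + [("SG", "MISS")] * 1
)

for pop, stats in sorted(hit_rates_by_pop(sample).items()):
    print(pop, stats)
```

Looking at `miss_share` next to `hit_rate` helps separate PoPs that merely have a poor ratio from PoPs that actually generate most of the origin fetches.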


#4

The 5 PoPs closest to the majority of our traffic average a hit rate in the low 80s (%), whereas for the others, with more sporadic traffic, the lower the traffic, the lower the hit rate, sometimes below 50%. That’s why I think consolidating the PoPs can bring the hit rate up, so the CDN doesn’t so often have to go back to the origin to fetch, which speed-wise is worse than just serving from the origin. It’s a tradeoff between the additional latency of PoPs located farther away and the reduced latency from a higher hit rate at the PoP.
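
To make that tradeoff concrete, here is a back-of-the-envelope sketch. It assumes a simple model in which every request pays the round trip to the PoP and a miss additionally pays the PoP’s fetch from the origin. The function name and all the millisecond figures are hypothetical, chosen only for illustration and not measured on any real setup.

```python
def expected_latency_ms(edge_rtt_ms, origin_fetch_ms, hit_rate):
    """Average latency under a simple model (an assumption of this sketch):
    every request pays the client-to-PoP round trip, and a cache miss
    additionally pays the PoP-to-origin fetch time.
    """
    return edge_rtt_ms + (1.0 - hit_rate) * origin_fetch_ms


# Hypothetical numbers: a nearby PoP with sporadic traffic and a 50% hit rate...
nearby_quiet_pop = expected_latency_ms(edge_rtt_ms=20, origin_fetch_ms=300, hit_rate=0.50)

# ...versus a farther, consolidated PoP with an 82% hit rate.
farther_busy_pop = expected_latency_ms(edge_rtt_ms=80, origin_fetch_ms=250, hit_rate=0.82)

print(f"nearby, quiet PoP:  {nearby_quiet_pop:.0f} ms")
print(f"farther, busy PoP:  {farther_busy_pop:.0f} ms")
```

With these made-up inputs the farther PoP comes out ahead on average (about 125 ms versus 170 ms), but the result flips easily if the extra distance is large or the origin fetch is cheap, so real measurements per region are what matter.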

Longer retention is tougher to solve. That’s the lever that can bring up the hit rate for the high traffic PoPs, and I don’t necessarily have a solution for that, yet.

I am curious about others’ experience: does a cache hit rate in the high 70s to low 80s (%) for user-uploaded assets feel low or about right?


(Jeff Atwood) #5

It really depends how often the Discourse instance is updated/deployed though. For us, we deploy a lot so that colors the data significantly.