Avatar Offloading to S3

I host Discourse on AWS, with S3 as the image store and CloudFront as the CDN. Images are uploaded to an S3 bucket, and users access them via CloudFront edges for faster transfer.

Meanwhile, in April 2016, S3 released a feature called Amazon S3 Transfer Acceleration, which provides a CloudFront-enabled S3 endpoint when you tick a checkbox in the bucket options.


https://aws.amazon.com/about-aws/whats-new/2016/04/transfer-files-into-amazon-s3-up-to-300-percent-faster/

After some testing, I decided to move the S3 bucket from the Tokyo region to the Seoul region for better latency (my users are based in Korea) and to use S3 acceleration. I was able to use S3 in Seoul with its acceleration endpoint as the CDN URL, and finally I rebaked the posts via the rake posts command. The post images migrated fine to the new S3 location, but it turned out that the users’ avatar images were hosted on the web server, and all of them are now broken. (That is okay for now, as I asked users to upload their avatars again.)

The question is, how can I offload the avatar images to S3?

  • I run it on EC2 instances in an Auto Scaling group, which runs 1 server at midnight and 4 servers in the daytime. If avatar images are uploaded to and served from S3, I do not need to worry about image loss. If they are served from the EC2 hosts, a user may upload an avatar to one of the 4 EC2 instances, but chances are it cannot be served from the others because they don’t have it.
  • S3 Transfer Acceleration is served via CloudFront CDN edges, which provide faster speed, so I can leverage a CDN for avatar images without configuring one on my own.
  • S3 data is redundantly stored across multiple facilities and is designed for 99.999999999% durability of objects. In other words, I would rather trust S3 with my data durability than host the data on my EC2 instances, which are managed by me, whom I do not trust at all.
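
To put a rough number on the first point: if an avatar lives only on the instance that received the upload, and if the load balancer spreads requests evenly (both are simplifying assumptions, and `miss_rate` is just an illustrative helper), the share of requests that land on an instance without the file can be sketched as:

```ruby
# Assumptions: requests are spread evenly across instances, and each avatar
# exists only on the single instance that handled the upload.
def miss_rate(instances)
  return 0.0 if instances <= 1
  (instances - 1).to_f / instances
end

puts miss_rate(4)  # daytime fleet of 4 servers
puts miss_rate(1)  # midnight fleet of 1 server
```

With 4 servers, three out of four requests would miss under these assumptions; with 1 server nothing misses, but that single copy is lost once the instance is terminated.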

Even though avatar originals are stored on S3, there is still a bunch of logic that needs to happen app-side to decide which size to serve and so on.

Meta uses S3 for uploads, and avatars are served from CDNs; we do that by defining the CDN URL.

Notice how your avatar is served from the CDN here?

https://d11a6trkgmumsb.cloudfront.net/user_avatar/meta.discourse.org/jongnam/90/52330_1.png

I also use both S3 (for uploads) and the CloudFront CDN (for avatars), so the avatars are served from my app servers via the CDN edge. But my interests are (1) data durability and (2) server elasticity while minimizing infrastructure complexity.

Regarding the durability of avatar images: which would be the easier way to get 11 9s of data durability, running 3 servers in different facilities (while limiting the disaster blast radius) or using S3, which does it automatically under the hood?

Server elasticity has two aspects: servers are added when traffic flows in and removed when traffic is low. That is what Auto Scaling does. But that also means servers (EC2 instances) are launched and terminated on demand. Auto Scaling gives almost unlimited horizontal scalability while keeping the infrastructure cost low.

A CDN helps reduce network latency for static objects, but it does not help with the scalability of dynamic content. And the static files still need durability, which a CDN cannot provide: the CDN edges hold copies of the origin content, but the edge copies will be gone if the origin content is gone. Data durability is about the design of the origin server cluster, and a CDN does not solve that problem.

If the server is still the better place to host avatar images, it needs a clustering mechanism to share them across the server cluster. When a user uploads an avatar file, the file should be replicated to every app server node so that it can be served from any app server. Perhaps SAN storage would help, but it brings more infrastructure complexity, which is again hard to manage.

If static files such as avatar images were stored in S3, Discourse would be friendlier to auto-scalable infrastructure.


We have been through this, and it's complicated, complicated enough that a CDN is a fine solution and in practice scales us just fine.

Say I have an avatar:

sam.png

Now the client happens to need a copy of sam.png at 120x120px. How does it figure out how to get that?

Option 1:

Whenever the server ships avatar info down to the client, it ships an array of all the possible avatar sizes the client may need, e.g.:

{'sam.png' => [
   [10, 'http://somewhere/png1.png'],
   [20, 'http://somewhere/png2.png'],
   [40, 'http://somewhere/png3.png'],
   [50, 'http://somewhere/png4.png'],
   [60, 'http://somewhere/png5.png'],
   [70, 'http://somewhere/png6.png'],
   [80, 'http://somewhere/png7.png'],
   [90, 'http://somewhere/png8.png']
 ]
}

There are tons of locations because we have retina vs. non-retina, and thus numerous device pixel ratios and multiple sizes of avatars the client may need.

Option 2:

We send a template

{ 'sam.png' => 'https://cdn.discourse.org/avatars/sam.{size}.png'}

We went with option 2, because option 1 is where madness lies. Option 2 scales super fine, as our NGINX config will properly cache every avatar, and CDNs cache every avatar. So the first hit warms up the CDN cache, and the second hit is served directly from the CDN without going to the app.
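
To illustrate why the template is easier on the client, here is a minimal sketch of how a client could expand it. The template string is the one shown above; the `avatar_url` helper and its pixel-ratio handling are assumptions for illustration, not the actual Discourse client code.

```ruby
# Hypothetical helper: expand an avatar URL template for a requested size.
# pixel_ratio models retina displays (an assumption, e.g. 2 for a 2x screen).
def avatar_url(template, size, pixel_ratio = 1)
  # A 2x display asking for a 60px avatar needs a 120px image.
  real_size = (size * pixel_ratio).ceil
  template.sub("{size}", real_size.to_s)
end

template = "https://cdn.discourse.org/avatars/sam.{size}.png"
puts avatar_url(template, 120)    # 120px on a standard display
puts avatar_url(template, 60, 2)  # 60px on a 2x display, same 120px file
```

Both calls resolve to the same URL, so the two requests share one cache entry on the CDN, and the server never has to enumerate sizes up front.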


Thanks for the note, I can imagine how complex it would be to serve multiple resolutions of an avatar. It would be much simpler to serve them via NGINX as you explained, so that NGINX generates thumbnails on the fly while the CDN caches the files at the edge. The combination of NGINX thumbnail generation and a CDN is the easy way to implement this, and I cannot agree more at this moment. But it still does not answer the data durability question. If the NGINX server is the first-tier storage for avatars, there is still a risk of data loss, which is what happened to my application a week ago.

What if NGINX stored the original avatar images in S3 and fetched the files on demand? Then Discourse could keep using the existing avatar implementation while ensuring data durability. For example, it could still use cdn.discourse.org as the endpoint, pointing to NGINX as the origin. If the NGINX server were terminated or lost its avatar thumbnails, it could fetch the original images from S3 on demand, with no need to worry about data loss.

If you have S3 set up, all originals are on S3. We simply have the app act as a proxy (resizing service) for the case of avatars.
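
As a rough sketch of that flow (not the actual Discourse code): the `AvatarProxy` class below is hypothetical, a plain hash stands in for the S3 bucket, and the "resize" step only tags the data instead of scaling an image. It shows that resized copies on an app server are disposable as long as the originals survive.

```ruby
# Hypothetical stand-in for the app acting as a resizing proxy.
class AvatarProxy
  attr_reader :resize_count

  def initialize(originals)
    @originals = originals  # stand-in for the S3 bucket: name => original bytes
    @cache = {}             # resized copies held locally on this app server
    @resize_count = 0
  end

  # Return the avatar at the given size, resizing from the original on a miss.
  def fetch(name, size)
    @cache[[name, size]] ||= begin
      original = @originals.fetch(name)  # raises KeyError if the original is gone
      @resize_count += 1
      "#{original}@#{size}px"            # placeholder for a real image resize
    end
  end

  # Simulate the instance being terminated and replaced.
  def drop_cache!
    @cache.clear
  end
end

proxy = AvatarProxy.new("sam.png" => "original-bytes")
proxy.fetch("sam.png", 120)       # miss: resized from the original
proxy.fetch("sam.png", 120)       # hit: served from the local copy
proxy.drop_cache!
puts proxy.fetch("sam.png", 120)  # rebuilt from the original after the loss
```

In this model the only way to lose an avatar is to lose the original, which is why the originals belong on S3 rather than on the app server.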


Thanks again. That is interesting, then, because after I changed the S3 bucket from one to another, GET requests for avatars started returning error codes; I am not sure whether it was 500 or 404. I copied the bucket contents with the “aws s3 sync” command, so the original avatar images were in the new S3 bucket as well. That is why I suspected that avatars were stored on the app server.

Please disregard this feature request. But it may be worth investigating what happens in the avatar generation logic when the S3 bucket location is changed.