Step-by-step for moving large site uploads to S3?

Our site has close to 2 million posts. Our uploads directory is currently 18 GB in size. The site is active 24/7.

What would be a recommended step-by-step procedure for moving our uploads to S3?

Can we avoid downtime while doing this?

Thanks,
Gunnar

We recently went through this process, and there are a couple of ways to do it. The safe and slow way uses the migrate_to_s3 rake task:

https://github.com/discourse/discourse/blob/0879610ffd9ef9f9b29a0ebaa4cb9535434dd61b/lib/tasks/uploads.rake#L111

I think you could go ahead and turn on S3 uploads (there are a number of guides), then possibly ssh into your container and run this task, and you wouldn’t experience downtime.

We didn’t go this route because it takes ~15 seconds per upload, so the whole run was going to take days. We were doing this as part of a host migration and couldn’t afford that much downtime.
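To judge whether the slow route is viable for your site, the ~15 seconds per upload figure can be turned into a rough estimate. This is only back-of-the-envelope arithmetic: `estimate_days` is a hypothetical helper, and the count of 50,000 uploads is an assumed example, not a number from this thread.

```shell
#!/usr/bin/env bash
# estimate_days COUNT SECONDS_PER_UPLOAD
# Prints the estimated run time in days, to one decimal place.
estimate_days() {
  awk -v n="$1" -v s="$2" 'BEGIN { printf "%.1f\n", n * s / 86400 }'
}

# 50,000 uploads is a hypothetical count; substitute your own.
estimate_days 50000 15
```

With those assumed inputs the estimate comes out at about 8.7 days, which is why this route only makes sense if the site can stay up while the task runs.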

The quick and dirty route is as follows:

  1. Enable S3 uploads on your site.
  2. Back up your site with images, and download the archive.
  3. Unzip the archive, navigate to the uploads sub-folder, and upload the images to S3 using the aws-cli:
    aws s3 cp . s3://<your-s3-bucket> --recursive --acl public-read
    
  4. We then need to remap all of the references to the public uploads folder to the new location in S3. At the console in your docker container:
    root@dc53d70f611c:/var/www/discourse# discourse remap /uploads/default/ //<your-s3-bucket>.s3.amazonaws.com/
    Rewriting all occurences of /uploads/default/ to //<your-s3-bucket>.s3.amazonaws.com/
    THIS TASK WILL REWRITE DATA, ARE YOU SURE (type YES)
    YES
    Remapping ar_internal_metadata key
    0 rows affected!
    Remapping ar_internal_metadata value
    ... many more rows

We did the above during a scheduled maintenance window, with the site in read-only mode and a fresh backup on hand, so it was pretty low risk. I’m not sure I’d be comfortable doing it any other way, but it took less than an hour.

Awesome, I’m trying that right now. Thanks!

Gunnar

That did not work as expected. Images in posts now have URLs like this:

https://forums.jag-lovers.com//jl-discourse-uploads.s3.amazonaws.com/original/3X/b/3/b38c9407f8d17b269dc7ee8845d85cf5189ce8b5.jpeg

Instead of this:
https://jl-discourse-uploads.s3.amazonaws.com/original/3X/b/3/b38c9407f8d17b269dc7ee8845d85cf5189ce8b5.jpeg

While on the other hand the small category icons (we use a custom icon for each category) have the correct URLs.

What did I miss?

Thanks,
Gunnar

@Gunnar

Hmm, that’s odd. AFAIK the Uploads table should only reference the relative path, and not the full path to the image location. We had a couple of posts that ended up that way, but it was because the full path was referenced in the post, rather than using the image upload button or copy/pasting. Is this happening for all posts, or perhaps a subset that use the full domain, rather than the relative path?

Either way, something like this should work:

discourse remap https://forums.jag-lovers.com//jl-discourse-uploads.s3.amazonaws.com //jl-discourse-uploads.s3.amazonaws.com

You may need to tweak the input string, as I am not sure whether you’ll need the https:// or not. It’s best to look at either the Uploads table or the unbaked post to see what string you’re replacing.

It looks like this applies to all posts on the forum. I had a look at several raw/unbaked posts and the images are all referenced like this:

![IMG_20150512_182321|690x388](upload://d5tlLS5VtcU7PAaOgzFsROaB6vc.jpg)

When I look at the uploads table, the entries all have a URL of the form:

//jl-discourse-uploads.s3.amazonaws.com/original/3X/8/a/8a560451dcba13a7f9d4545368c078e82ef890ac.jpeg

Which should be correct, no? So, it almost looks as if something in Discourse is adding the https://forums.jag-lovers.com part on the fly?

Do I have to rebake all the posts?

Thanks, @mmayo!

Thanks,
Gunnar

Hi @sam, perhaps you can help clear this up?

Thanks,
Gunnar

@zogstrip can have a quick look next week

Thanks, but I think I got it. I just need a rebake.

Gunnar
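One plausible explanation for the broken URLs, sketched here with sed as a stand-in for the remap's text replacement (this is not the Discourse implementation): if the already-baked post HTML held absolute URLs on the forum's own domain, replacing `/uploads/default/` inside them would leave the forum host in front of the bucket host. The `cooked` value below is an assumed shape for such a URL, and `remap_text` is a hypothetical helper; the relative `stored` form follows what was said above about the uploads table.

```shell
#!/usr/bin/env bash
# Stand-in for a plain substring replacement; reads lines on stdin.
# $1 = text to search for, $2 = replacement
remap_text() {
  sed "s|$1|$2|g"
}

# Assumed: image URL as it might appear in a baked post before the migration.
cooked='https://forums.jag-lovers.com/uploads/default/original/3X/b/3/b38c9407f8d17b269dc7ee8845d85cf5189ce8b5.jpeg'
# Relative path, the form the uploads table is said to store.
stored='/uploads/default/original/3X/b/3/b38c9407f8d17b269dc7ee8845d85cf5189ce8b5.jpeg'

for url in "$cooked" "$stored"; do
  printf '%s\n' "$url" | remap_text '/uploads/default/' '//jl-discourse-uploads.s3.amazonaws.com/'
done
```

Under that assumption the first line printed is the doubled-host URL reported earlier in the thread, while the second is the clean protocol-relative URL seen in the uploads table. That would fit a rebake being the fix, since a rebake regenerates the post HTML.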

Is everything sorted now?

Thank you, yes. A rebake fixed it. We’re all set.

In hindsight, if I had to do it again, I’d either just use the slow method, or follow @mmayo’s “quick and dirty route” minus the remap, but plus a rebake. Once the rebake was done, I’d delete the local uploads folder.

That way, the site would be 100% functional the whole time. As it was, we were up and running, but images in posts didn’t display correctly until the posts were rebaked.
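For anyone who wants to try the "minus the remap, plus a rebake" sequence, here is an untested dry-run sketch of it. The `aws s3 cp` line is the one from the quick route, and `rake posts:rebake` is the Discourse rebake task; `BUCKET`, `UPLOADS_DIR`, `DRY_RUN`, `run`, and `migrate_plan` are hypothetical names introduced for illustration. By default it only prints the commands. Note that the copy runs wherever the backup was unpacked, while the rebake has to run inside the container.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints each command unless DRY_RUN=0 is set.
set -euo pipefail

BUCKET="${BUCKET:-your-s3-bucket}"   # hypothetical bucket name
UPLOADS_DIR="${UPLOADS_DIR:-.}"      # assumes you are in the uploads sub-folder of the unpacked backup
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

migrate_plan() {
  # 1. Copy the unpacked uploads to S3 (same command as in the quick route).
  run aws s3 cp "$UPLOADS_DIR" "s3://$BUCKET" --recursive --acl public-read
  # 2. Skip the remap; rebake posts instead (run this inside the container).
  run rake posts:rebake
}

migrate_plan
```

Deleting the local uploads folder is deliberately left out of the sketch. Do that by hand, and only after checking that images load from S3 once the rebake has finished.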

Thanks,
Gunnar
