Step-by-step for moving large site uploads to S3?


(Gunnar Helliesen) #1

Our site has close to 2 million posts. Our uploads directory is currently 18 GB in size. The site is active 24/7.

What would be a recommended step-by-step procedure for moving our uploads to S3?

Can we avoid downtime while doing this?

Thanks,
Gunnar


(Micah Mayo) #2

We recently went through this process, and there are a couple of ways to do it. The first is the safe and slow way, using the migrate_to_s3 rake task.

I think you could go ahead and turn on S3 uploads (there are a number of guides), then open a shell in your container and run this task, and you wouldn’t experience downtime.
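A minimal sketch of starting the slow route, assuming a standard Docker install, S3 uploads already enabled, and a shell inside the container in /var/www/discourse (the directory shown in the console prompt later in this post). The task is `uploads:migrate_to_s3`; the function name and the `DRY_RUN` switch are hypothetical conveniences for previewing the command.

```shell
# Slow, safe route: let Discourse migrate the uploads to S3 itself, one upload at a time.
# Assumptions: S3 uploads are already enabled in the site settings, and this runs
# inside the container from /var/www/discourse.
# DRY_RUN=1 is a hypothetical switch that prints the command instead of running it.
migrate_uploads_slow() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "rake uploads:migrate_to_s3"
  else
    rake uploads:migrate_to_s3
  fi
}
```

Run `DRY_RUN=1 migrate_uploads_slow` to preview, then `migrate_uploads_slow` to start the migration.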

We didn’t go this route because it takes about 15 seconds per upload, so it was going to take days. We were doing this as part of a host migration and couldn’t afford a downtime that long.
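The “days” figure follows from multiplying the per-upload cost by the number of uploads. A small helper for that arithmetic, using the roughly 15 s per upload reported above as the default; the function name is hypothetical and the upload counts you feed it should be your own.

```shell
# Rough runtime estimate for the slow route at a fixed cost per upload.
# Usage: estimate_hours UPLOADS [SECONDS_PER_UPLOAD]
estimate_hours() {
  uploads=$1
  secs_per_upload=${2:-15}   # ~15 s per upload, as reported in this thread
  case $uploads in
    ''|*[!0-9]*)
      echo "usage: estimate_hours UPLOADS [SECONDS_PER_UPLOAD]" >&2
      return 2
      ;;
  esac
  # Integer arithmetic: the result is rounded down to whole hours.
  echo $(( uploads * secs_per_upload / 3600 ))
}
```

For a hypothetical 10,000 uploads, `estimate_hours 10000` prints `41`, i.e. roughly 1.7 days of continuous work.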

The quick and dirty route is as follows:

  1. Enable S3 uploads on your site.
  2. Back up your site, including images, and download the archive.
  3. Extract the archive, navigate to the uploads sub-folder, and upload the images to S3 using the AWS CLI:
    aws s3 cp . s3://<your-s3-bucket> --recursive --acl public-read
    
  4. We then need to remap all references to the public uploads folder to the new location in S3. From the console in your Docker container:
    root@dc53d70f611c:/var/www/discourse# discourse remap /uploads/default/ //<your-s3-bucket>.s3.amazonaws.com/
    Rewriting all occurences of /uploads/default/ to //<your-s3-bucket>.s3.amazonaws.com/
    THIS TASK WILL REWRITE DATA, ARE YOU SURE (type YES)
    YES
    Remapping ar_internal_metadata key
    0 rows affected!
    Remapping ar_internal_metadata value
    ... many more rows
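Step 3 can be wrapped in a small function so the command can be previewed before anything is uploaded. A minimal sketch, assuming the AWS CLI is installed and configured with credentials for the bucket, and that `src_dir` is the folder whose contents correspond to `/uploads/default/` (so the object keys line up with the remap in step 4). The function name, the example path and the `DRY_RUN` switch are hypothetical.

```shell
# Quick route, step 3: bulk-copy the unpacked uploads to S3 with the AWS CLI.
# Assumptions: the AWS CLI is configured for the bucket, and src_dir maps to /uploads/default/.
# DRY_RUN=1 is a hypothetical switch that prints the command instead of running it.
copy_uploads_to_s3() {
  src_dir=$1
  bucket=$2   # bucket name only, without the s3:// prefix
  if [ -z "$src_dir" ] || [ -z "$bucket" ]; then
    echo "usage: copy_uploads_to_s3 SRC_DIR BUCKET" >&2
    return 2
  fi
  # Same flags as step 3: copy the whole tree and make each object publicly readable.
  set -- aws s3 cp "$src_dir" "s3://$bucket" --recursive --acl public-read
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}
```

For example, `DRY_RUN=1 copy_uploads_to_s3 ./uploads/default my-bucket` prints the `aws s3 cp` command from step 3 without uploading anything.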

We did the above during a scheduled maintenance window, with the site in read-only mode and a fresh backup on hand, so it was pretty low risk. I’m not sure I’d be comfortable doing it any other way; fortunately, it took less than an hour.


(Gunnar Helliesen) #3

Awesome, I’m trying that right now. Thanks!

Gunnar


(Gunnar Helliesen) #4

That did not work as expected. Images in posts now have URLs like this:

https://forums.jag-lovers.com//jl-discourse-uploads.s3.amazonaws.com/original/3X/b/3/b38c9407f8d17b269dc7ee8845d85cf5189ce8b5.jpeg

Instead of this:
https://jl-discourse-uploads.s3.amazonaws.com/original/3X/b/3/b38c9407f8d17b269dc7ee8845d85cf5189ce8b5.jpeg

On the other hand, the small category icons (we use a custom icon for each category) have the correct URLs.
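The remap target, //jl-discourse-uploads.s3.amazonaws.com/..., is a protocol-relative URL: a browser keeps its host and borrows only the scheme from the current page. The broken URL above looks like the forum origin glued directly onto that form, as if it were a path on the forum. A small sketch of the difference; `resolve_upload_url` is a hypothetical helper that handles only these simple cases, not full RFC 3986 resolution.

```shell
# Resolve an upload URL against the page origin, roughly as a browser would.
# Hypothetical helper: covers absolute, protocol-relative and site-relative URLs only.
resolve_upload_url() {
  origin=$1             # e.g. https://forums.jag-lovers.com
  url=$2
  scheme=${origin%%:*}  # "https" from "https://..."
  case $url in
    *://*) echo "$url" ;;          # already absolute
    //*)   echo "$scheme:$url" ;;  # protocol-relative: keep the host, borrow the scheme
    /*)    echo "$origin$url" ;;   # site-relative: prefix the forum origin
    *)     echo "$url" ;;          # anything else is left untouched by this sketch
  esac
}
```

Prefixing the origin unconditionally, i.e. `echo "$origin$url"` for a protocol-relative input, produces exactly the broken `https://forums.jag-lovers.com//jl-discourse-uploads...` form quoted above.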

What did I miss?

Thanks,
Gunnar


(Micah Mayo) #5

@Gunnar

Hmm, that’s odd. AFAIK the Uploads table should only reference the relative path, and not the full path to the image location. We had a couple of posts that ended up that way, but it was because the full path was referenced in the post, rather than using the image upload button or copy/pasting. Is this happening for all posts, or perhaps a subset that use the full domain, rather than the relative path?

Either way, something like this should work:

    discourse remap https://forums.jag-lovers.com//jl-discourse-uploads.s3.amazonaws.com //jl-discourse-uploads.s3.amazonaws.com

You may need to tweak the input string, as I’m not sure whether you’ll need the https:// or not. It’s best to look at either the Uploads table or the unbaked post to see exactly what string you’re replacing.
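discourse remap does a plain substring replacement, so it also rewrites /uploads/default/ when that string sits in the middle of an absolute URL. One plausible but unconfirmed explanation for the broken links is that cooked post HTML contained absolute URLs beginning with https://forums.jag-lovers.com/uploads/default/. The first remap turns such a URL into exactly the broken form reported above, and the follow-up remap turns it back into a protocol-relative URL. The sed sketch below only imitates the two replacements on single strings; it does not touch the database, and the function names are hypothetical.

```shell
# Imitate the two remaps from this thread on a single string with sed.
# Only the literal substring replacement is shown; no database is involved.
bucket_host=jl-discourse-uploads.s3.amazonaws.com
site=https://forums.jag-lovers.com

# First remap: /uploads/default/ -> //<bucket>/
first_remap() {
  sed "s#/uploads/default/#//$bucket_host/#g"
}

# Follow-up remap: <site>//<bucket> -> //<bucket>
# (dots are regex wildcards here; harmless for this demo)
follow_up_remap() {
  sed "s#$site//$bucket_host#//$bucket_host#g"
}
```

Piping `https://forums.jag-lovers.com/uploads/default/original/...` through `first_remap` yields the broken URL, and piping that result through `follow_up_remap` leaves `//jl-discourse-uploads.s3.amazonaws.com/original/...`.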


(Gunnar Helliesen) #6

It looks like this applies to all posts on the forum. I had a look at several raw/unbaked posts and the images are all referenced like this:

![IMG_20150512_182321|690x388](upload://d5tlLS5VtcU7PAaOgzFsROaB6vc.jpg)

When I look at the uploads table, the entries all have a URL of the form:

//jl-discourse-uploads.s3.amazonaws.com/original/3X/8/a/8a560451dcba13a7f9d4545368c078e82ef890ac.jpeg

Which should be correct, no? So, it almost looks as if something in Discourse is adding the https://forums.jag-lovers.com part on the fly?

Do I have to rebake all the posts?

Thanks, @mmayo!

Thanks,
Gunnar


(Gunnar Helliesen) #7

Hi @sam, perhaps you can help clear this up?

Thanks,
Gunnar


(Sam Saffron) #8

@zogstrip can have a quick look next week


(Gunnar Helliesen) #9

Thanks, but I think I got it. I just need a rebake.

Gunnar


(Régis Hanol) #10

Is everything sorted now?


(Gunnar Helliesen) #11

Thank you, yes. A rebake fixed it. We’re all set.

In hindsight, if I had to do it again, I’d either use the slow method or follow @mmayo’s “quick and dirty route” without the remap but with a rebake. Once the rebake was done, I’d delete the local uploads folder.

That way, the site would be 100% functional the whole time. As it was, we were up and running, but images in posts didn’t display correctly until the posts were rebaked.
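A minimal sketch of the rebake step in this revised procedure, assuming it runs inside the container from /var/www/discourse. `rake posts:rebake` is the stock task that re-cooks every post from its raw text; with close to 2 million posts it can take a long time. Deleting the local uploads folder is left out on purpose, since that is safer done by hand once rebaked posts are confirmed to load their images from S3. The function name and the `DRY_RUN` switch are hypothetical.

```shell
# Rebake step: regenerate the cooked HTML of every post from its raw text.
# Assumption: runs inside the container from /var/www/discourse.
# DRY_RUN=1 is a hypothetical switch that prints the command instead of running it.
rebake_all_posts() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "rake posts:rebake"
  else
    rake posts:rebake
  fi
}
```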

Thanks,
Gunnar


(Régis Hanol) #12