Step-by-step for moving large site uploads to S3?

(Gunnar Helliesen) #1

Our site has close to 2 million posts. Our uploads directory is currently 18 GB in size. The site is active 24/7.

What would be a recommended step-by-step procedure for moving our uploads to S3?

Can we avoid downtime while doing this?


(Micah Mayo) #2

We recently went through this process, and there are a couple of ways to do it. The safe and slow way is to use the migrate_to_s3 rake task.

I think you could turn on S3 uploads (there are a number of guides), SSH into your container, run this task, and not experience any downtime.

We didn’t go this route because it takes ~15 seconds per upload, so the whole migration was going to take days. We were doing this as part of a host migration and couldn’t afford that much downtime.
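To put the "days" figure in perspective, here's a rough back-of-the-envelope sketch. The ~15 seconds per upload is the figure from this post; the upload counts are hypothetical, since the thread doesn't say how many files an 18 GB uploads directory holds. It assumes uploads are migrated one at a time.

```sh
#!/bin/sh
# Rough duration estimate for a one-at-a-time upload migration.
# $1 = number of uploads (hypothetical; supply your own count)
# $2 = seconds per upload (~15 was observed in this thread)
# Prints whole days plus leftover whole hours.
estimate_migration() {
  total=$(( $1 * $2 ))                  # total seconds
  days=$(( total / 86400 ))             # 86400 seconds per day
  hours=$(( (total % 86400) / 3600 ))   # remaining whole hours
  echo "${days}d ${hours}h"
}

# Hypothetical example: estimate_migration 100000 15  ->  17d 8h
```

Even a modest, made-up count like 20,000 uploads comes out at over three days, which matches the experience described above.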

The quick and dirty route is as follows:

  1. Enable S3 Uploads on your site
  2. Back up your site, including images, and download the archive.
  3. Unzip the archive, navigate to the uploads sub-folder, and upload the images to S3 using the aws-cli:
    aws s3 cp . s3://<your-s3-bucket> --recursive --acl public-read
  4. We then need to remap all references to the public uploads folder to the new location in S3. At the console in your Docker container:
    root@dc53d70f611c:/var/www/discourse# discourse remap /uploads/default/ //<your-s3-bucket>
    Rewriting all occurences of /uploads/default/ to //<your-s3-bucket>
    THIS TASK WILL REWRITE DATA, ARE YOU SURE (type YES)
    YES
    Remapping ar_internal_metadata key
    0 rows affected!
    Remapping ar_internal_metadata value
    ... many more rows
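Before running step 3 for real, it can help to check which S3 keys it will create, because the remap in step 4 has to point at the same place. `aws s3 cp` has a `--dryrun` flag that lists the operations without performing them; the sketch below approximates the same mapping offline with `find`, on the assumption that `aws s3 cp . s3://BUCKET --recursive` stores each file under its path relative to the directory you run it from. The function name, bucket name, and directory layout are hypothetical.

```sh
#!/bin/sh
# Print the S3 URIs that `aws s3 cp . s3://BUCKET --recursive` would write
# if run from DIR: each object key is the file's path relative to DIR.
# Offline approximation only; this never contacts AWS.
preview_s3_keys() {
  dir=$1      # directory you would run the copy from
  bucket=$2   # bucket name (placeholder)
  (
    cd "$dir" || exit 1
    # List files, strip the leading "./", sort for stable output
    find . -type f | sed 's|^\./||' | sort |
      while IFS= read -r key; do
        printf 's3://%s/%s\n' "$bucket" "$key"
      done
  )
}
```

For instance, if the unpacked backup's `uploads` folder contains a `default/` directory (an assumed layout), running the copy from `uploads` produces keys that begin with `default/`, a prefix the rewritten URLs would presumably need to include.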

We did the above during a scheduled maintenance window, and the site was in read only mode with a fresh backup on hand, so it was pretty low risk. I’m not sure I’d be comfortable doing it any other way, but it took less than an hour.

(Gunnar Helliesen) #3

Awesome, I’m trying that right now. Thanks!


(Gunnar Helliesen) #4

That did not work as expected. Images in posts now have URLs like this:

Instead of this:

On the other hand, the small category icons (we use a custom icon for each category) have the correct URLs.

What did I miss?


(Micah Mayo) #5


Hmm, that’s odd. AFAIK the Uploads table should only reference the relative path, and not the full path to the image location. We had a couple of posts that ended up that way, but it was because the full path was referenced in the post, rather than using the image upload button or copy/pasting. Is this happening for all posts, or perhaps a subset that use the full domain, rather than the relative path?
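One quick way to answer that question is to look at how each image URL is written in the raw post. A minimal sketch of that check, assuming you've already pulled the URLs out of the raw text; the helper name and example URLs are made up for illustration:

```sh
#!/bin/sh
# Hypothetical helper: classify an image URL taken from a post's raw text.
#   relative -> starts with /uploads/ (what the upload button produces, per the description above)
#   absolute -> has a scheme or starts with // (full domain typed into the post)
#   other    -> anything else
classify_upload_url() {
  case "$1" in
    /uploads/*)             echo relative ;;
    //*|http://*|https://*) echo absolute ;;
    *)                      echo other ;;
  esac
}
```

If most raw posts come back `relative` but still render with the wrong URL, the problem is less likely to be hard-coded links in the posts themselves.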

Either way, something like this should work:

remap //

You may need to tweak the input string, since I’m not sure whether you’ll need the https:// or not. It’s best to look at either the Uploads table or the unbaked post to see exactly what string you’re replacing.
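For anyone unsure what `remap` does to the text: judging by the console output earlier in the thread, it rewrites every occurrence of the first string with the second, table by table. The toy function below mimics that literal find-and-replace on a single string so you can try candidate input strings before touching the database. It is an illustration only, not Discourse's implementation, and the bucket name is hypothetical.

```sh
#!/bin/sh
# Toy stand-in for a literal "replace every FROM with TO" on one string.
# $1 = text, $2 = FROM, $3 = TO. Not Discourse's implementation.
# Note: awk -v interprets backslash escapes, so avoid backslashes in FROM/TO.
remap_string() {
  printf '%s\n' "$1" | awk -v from="$2" -v to="$3" '
    {
      if (from == "") { print; next }   # nothing to replace
      out = ""
      rest = $0
      # index() is a plain substring search, so "." and "/" are not special
      while ((i = index(rest, from)) > 0) {
        out = out substr(rest, 1, i - 1) to
        rest = substr(rest, i + length(from))
      }
      print out rest
    }'
}
```

Because the replacement is literal, slashes matter: `remap_string '/uploads/default/x' '/uploads/default/' '//my-bucket'` gives `//my-bucketx`, so it pays to check that both strings end the same way.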

(Gunnar Helliesen) #6

It looks like this applies to all posts on the forum. I had a look at several raw/unbaked posts and the images are all referenced like this:


When I look at the uploads table, the entries all have a URL of the form:


Which should be correct, no? So, it almost looks as if something in Discourse is adding the part on the fly?

Do I have to rebake all the posts?

Thanks, @mmayo!


(Gunnar Helliesen) #7

Hi @sam, perhaps you can help clear this up?


(Sam Saffron) #8

@zogstrip can have a quick look next week

(Gunnar Helliesen) #9

Thanks, but I think I got it. I just need a rebake.