Migration of system stored images to S3 after configuration change


(Eric Schleicher) #1

Continuing the discussion from Setting up file and image uploads to S3:

I’m interested in understanding how I can migrate the “original” file content (uploaded before we switched to S3 for image/file storage) over to the S3 bucket we configured once we decided to keep Discourse.

It wasn’t long after lighting it up that our Discourse instance took on a bunch of content uploads.

Even if it’s a very detailed process, I would like to understand how we can go about ‘migrating’ these older images/files over to S3. We would like to run a tight ship.

Would wiping out the instance and restoring it into a vanilla install with the S3 configuration set up do the trick?


(Jeff Atwood) #3

We did this in the other direction for a customer: @zogstrip moved their data from S3 on an existing Discourse to our servers.


(Régis Hanol) #4

As @codinghorror mentioned, we had to do that migration in the opposite direction.

You can use the uploads:migrate_from_s3 rake task as inspiration for doing it the other way.


(Régis Hanol) #12

In case anyone is wondering, I did end up writing the migrate_to_s3 rake task since we migrated meta to S3 :wink:

Here’s how to do it:

  1. Make sure you’ve properly set up S3

  2. SSH into your server and

     # cd /var/discourse
     # ./launcher enter app
     # rake uploads:migrate_to_s3
    

And patiently wait for the rake task to finish :tada:


(Dean Taylor) #13

Hi @zogstrip,

Thanks for the update.

  • Does this remove the files from the local server?
  • I take it the URLs in the posts are updated via DbHelper.remap(from, to), so there is no need to rebuild all the posts?

Cheers.


(Régis Hanol) #14

I want these tasks to be kinda safe, so no, it doesn’t remove files stored locally.

You are right about the remap, it removes the need to rebuild all the posts :wink:


(ljpp) #15

@zogstrip After migrating attachments to S3, you don’t need to include them in the backups, right? The Amazon S3 SLA is pretty high, and my attachments are not 100% mission critical, so to me it sounds sensible to:

  • Migrate attachments to S3
  • Exclude attachments from backups, thus reducing backup size by 50%.

(Régis Hanol) #16

Well, they aren’t :wink: Only local files are backed up.


(ljpp) #17

I did the migration but discovered some potential issues with it.

I noticed that the subfolder discourse/shared/standalone/uploads/default/optimized was not migrated to the S3 bucket. Only the folder original was transferred.

I also noticed that expanding of some large images now produces an error: The image could not be loaded. The reason for this is that the link format is broken:

http://tappara.co//tapparaimg.s3-eu-west-1.amazonaws.com/original/1X/09373e52d207f528440da644ae7849fea4b906fb.png

New large uploads seem to work, so this is an issue with the migration script.


(ljpp) #18

Hm…also some broken images, like this one.

@zogstrip, I would appreciate some best practice suggestions to fix this without risk of data corruption.


(ljpp) #19

Well, it seems that a rebake fixed all the broken images and links. So it is definitely still needed after the S3 migration.


(Dean Taylor) #20

Looks like the DbHelper.remap call needs a little more info or a 2nd call.

upload.url or from variable is equal to something like /uploads/default/37/634ed4531d491595.jpg

And the post values contain something like this:

 raw:
  "Example\n<img src='/uploads/default/37/634ed4531d491595.jpg'>\nSome other string",
 cooked:
  "<p>Example<br>\n\n<p><img src=\"//example.com/uploads/default/37/634ed4531d491595.jpg\" width=\"100\" height=\"100\"><br>Some other string</p>",

So the raw value is getting correctly replaced but the cooked value is not.

To me this would suggest that there are two calls to DbHelper.remap required:

  1. Firstly a new call - where the from value is in the form //<domain><optional slash + subfolder><upload.url>.
  2. Then the current call.

You might need to URL / HTML encode special characters for the “subfolder” part where someone has chosen some crazy subfolder install path, perhaps CJKV characters etc.

Run them in this order, with the longer, more specific target first, to avoid an incorrect partial replacement.
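To make the ordering concrete, here is a minimal sketch in plain Ruby. It does not call DbHelper.remap; the helper remap_upload_url is hypothetical and uses ordinary string replacement as a stand-in, and the bucket hostname and target path are made up. It shows how a single replacement of the bare path leaves the site domain glued in front of the S3 URL in cooked, while doing the longer //<domain><upload.url> form first avoids that.

```ruby
# Hypothetical helper: plain string replacement standing in for DbHelper.remap.
def remap_upload_url(text, host:, from:, to:)
  # 1. Longer, more specific form first: "//<domain><upload.url>".
  text = text.gsub("//#{host}#{from}", to)
  # 2. Then the bare relative path (the existing call).
  text.gsub(from, to)
end

from = "/uploads/default/37/634ed4531d491595.jpg"
# Made-up S3 target, for illustration only.
to = "//bucket.s3.amazonaws.com/original/1X/634ed4531d491595.jpg"

raw    = "Example\n<img src='#{from}'>\nSome other string"
cooked = "<p><img src=\"//example.com#{from}\" width=\"100\" height=\"100\"></p>"

# A single replacement of the bare path keeps the "//example.com" prefix in cooked.
naive_cooked = cooked.gsub(from, to)

# The two-step version rewrites both raw and cooked cleanly.
fixed_raw    = remap_upload_url(raw, host: "example.com", from: from, to: to)
fixed_cooked = remap_upload_url(cooked, host: "example.com", from: from, to: to)

puts naive_cooked
puts fixed_cooked
```

The naive result has the same shape as the broken link reported earlier in the thread: the site domain followed directly by the S3 hostname.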


(Kane York) #21

You could also rebake all posts, as you’re changing raw.


(Dean Taylor) #22

Yes, as @ljpp mentioned above a rebake fixed broken images and links.

However my response was focused on solving this for future executions.

When you have over 450,000 posts a rebake is a computationally expensive and lengthy process compared to getting the remap right.


(ljpp) #23

I am still looking for a way to migrate the optimized folder:

/var/discourse/shared/standalone/uploads/default/optimized

The migration script successfully moves the original folder and after a rebake everything is OK, but I have gigabytes of payload in the daily backups due to the optimized folder not being migrated. Everyone going for S3 will have this issue, and the bigger the forum, the bigger the payload.


(Gregg Fuller) #26

How do I clear up server space by deleting the old files?


(Scott Smith) #27

I believe everything in /var/discourse/shared/standalone/uploads/default/original/ can be removed from the server if you ran the migrate_to_s3 script. But I haven’t tried it myself. If you want to be safe, first mv the directory to another location and make sure the forum still works. You can always move it back if something blows up.

I am going to do this migration soon myself but am running a test in a dev install first. Edit: seems to work fine on a dev install to just delete the above directory.
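The "move it aside first" idea can be sketched as below. This is only an illustration: set_aside and restore are hypothetical helper names, the ".bak" suffix is an arbitrary choice, and the demonstration runs against a throwaway temp directory rather than the real uploads folder.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: rename the directory instead of deleting it.
def set_aside(dir)
  backup = "#{dir}.bak"
  FileUtils.mv(dir, backup)
  backup
end

# Hypothetical helper: put the directory back if something blew up.
def restore(backup)
  original = backup.delete_suffix(".bak")
  FileUtils.mv(backup, original)
  original
end

# Demonstration in a temp directory, not the real uploads path.
Dir.mktmpdir do |tmp|
  original = File.join(tmp, "original")
  FileUtils.mkdir_p(File.join(original, "1X"))
  File.write(File.join(original, "1X", "example.png"), "data")

  backup = set_aside(original)
  # ... check that the forum still works here ...
  restore(backup) # only needed if images turned out to be missing
end
```

Once the forum has been confirmed to work without the directory, the set-aside copy can be deleted to reclaim the space.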

While I was testing I looked into the optimized/ images more. After migration, the local ones are only different-sized avatars and seem never to be stored on S3; even after migrating to S3, new avatars in different sizes are put locally. They are also uploaded to S3, but I can’t see when they are ever fetched: every avatar loaded on the webpage still seems to be loaded locally even though there is a copy on S3. Clear as mud…


(John Lynch) #28

I’ve been investigating the migration of optimized images this evening and my initial poking around seems to indicate that I ought to be able to amend the migrate_to_s3 task so that it runs over not only the Upload records, but also the OptimizedImage records. The relevant bit in the rake task seems to be here:

  # Migrate all uploads
  Upload.where.not(sha1: nil)
        .where("url NOT LIKE '#{s3.absolute_base_url}%'")
        .find_each do |upload|

I’m a total Ruby/Rake/Discourse newb (though a programmer by trade), but it seems to me that there is a similar table (is this the correct term?) for optimized images called “OptimizedImage” that appears to contain the same information as the Upload table, or at least, information sufficient to migrate them to S3.

Is there any reason that also running the migrate_to_s3 logic over the records in OptimizedImage would not succeed in migrating them to S3 also?


(Régis Hanol) #29

Optimized Images are thumbnails generated from images stored in the uploads table. So, you’re pretty safe also migrating Optimized Images :wink:
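As a rough illustration of running the same selection over both kinds of records, here is a plain-Ruby sketch. It does not use ActiveRecord or the real rake task: Record is a stand-in Struct, needs_migration? is a hypothetical helper, the bucket URL is made up, and it assumes optimized image records expose a url and sha1 the same way uploads do.

```ruby
# Plain-Ruby stand-in for rows of the uploads and optimized_images tables.
Record = Struct.new(:table, :url, :sha1)

# Mirrors the rake task's filter: has a sha1 and is not already on S3.
def needs_migration?(record, s3_base_url)
  !record.sha1.nil? && !record.url.start_with?(s3_base_url)
end

S3_BASE_URL = "//bucket.s3.amazonaws.com" # made-up value

records = [
  Record.new("uploads", "/uploads/default/37/634ed4531d491595.jpg", "aaa"),
  Record.new("uploads", "#{S3_BASE_URL}/original/1X/already.png", "bbb"),
  Record.new("optimized_images", "/uploads/default/optimized/1X/thumb_100x100.png", "ccc"),
  Record.new("uploads", "/uploads/default/38/no_hash.jpg", nil)
]

# The same filter applies regardless of which table the record came from.
pending = records.select { |r| needs_migration?(r, S3_BASE_URL) }
pending.each { |r| puts "#{r.table}: #{r.url}" }
```

In the real task, the upload-and-remap step would then run for each selected record; that part is left out here because it depends on Discourse internals.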


(Patrick Paul) #30

I just migrated to using S3 to cut down on disk space needs on the host webserver.

I performed rake uploads:migrate_to_s3 and then rake posts:rebake and this successfully migrated all posts’ images/uploads.

However, somehow avatars are now 1) missing for users that uploaded manually (now defaulting to the grey egg avatar pic), 2) visible for users that had used the Gravatar service (I think), 3) still served locally from the host webserver instead of from S3. Is this the correct behavior? Is there any way to migrate avatar photos to S3 also? (Or at least fix #1?)