Migration of system stored images to S3 after configuration change


(Eric Schleicher) #1

Continuing the discussion from Setting up file and image uploads to S3:

I’m interested in understanding how I can migrate the “original” file content (uploaded before we switched to S3 for image/file storage) over to the S3 bucket we configured once we determined we were going to keep Discourse.

It wasn’t long after lighting it up that our Discourse instance took on a bunch of content uploads.

Even if it’s a very detailed process, I would like to understand how we can go about ‘migrating’ these older images/files over to S3. We would like to run a tight ship.

Would wiping out the instance and restoring it into a vanilla install with the S3 configuration set up do the trick?


#2

You can’t. Discourse can’t go back and move images/files that were uploaded before you turned on S3 file hosting.


(Jeff Atwood) #3

We did this in the other direction for a customer: @zogstrip moved their data from S3 on an existing Discourse to our servers.


(Régis Hanol) #4

As @codinghorror mentioned, we had to do that migration in the opposite direction.

You can use the “uploads:migrate_from_s3” rake task as inspiration for doing it the other way.


#5

So I stand partly corrected:

Then it looks like it is ‘possible’ with a rework plan and some effort.

It’s a pretty straightforward single page of code that @zogstrip created, and mostly readable.

I can make out what it’s doing except for some spots that zip away to your included model thing at the top.

So basically you would:

  • Fetch the local images
  • Place them in the S3 bucket (expanded a bit below)
  • Rewrite the local link pointers in the table called ‘uploads’ to their S3 equivalents

The local reference can be called, but placing the image will require a few more steps which I assume could be pulled from the discourse code base (somewhere) or referenced through the S3 API.

  • Make your S3 connection (verify that your user credentials allow you to place images there)
  • Place the image in the image bucket
  • Verify the image is there (snag the link)
  • Rewrite your local link into the S3 verified link

All image references look to be in one table.

# Table name: uploads
#
# id :integer not null, primary key
# user_id :integer not null
# original_filename :string(255) not null
# filesize :integer not null
# width :integer
# height :integer
# url :string(255) not null
# created_at :datetime not null
# updated_at :datetime not null
# sha1 :string(40)
# origin :string(1000)

So there is your basic action plan.
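
To make the plan above concrete, here is a minimal Ruby sketch of the fetch / place / verify / rewrite loop for a single row. It is not the Discourse implementation: `FakeBucket`, the `Upload` struct, `migrate_upload`, and the bucket URL format are all hypothetical stand-ins so the example runs without an S3 account, and a real task would use an actual S3 client and the real `uploads` table.

```ruby
# Hypothetical in-memory stand-in for an S3 bucket (assumption, not a real client).
class FakeBucket
  def initialize(name)
    @name = name
    @objects = {}
  end

  # Store the bytes under a key and return an illustrative remote URL.
  def put(key, bytes)
    @objects[key] = bytes
    "//#{@name}.s3.amazonaws.com/#{key}"
  end

  # Used to verify the object landed before any link is rewritten.
  def exists?(key)
    @objects.key?(key)
  end
end

# Minimal stand-in for one row of the uploads table (only id and url).
Upload = Struct.new(:id, :url)

# Migrate one upload: fetch locally, place in the bucket, verify, rewrite the URL.
# `read_local` is any callable that returns the file contents for a local URL.
def migrate_upload(upload, bucket, read_local)
  # Skip rows that do not point at local storage.
  return upload unless upload.url.start_with?("/uploads/")

  key = upload.url.sub(%r{\A/uploads/}, "")
  bytes = read_local.call(upload.url)
  remote_url = bucket.put(key, bytes)

  # Only rewrite the pointer once the object is confirmed to exist.
  raise "upload #{upload.id} missing from bucket" unless bucket.exists?(key)

  upload.url = remote_url
  upload
end

bucket = FakeBucket.new("my-bucket")
upload = Upload.new(1, "/uploads/default/37/634ed4531d491595.jpg")
migrate_upload(upload, bucket, ->(_path) { "fake image bytes" })
puts upload.url
```

Verifying before rewriting keeps the row pointing at the local file if the placement fails, which matches the order of the steps listed above.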


(Eric Schleicher) #6

BTW, thanks for digging in. I’ll try it in a dev environment.


(Nicoles) #7

Before I spend too much time writing a migrate_to_s3 rake task, Eric, did you already end up doing this?


(Jeff Atwood) #8

We no longer recommend storing images in S3, as it substantially complicates your Discourse install. We do still recommend it for backups.


(Nicoles) #9

Ah, interesting. Thanks for letting me know.


(Geoff Bowers) #10

Is it possible to execute this in the standard Docker installation? I originally installed using S3 and now would like to revert to local. No Ruby/Rails experience though.


(Jeff Atwood) #11

@zogstrip, after 1.2 is released, can you make sure this task is working? We need a migration path if we are deprecating this.


(Régis Hanol) #12

In case anyone is wondering I did end up writing the migrate_to_s3 rake task since we migrated meta to S3 :wink:

Here’s how to do it

  1. Make sure you’ve properly set up S3

  2. SSH into your server and

     # cd /var/discourse
     # ./launcher enter app
     # rake uploads:migrate_to_s3
    

And patiently wait for the rake task to finish :tada:


(Dean Taylor) #13

Hi @zogstrip,

Thanks for the update.

  • Does this remove the files from the local server?
  • I take it the URLs in the posts are updated via DbHelper.remap(from, to), so there is no need to rebuild all the posts?

Cheers.


(Régis Hanol) #14

I want these tasks to be kinda safe, so no, it doesn’t remove files stored locally.

You are right about the remap, it removes the need to rebuild all the posts :wink:


(ljpp) #15

@zogstrip After migrating attachments to S3, you don’t need to include them in the backups, right? The Amazon S3 SLA is pretty high, and my attachments are not 100% mission critical, so to me it sounds sensible to:

  • Migrate attachments to S3
  • Exclude attachments from backups, thus reducing backup size by 50%.

(Régis Hanol) #16

Well, they aren’t :wink: Only local files are backed up.


(ljpp) #17

I did the migration but discovered some potential issues with it.

I noticed that the subfolder discourse/shared/standalone/uploads/default/optimized was not migrated to the S3 bucket. Only the folder original was transferred.

I also noticed that expanding of some large images now produces an error: The image could not be loaded. The reason for this is that the link format is broken:

http://tappara.co//tapparaimg.s3-eu-west-1.amazonaws.com/original/1X/09373e52d207f528440da644ae7849fea4b906fb.png

New large uploads seem to work, so this is an issue with the migration script.


(ljpp) #18

Hm…also some broken images, like this one.

@zogstrip, I would appreciate some best practice suggestions to fix this without risk of data corruption.


(ljpp) #19

Well, it seems that a rebake fixed all the broken images and links. So it is definitely still needed after the S3 migration.


(Dean Taylor) #20

Looks like the DbHelper.remap call needs a little more info or a 2nd call.

The upload.url (the from variable) is equal to something like /uploads/default/37/634ed4531d491595.jpg

And the post values contain something like this:

 raw:
  "Example\n<img src='/uploads/default/37/634ed4531d491595.jpg'>\nSome other string",
 cooked:
  "<p>Example<br>\n\n<p><img src=\"//example.com/uploads/default/37/634ed4531d491595.jpg\" width=\"100\" height=\"100\"><br>Some other string</p>",

So the raw value is getting correctly replaced but the cooked value is not.

To me this would suggest that there are two calls to DbHelper.remap required:

  1. Firstly a new call - where the from value is in the form //<domain><optional slash + subfolder><upload.url>.
  2. Then the current call.

You might need to URL / HTML encode special characters for the “subfolder” part where someone has chosen some crazy subfolder install path, perhaps CJKV characters etc.

Doing the calls in this order, with the longer, more specific target first, avoids an incorrect partial replacement.
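
To illustrate why the order matters, here is a small plain-Ruby sketch. It does not call DbHelper.remap; `remap_in_order` is a hypothetical helper that does simple string replacement, and the bucket host and remote path are made-up values for illustration only.

```ruby
# Hypothetical helper (not DbHelper.remap): apply each from => to replacement,
# longest `from` first, so the more specific target is handled before the
# shorter one can partially match it.
def remap_in_order(text, replacements)
  replacements
    .sort_by { |from, _to| -from.length }
    .reduce(text) { |acc, (from, to)| acc.gsub(from) { to } }
end

local  = "/uploads/default/37/634ed4531d491595.jpg"
remote = "//my-bucket.s3.amazonaws.com/original/1X/634ed4531d491595.jpg" # assumed format

raw    = "Example\n<img src='#{local}'>\nSome other string"
cooked = %(<p><img src="//example.com#{local}" width="100" height="100"></p>)

# Replacing only the short form leaves the site domain in front of the new
# host in the cooked HTML, i.e. src="//example.com//my-bucket.s3...".
broken = cooked.gsub(local) { remote }

# Two replacements, domain-prefixed form first, fix both raw and cooked.
replacements = { "//example.com#{local}" => remote, local => remote }
puts remap_in_order(raw, replacements)
puts remap_in_order(cooked, replacements)
puts broken
```

The `broken` value has the same shape as the malformed link reported earlier in the thread, where the site domain ends up in front of the S3 host.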