Migration of system stored images to S3 after configuration change


(Nicoles) #7

Before I spend too much time writing a migrate_to_s3 rake task, Eric, did you already end up doing this?


(Jeff Atwood) #8

We no longer recommend storing images in S3, as it substantially complicates your Discourse install. We do still recommend it for backups.


(Nicoles) #9

Ah, interesting. Thanks for letting me know.


(Geoff Bowers) #10

Is it possible to execute this in the standard Docker installation? I originally installed using S3 and would now like to revert to local storage. I have no Ruby/Rails experience, though.


(Jeff Atwood) #11

After 1.2 is released, @zogstrip, can you make sure this task is working? We need a migration path if we are deprecating this.


(Régis Hanol) #12

In case anyone is wondering, I did end up writing the migrate_to_s3 rake task, since we migrated meta to S3 :wink:

Here’s how to do it:

  1. Make sure you’ve properly set up S3

  2. SSH into your server and

     # cd /var/discourse
     # ./launcher enter app
     # rake uploads:migrate_to_s3
    

And patiently wait for the rake task to finish :tada:


(Dean Taylor) #13

Hi @zogstrip,

Thanks for the update.

  • Does this remove the files from the local server?
  • I take it the URLs in the posts are updated via DbHelper.remap(from, to), so there’s no need to rebuild all the posts?

Cheers.


(Régis Hanol) #14

I want these tasks to be kinda safe, so no, it doesn’t remove files stored locally.

You are right about the remap, it removes the need to rebuild all the posts :wink:


(ljpp) #15

@zogstrip After migrating attachments to S3, you don’t need to include them in the backups, right? Amazon S3’s SLA is pretty high, and my attachments are not 100% mission critical, so to me it sounds sensible to:

  • Migrate attachments to S3
  • Exclude attachments from backups, thus reducing backup size by 50%.

(Régis Hanol) #16

Well, they aren’t :wink: Only local files are backed up.


(ljpp) #17

I did the migration but discovered some potential issues with it.

I noticed that the subfolder discourse/shared/standalone/uploads/default/optimized was not migrated to the S3 bucket. Only the original folder was transferred.

I also noticed that expanding some large images now produces the error “The image could not be loaded.” The reason is that the link format is broken:

http://tappara.co//tapparaimg.s3-eu-west-1.amazonaws.com/original/1X/09373e52d207f528440da644ae7849fea4b906fb.png

New large uploads seem to work, so this is an issue with the migration script.
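The broken link above is the site’s own address with a protocol-relative S3 URL glued onto it. As a rough sketch only, links of that shape could be spotted, and rewritten to the S3 URL they wrap, with a regex. The helper names and the assumption that the bucket is served from an *.amazonaws.com host are illustrative, not part of Discourse.

```ruby
# Hypothetical helpers (not part of Discourse).
# Matches "<scheme>://<site host>//<bucket host>.amazonaws.com/<path>",
# i.e. a protocol-relative S3 URL with the local site host prepended.
# Assumption: the bucket is addressed via an *.amazonaws.com host name.
DOUBLED_S3_URL = %r{(https?://[^/\s"'<>]+)//([^/\s"'<>]+\.amazonaws\.com/[^\s"'<>]*)}

# Returns every malformed link found in a chunk of cooked HTML.
def doubled_s3_urls(html)
  html.scan(DOUBLED_S3_URL).map { |site, s3| "#{site}//#{s3}" }
end

# Rewrites each malformed link to the protocol-relative S3 URL it wraps.
def repair_doubled_s3_urls(html)
  html.gsub(DOUBLED_S3_URL) { "//#{Regexp.last_match(2)}" }
end

cooked = %(<img src="http://tappara.co//tapparaimg.s3-eu-west-1.amazonaws.com/original/1X/09373e52d207f528440da644ae7849fea4b906fb.png">)
puts doubled_s3_urls(cooked).length
puts repair_doubled_s3_urls(cooked)
```

Since a rebake is what actually fixed these posts later in this thread, a sketch like this is probably more useful for finding affected posts than for editing them directly.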


(ljpp) #18

Hm…also some broken images, like this one.

@zogstrip, I would appreciate some best practice suggestions to fix this without risk of data corruption.


(ljpp) #19

Well, it seems that a rebake fixed all the broken images and links. So it is definitely still needed after the S3 migration.


(Dean Taylor) #20

Looks like the DbHelper.remap call needs a little more info, or a second call.

The upload.url (the from variable) is something like /uploads/default/37/634ed4531d491595.jpg

And the post values contain something like this:

 raw:
  "Example\n<img src='/uploads/default/37/634ed4531d491595.jpg'>\nSome other string",
 cooked:
  "<p>Example<br>\n\n<p><img src=\"//example.com/uploads/default/37/634ed4531d491595.jpg\" width=\"100\" height=\"100\"><br>Some other string</p>",

So the raw value is getting correctly replaced but the cooked value is not.

To me this would suggest that there are two calls to DbHelper.remap required:

  1. First, a new call where the from value is in the form //<domain><optional slash + subfolder><upload.url>.
  2. Then the current call.

You might need to URL/HTML-encode special characters in the “subfolder” part where someone has chosen some crazy subfolder install path, perhaps with CJKV characters, etc.
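A minimal Ruby sketch of building that longer from value follows. It assumes the cooked HTML contains each subfolder path segment in percent-encoded form (an assumption, not something confirmed in this thread); cooked_upload_url is a hypothetical helper name.

```ruby
require "erb"

# Hypothetical helper: builds the protocol-relative upload URL as it appears
# in cooked HTML, in the form //<domain><optional "/" + subfolder><upload.url>.
# Assumption: each subfolder path segment appears percent-encoded in cooked
# HTML (e.g. for CJKV characters or spaces).
def cooked_upload_url(domain, upload_url, subfolder = nil)
  # nil, "" and "/" all mean "no subfolder"; leading/trailing slashes are ignored.
  segments = subfolder.to_s.split("/").reject(&:empty?)
  prefix = segments.map { |segment| "/" + ERB::Util.url_encode(segment) }.join
  "//#{domain}#{prefix}#{upload_url}"
end

puts cooked_upload_url("example.com", "/uploads/default/37/634ed4531d491595.jpg")
puts cooked_upload_url("example.com", "/uploads/default/37/634ed4531d491595.jpg", "/forum")
```

Encoding segment by segment keeps the slashes between subfolder levels intact while still escaping spaces and non-ASCII characters inside each level.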

Run the two calls in this order, with the longer, more specific target first, to avoid an incorrect partial replacement.
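To see why the order matters, here is a hypothetical in-memory stand-in for the remap: plain literal substring replacement, not DbHelper itself, with a made-up bucket URL. Replacing the short path first leaves the site host glued in front of the S3 URL, which looks like the same shape as the broken tappara.co link reported earlier in the thread.

```ruby
# Hypothetical in-memory stand-in for a remap: literal substring replacement.
# (The block form of gsub keeps "to" from being parsed for \1-style escapes.)
def remap(text, from, to)
  text.gsub(from) { to }
end

upload_url = "/uploads/default/37/634ed4531d491595.jpg"
cooked_url = "//example.com#{upload_url}"
# Made-up S3 location for illustration only.
s3_url = "//bucket.s3.amazonaws.com/original/1X/634ed4531d491595.jpg"

raw    = "<img src='#{upload_url}'>"
cooked = %(<img src="#{cooked_url}" width="100" height="100">)

# Short, generic pattern first: the site host stays glued to the S3 URL,
# and the longer pattern no longer matches anything.
wrong = remap(remap(cooked, upload_url, s3_url), cooked_url, s3_url)

# Longer, more specific pattern first, then the short one.
right_cooked = remap(remap(cooked, cooked_url, s3_url), upload_url, s3_url)
right_raw    = remap(remap(raw, cooked_url, s3_url), upload_url, s3_url)

puts wrong
puts right_cooked
puts right_raw
```

With the longer target first, raw (which only contains the short path) and cooked (which contains the absolute one) both end up pointing at the S3 URL.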


(Kane York) #21

You could also rebake all posts, as you’re changing raw


(Dean Taylor) #22

Yes, as @ljpp mentioned above a rebake fixed broken images and links.

However my response was focused on solving this for future executions.

When you have over 450,000 posts, a rebake is a computationally expensive and lengthy process compared to getting the remap right.


(ljpp) #23

I am still looking for the solution how to migrate the optimized folder:

/var/discourse/shared/standalone/uploads/default/optimized

The migration script successfully moves the original folder, and after a rebake everything is OK, but I have gigabytes of payload in the daily backups because the optimized folder is not migrated. Everyone moving to S3 will have this issue, and the bigger the forum, the bigger the payload.


(ljpp) #24

I am going to do a dirty bump here, as I believe this issue is significant for everyone who has migrated to S3 after day 1.


(ljpp) #25

Up! I would really love to eliminate those graphics files from my backup.


(Gregg Fuller) #26

How do I clear up server space by deleting the old files?