Migration of system stored images to S3 after configuration change


(ljpp) #15

@zogstrip After migrating attachments to S3, you don’t need to include them in the backups, right? The Amazon S3 SLA is pretty high, and my attachments are not 100% mission critical, so to me it sounds sensible to:

  • Migrate attachments to S3
  • Exclude attachments from backups, thus reducing backup size by 50%.

(Régis Hanol) #16

Well, they aren’t :wink: Only local files are backed up.


(ljpp) #17

I did the migration but discovered some potential issues with it.

I noticed that the subfolder discourse/shared/standalone/uploads/default/optimized was not migrated to the S3 bucket. Only the folder original was transferred.

I also noticed that expanding of some large images now produces an error: The image could not be loaded. The reason for this is that the link format is broken:

http://tappara.co//tapparaimg.s3-eu-west-1.amazonaws.com/original/1X/09373e52d207f528440da644ae7849fea4b906fb.png

New large uploads seem to work, so this is an issue with the migration script.


(ljpp) #18

Hm…also some broken images, like this one.

@zogstrip, I would appreciate some best practice suggestions to fix this without risk of data corruption.


(ljpp) #19

Well, it seems that a rebake fixed all the broken images and links. So it is definitely still needed after the S3 migration.


(Dean Taylor) #20

Looks like the DbHelper.remap call needs a little more info or a 2nd call.

upload.url or from variable is equal to something like /uploads/default/37/634ed4531d491595.jpg

And the post values contain something like this:

 raw:
  "Example\n<img src='/uploads/default/37/634ed4531d491595.jpg'>\nSome other string",
 cooked:
  "<p>Example<br>\n\n<p><img src=\"//example.com/uploads/default/37/634ed4531d491595.jpg\" width=\"100\" height=\"100\"><br>Some other string</p>",

So the raw value is getting correctly replaced but the cooked value is not.

To me this would suggest that there are two calls to DbHelper.remap required:

  1. Firstly a new call - where the from value is in the form //<domain><optional slash + subfolder><upload.url>.
  2. Then the current call.

You might need to URL / HTML encode special characters for the “subfolder” part where someone has chosen some crazy subfolder install path, perhaps CJKV characters etc.

Run the calls in this order, with the longer, more specific target first, to avoid an incorrect partial replacement.
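A minimal sketch of the two-pass ordering described above, using plain String#gsub rather than DbHelper.remap so it runs on its own. The helper name remap_upload_url, the bucket name my-bucket and the S3 key are hypothetical, chosen only for illustration:

```ruby
# Hypothetical stand-in for the two remap passes. Plain String#gsub is used
# instead of Discourse's DbHelper.remap so the sketch is self-contained.
def remap_upload_url(text, local_path:, s3_url:, base_hosts:)
  # Pass 1: the longer, host-qualified form that appears in cooked HTML,
  # e.g. //example.com/uploads/default/37/634ed4531d491595.jpg
  base_hosts.each do |host|
    text = text.gsub("//#{host}#{local_path}") { s3_url }
  end
  # Pass 2: the bare relative path that appears in raw.
  text.gsub(local_path) { s3_url }
end

local = "/uploads/default/37/634ed4531d491595.jpg"
# Assumed target URL; the bucket and key are made up for this example.
s3 = "//my-bucket.s3.amazonaws.com/original/1X/634ed4531d491595.jpg"

raw    = "Example\n<img src='#{local}'>\nSome other string"
cooked = "<p><img src=\"//example.com#{local}\" width=\"100\" height=\"100\"></p>"

new_raw    = remap_upload_url(raw, local_path: local, s3_url: s3, base_hosts: ["example.com"])
new_cooked = remap_upload_url(cooked, local_path: local, s3_url: s3, base_hosts: ["example.com"])

puts new_raw
puts new_cooked
```

If the bare-path pass ran first, the host prefix would be left in place and the cooked value would contain //example.com//my-bucket.s3.amazonaws.com/original/1X/634ed4531d491595.jpg, which has the same shape as the broken link reported in #17.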


(Kane York) #21

You could also rebake all posts, as you’re changing raw


(Dean Taylor) #22

Yes, as @ljpp mentioned above a rebake fixed broken images and links.

However my response was focused on solving this for future executions.

When you have over 450,000 posts a rebake is a computationally expensive and lengthy process compared to getting the remap right.


(ljpp) #23

I am still looking for a way to migrate the optimized folder:

/var/discourse/shared/standalone/uploads/default/optimized

The migration script successfully moves the original folder, and after a rebake everything is OK, but I have gigabytes of payload in the daily backups because the optimized folder is not migrated. Everyone going for S3 will have this issue, and the bigger the forum, the bigger the payload.


(Gregg Fuller) #26

How do I clear up server space by deleting the old files?


(Scott Smith) #27

I believe everything in /var/discourse/shared/standalone/uploads/default/original/ can be removed from the server if you ran the migrate_to_s3 script. But I haven’t tried it myself. If you want to be safe, first mv the directory to another location and make sure the forum still works. You can always move it back if something blows up.

I am going to do this migration soon myself but am running a test in a dev install first. Edit: seems to work fine on a dev install to just delete the above directory.
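The "move it aside first" approach can be sketched in Ruby with the standard FileUtils library. The helper names set_aside and restore and the holding location are hypothetical; the calls at the bottom are commented out so that nothing is moved unless you enable them:

```ruby
require "fileutils"

# Move a directory aside instead of deleting it, so it can be put back
# if the forum turns out to still need the local files.
def set_aside(dir, holding_root)
  FileUtils.mkdir_p(holding_root)
  target = File.join(holding_root, File.basename(dir))
  raise ArgumentError, "#{target} already exists" if File.exist?(target)
  FileUtils.mv(dir, target)
  target
end

# Move the directory back to where it came from.
def restore(target, original_dir)
  raise ArgumentError, "#{original_dir} already exists" if File.exist?(original_dir)
  FileUtils.mv(target, original_dir)
  original_dir
end

# Example (paths from the post above; the holding location is an assumption):
# moved = set_aside("/var/discourse/shared/standalone/uploads/default/original",
#                   "/var/discourse/shared/uploads-holding")
# ... check that the forum still shows images, then either delete `moved`
# or call restore(moved, "/var/discourse/shared/standalone/uploads/default/original")
```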

While I was testing I looked into the optimized/ images more. After migration, the local ones are only different-sized avatars, and they seem never to be stored on S3: even after migrating to S3, new avatars in different sizes are put locally. They are also uploaded to S3, but I can’t see when they are ever fetched. Every avatar loaded on the web page still seems to be loaded locally, even though there is a copy on S3. Clear as mud…


(John Lynch) #28

I’ve been investigating the migration of optimized images this evening and my initial poking around seems to indicate that I ought to be able to amend the migrate_to_s3 task so that it runs over not only the Upload records, but also the OptimizedImage records. The relevant bit in the rake task seems to be here:

  # Migrate all uploads
  Upload.where.not(sha1: nil)
        .where("url NOT LIKE '#{s3.absolute_base_url}%'")
        .find_each do |upload|

I’m a total Ruby/Rake/Discourse newb (though a programmer by trade), but it seems to me that there is a similar table (is this the correct term?) for optimized images called “OptimizedImage” that appears to contain the same information as the Upload table, or at least, information sufficient to migrate them to S3.

Is there any reason that also running the migrate_to_s3 logic over the records in OptimizedImage would not succeed in migrating them to S3 also?
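To make the idea concrete, here is a plain-Ruby sketch of the same selection rule applied to both kinds of record. It uses a Struct as a stand-in for the ActiveRecord models and assumes, as suggested above, that OptimizedImage records expose sha1 and url just like Upload records. The bucket name and sample rows are made up:

```ruby
# Plain-Ruby stand-in for Upload / OptimizedImage rows. The real task
# would query the ActiveRecord models instead of filtering an array.
Record = Struct.new(:kind, :sha1, :url)

# Same rule as the rake task's query: a record needs migrating when it has
# a sha1 and its url does not already start with the S3 base URL.
def pending_migration(records, s3_base_url)
  records.select { |r| !r.sha1.nil? && !r.url.start_with?(s3_base_url) }
end

s3_base = "//my-bucket.s3.amazonaws.com" # hypothetical bucket

uploads = [
  Record.new(:upload, "aaa", "/uploads/default/original/1X/aaa.png"),
  Record.new(:upload, "bbb", "#{s3_base}/original/1X/bbb.png"), # already on S3
  Record.new(:upload, nil,   "/uploads/default/original/1X/ccc.png") # no sha1
]
optimized = [
  Record.new(:optimized, "ddd", "/uploads/default/optimized/1X/ddd_1_100x100.png")
]

pending = pending_migration(uploads + optimized, s3_base)
pending.each { |r| puts "#{r.kind}: #{r.url}" }
```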


(Régis Hanol) #29

Optimized Images are thumbnails generated from images stored in the uploads table. So, you’re pretty safe also migrating Optimized Images :wink:


(Patrick Paul) #30

I just migrated to using S3 to cut down on disk space needs on the host webserver.

I performed rake uploads:migrate_to_s3 and then rake posts:rebake and this successfully migrated all posts’ images/uploads.

However, somehow avatars are now 1) missing for users that uploaded manually (now defaulting to the grey egg avatar pic), 2) visible for users that had used the gravatar service (I think), 3) still served locally from the host webserver instead of from S3. Is this the correct behavior? Is there any way to migrate avatar photos to S3 also? (Or at least fix #1?)


#31

When I ran this task, it failed because I ran out of disk space on the local server. So there are already some files in the S3 bucket. If I run the task again after freeing up some space, will those existing S3 files affect the rake task?


(Gunnar Helliesen) #32

Is this still the recommended way of doing this? Does it work? Anyone tried lately?

The reason I ask is because we have over 1.8 million messages, so rebaking everything is going to take a while…

Thanks!
Gunnar


#33

Still worked for me a couple of months ago. For the rebake afterwards, I did have to size up my droplet in CPU/memory to avoid some weird messages popping up during the rebake, which I assumed to be issues. But once I ran the rebake task with enough CPU/memory, the process went just fine.

The S3 image migration was really slow though, like 100k/sec. Not sure if it’s me. Also, I recall I might have needed temp space the same size as my migrating images for the process to complete.


(Gunnar Helliesen) #34

Thanks! As I understood it, from various discussions here on meta, there was talk of fixing the migrate script so a rebake would no longer be necessary. That hasn’t happened? You still had to rebake everything?

Thanks again,
Gunnar


#35

That would be best answered by the team. I also had other reasons that I wanted to run a rebake at that time.

But my database is much smaller than yours, so it’s not a huge deal for me to run the rebake.


(Sam Saffron) #39

Yeah, the migrate to S3 job is the way to go, but keep in mind S3 is not super fast. I would strongly recommend a CDN for it as well.