Changing s3 bucket for uploads

Hi there!

We are migrating all our uploads/images between two different S3-compatible services (both are DigitalOcean Spaces, if it matters), and we have ended up stuck in quite a bad state.

I’ll start with explaining how the migration was done:

  1. We cloned/synced the initial bucket to the new bucket with rclone
  2. All references in the Files page in Discourse administration were updated to the new endpoints
  3. A rebake was run

Sadly, this did not do what we wanted, and now all images are “gone” from the forum. They are still in the new S3 bucket (and luckily still in the old one as well), but no post can find its image.

The bucket is about 60 GB, so while that is not extreme, it is still quite a lot of data.

I’ve rebuilt the container, I’ve tried to recover uploads from the tombstone, and I’ve done pretty much everything I can think of or find in the support forum or the rake tasks.
I’ve also tried a database replace (via discourse remap).

Each image looks basically like this in baked content at the moment:

<img src="https://xxxx.xxxxx.xx/images/transparent.png" alt="image" data-orig-src="upload://h8UudilPhVsGnNmvlJ5lQYEr8PT.jpeg" width="375" height="500">

Which makes me think that either the b64-sha in the link is broken or the image’s SHA has changed for some reason.
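One way to check that theory is to turn the short URL back into a SHA1 and compare it with what the uploads table holds. Below is a minimal sketch in plain Ruby. It assumes the part after upload:// is the upload's SHA1 encoded in Base62 with the alphabet 0-9, a-z, A-Z (in that order); that is my understanding of Discourse's scheme, but please verify it against your version. The helper names are made up for this example.

```ruby
# ASSUMPTION: Base62 alphabet order is digits, lowercase, uppercase.
BASE62_KEYS = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".freeze

# Decode a Base62 string into an Integer.
def base62_decode(str)
  str.each_char.reduce(0) do |acc, ch|
    idx = BASE62_KEYS.index(ch)
    raise ArgumentError, "not a Base62 character: #{ch.inspect}" if idx.nil?
    acc * 62 + idx
  end
end

# Encode a non-negative Integer as Base62 (only used here for round-trip checks).
def base62_encode(num)
  return "0" if num.zero?
  out = +""
  while num > 0
    num, rem = num.divmod(62)
    out.prepend(BASE62_KEYS[rem])
  end
  out
end

# Strip the scheme and the file extension, then decode to a 40-character hex digest.
def sha1_from_short_url(short_url)
  basename = short_url.sub(%r{\Aupload://}, "").sub(/\..*\z/, "")
  base62_decode(basename).to_s(16).rjust(40, "0")
end

puts sha1_from_short_url("upload://h8UudilPhVsGnNmvlJ5lQYEr8PT.jpeg")
```

If the printed digest matches the sha1 of a row in the uploads table, the reference in the post is intact and the problem is more likely on the storage side.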

Has anyone done this before? Are all images lost forever? (Yes, yes, I have a backup and the old images, so I know that there is a way.)


Might be worth mentioning that I have tried to use the CDN uri provided for the spaces bucket as well (with a rebake).


Output from missing uploads:

rake posts:missing_uploads
Looking for missing uploads on: default
Fixing missing uploads:
🚫
17075 post uploads are missing.

16906 uploads are missing.
1 of 16906 are old scheme uploads.
14646 of 139801 posts are affected.

post_uploads have 3448 entries
optimized_images have 25681 entries
uploads have 5764 entries


You can see Moving from one S3 bucket to another

I think I have a draft of a howto that I will try to post tomorrow.


That would be very helpful, tyvm!


Hey @Jite !

See if this works for you. If it does, I’ll go about creating a proper howto

Old buckets

This assumes that you can install and configure a tool to move your data from your old bucket to a local machine and then again do the same from local to the new bucket. See aws cli sync (which can be configured for non-AWS buckets) and gsutil rsync for information. If you have huge amounts of data or are moving between buckets on the same provider, then you might want to investigate methods that move the data directly between buckets.

Change into a directory suitable as a holding space (e.g., mkdir temp-bucket; cd temp-bucket) before doing something like the following. These examples include the -n (gsutil) and --dryrun (aws) switches to show you what will happen. If that looks like what you want, run the command again without that switch.

Move data from the old bucket to local

    gsutil rsync -r -n gs://=OLD= .

or

    aws s3 sync s3://=OLD= . --dryrun

Move data from local to the new bucket

    gsutil rsync -r -n . gs://=NEW=

or

    aws s3 sync . s3://=NEW= --dryrun
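Before touching the database, it may be worth confirming that the copy really is complete. Here is a rough sketch in plain Ruby that compares two object listings. It assumes you have already built each listing as a Hash of object key to size in bytes (for example by parsing the listing output of whichever tool you used); the keys and sizes below are invented sample data and compare_listings is a hypothetical helper.

```ruby
# Compare two bucket listings (key => size in bytes).
# Returns keys that are missing from the new bucket and keys whose size differs.
def compare_listings(old_listing, new_listing)
  missing = old_listing.keys - new_listing.keys
  size_mismatch = old_listing.select { |key, size|
    new_listing.key?(key) && new_listing[key] != size
  }.keys
  { missing: missing.sort, size_mismatch: size_mismatch.sort }
end

# Invented sample data, not real object keys.
old_listing = {
  "original/2X/7/aaa.png" => 1200,
  "original/2X/8/bbb.jpeg" => 5300,
  "optimized/2X/7/aaa_2_100x100.png" => 300,
}
new_listing = {
  "original/2X/7/aaa.png" => 1200,
  "original/2X/8/bbb.jpeg" => 5000,
}

report = compare_listings(old_listing, new_listing)
puts "missing: #{report[:missing].inspect}"
puts "size mismatch: #{report[:size_mismatch].inspect}"
```

Matching sizes do not prove the contents are identical, so treat an empty report as a sanity check rather than a guarantee.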

Updating the database to use the new bucket

You’ll run these commands at the Rails console. To get there:

    cd /var/discourse
    ./launcher enter app
    rails c

For the new bucket, upload an image with the new configuration and do this:

Upload.last.url

You should see something like

=> "//discourse-bucket.s3.dualstack.us-east-2.amazonaws.com/original/2X/7/12345fbea574afc4e02db80107e6682430aede2c.png"

That gives you discourse-bucket.s3.dualstack.us-east-2.amazonaws.com as the hostname of the new bucket. Get the old bucket’s hostname the same way, from the URL of an upload that was made before the switch.
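If you would rather not pick the hostname out by eye, a small helper can do it. This is only a sketch using Ruby's standard URI library; upload_host is a name I made up, and the URL is the sample one from above.

```ruby
require "uri"

# Return the hostname of an upload URL.
# Stored URLs are often protocol-relative ("//host/path"), so add a scheme before parsing.
def upload_host(url)
  absolute = url.start_with?("//") ? "https:#{url}" : url
  URI.parse(absolute).host
end

url = "//discourse-bucket.s3.dualstack.us-east-2.amazonaws.com/original/2X/7/12345fbea574afc4e02db80107e6682430aede2c.png"
puts upload_host(url)
# prints discourse-bucket.s3.dualstack.us-east-2.amazonaws.com
```

At the Rails console you could paste the method in and call upload_host(Upload.last.url) instead of using the literal string.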

Use this to check that your uploads are where you think they are:

Upload.order(Arel.sql('RANDOM()')).limit(10).pluck(:id, :url)

Now, you’ll update the database to use the new bucket rather than the old one. DbHelper.remap will replace occurrences in all tables.

DbHelper.remap("//=OLDHOST=/","//=NEWHOST=/")
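The leading // and trailing / in that pattern matter: they keep the replacement from touching a hostname that merely starts with the old one. Here is a plain-Ruby sketch of the same literal string replacement, so you can preview the effect on a few URLs before running the real thing. The hostnames are hypothetical placeholders and preview_remap is not a Discourse method.

```ruby
# Hypothetical hostnames, standing in for =OLDHOST= and =NEWHOST=.
OLD_PREFIX = "//old-bucket.example.com/"
NEW_PREFIX = "//new-bucket.example.com/"

# Return [before, after] pairs using a literal (non-regex) string replacement.
def preview_remap(urls, from, to)
  urls.map { |u| [u, u.gsub(from, to)] }
end

urls = [
  "//old-bucket.example.com/original/2X/7/abc.png",
  "//old-bucket.example.com.cdn.example.net/original/2X/7/abc.png",
  "//cdn.example.com/original/2X/7/abc.png",
]

results = preview_remap(urls, OLD_PREFIX, NEW_PREFIX)
results.each do |before, after|
  status = before == after ? "unchanged" : "remapped"
  puts "#{status}: #{before} -> #{after}"
end
```

Only the first URL changes; the second is left alone because the trailing slash does not match.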

Moving to AWS might require clearing your s3_endpoint.

NOTE: If you have an s3_endpoint defined in your SiteSettings in the database and switch to AWS (where no endpoint is needed), then you’ll need to clear that site setting after you build the new container with the updated settings (or after you restore a database that has it set).

Rebake posts that refer to bucket rather than S3 CDN

If you have posts that link directly to the new s3 bucket (perhaps you didn’t have an s3_cdn_url defined before), then here’s how to rebake only the posts that need it.

Get the posts:

  posts=Post.where("cooked like '%=NEWHOST=%'")

See how many:

  posts.count

Rebake those posts:

  posts.find_each { |p| p.rebake! }

Or, just replace the bucket with the cdn:

posts.each do |p|
  # Use a plain string, not a regex, so the dots in the hostname match literally
  p.cooked.gsub!("=NEWHOST=", "=CDN=")
  p.save!
end


Thank you for the response.

This is basically what I did the last time, but I tried it again. The issue is that posts.count returns 0. All posts have the transparent.png file in the cooked post and contain a hash in the uncooked one.
Is there any way to make it resolve the image correctly during bake?
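To see exactly which references a post carries, the short URLs can be pulled out of the raw text. This is a minimal plain-Ruby sketch; the regular expression is my own approximation of the upload:// format, the sample raw text is invented, and short_urls_in is a hypothetical helper.

```ruby
# Approximate pattern: "upload://", an alphanumeric basename, and an optional extension.
SHORT_URL_RE = %r{upload://([a-zA-Z0-9]+)(\.[a-zA-Z0-9]+)?}

# Return the unique upload:// short URLs found in a post's raw text.
def short_urls_in(raw)
  raw.scan(SHORT_URL_RE).map { |base, ext| "upload://#{base}#{ext}" }.uniq
end

# Invented sample of a post's raw text.
raw = <<~RAW
  Here is the photo:
  ![image|375x500](upload://h8UudilPhVsGnNmvlJ5lQYEr8PT.jpeg)
  and again ![image](upload://h8UudilPhVsGnNmvlJ5lQYEr8PT.jpeg)
RAW

puts short_urls_in(raw)
```

At the Rails console the same helper could be fed the raw text of one affected post, which gives a concrete reference to look for in the uploads table.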


Hmm. Right. That cooked change works only to avoid the rebake. If a rebake does not work, then something else is wrong. Maybe the assets are not where Discourse thinks they are?


Well, it’s possible, but the “move” was pretty much a 1:1 move of all files in the bucket, hehe…


So you can replace the old bucket url with the new and it works?

Do the values in Uploads look right?


I’m a bit scared that changing back to the old bucket might kick off a job that moves everything to the tombstone because the files are “old” now. But yes, the database seems to point to the correct (new) image locations; it’s basically just the old posts that don’t resolve to the correct image (I’m guessing…).


After running the tombstone recovery rake task and then the fix_missing_uploads rake task, I finally got it to start “fixing” the images.
It seems to be downloading them and uploading them again, which takes a lot of time and resources, but at least the users will get their images back!

Thank you for the help @pfaffman :slight_smile:

