Moving from one S3 bucket to another

Continuing the discussion from How do I move my s3 upload bucket from one provider to another?:

I’m trying to move from a GCP bucket to an AWS S3 bucket. The old system doesn’t use an S3 CDN (the guy who set it up didn’t really know what he was doing, apparently).

I used s3cmd to sync the old GCP bucket to a local filesystem, then used it again to push the assets to the new S3 bucket. The system is now properly configured with S3 and site CDNs as described in Using Object Storage for Uploads (S3 & Clones).

The above-linked topic suggested using rake posts:remap to update the posts (I guess I should also rebake all posts? Or at least those matching the old bucket?).

When I did the posts:remap it remapped only one post.

 Upload.order(Arel.sql('RANDOM()')).limit(10).pluck(:id, :url)

shows all of those having the old bucket… Ah, that’s the issue. What we need is not a rake posts:remap but a discourse remap, as described at Change the domain name or rename my Discourse?.
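A rough way to put a number on that check, rather than eyeballing ten random rows. This is only a sketch: the host name is a made-up placeholder, and the `Upload` branch only runs inside the Discourse Rails console. Outside of it, the script falls back to two invented sample rows so it still runs.

```ruby
# Hypothetical placeholder; use the real hostname of the old bucket.
old_bucket_host = "old-bucket.example.com"

# Keep only the [id, url] pairs whose URL still contains the old bucket host.
def rows_on_old_bucket(rows, host)
  rows.select { |_id, url| url.to_s.include?(host) }
end

rows =
  if defined?(Upload)
    # Inside the Discourse console: same columns as the pluck shown above.
    Upload.pluck(:id, :url)
  else
    # Invented sample data so the sketch runs outside Discourse.
    [
      [1, "//old-bucket.example.com/original/1X/a.png"],
      [2, "//new-bucket.example.com/original/1X/b.png"],
    ]
  end

stale = rows_on_old_bucket(rows, old_bucket_host)
puts "#{stale.length} of #{rows.length} uploads still reference #{old_bucket_host}"
```

Running it again after the remap should show the count dropping to zero if the remap reached the uploads table.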

Yes. I think so.

I’ll see about getting that done Real Soon Now. @Falco , in broad strokes, it’s something like

  • create new bucket and CDN for it, rebuild container to use the new bucket/CDN & make sure it works
  • configure s3cmd for the old bucket and sync the data to local.
  • configure s3cmd for the new bucket and sync the data up to the new bucket
  • do a discourse remap OLD-BUCKET-DOMAIN-NAME NEW-BUCKET-DOMAIN-NAME
  • rebake
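The command-line steps above can be sketched as a dry-run plan that only prints the commands. Every bucket name, host name, path and config file name below is a hypothetical placeholder, and I am assuming one s3cmd config file per provider, selected with `-c`. The first bullet (creating the bucket and CDN and rebuilding the container) is manual and is not covered. The last two commands would be run inside the container.

```ruby
require "shellwords"

# Hypothetical placeholders; none of these values come from a real site.
settings = {
  old_bucket: "old-bucket",
  new_bucket: "new-bucket",
  old_host:   "old-bucket.example.com",
  new_host:   "new-bucket.example.com",
  local_dir:  "/tmp/uploads-sync",
  old_config: "old.s3cfg", # s3cmd config with the old provider's credentials
  new_config: "new.s3cfg", # s3cmd config with the new provider's credentials
}

# Build the commands as argument lists, in the order of the bullet list.
def migration_plan(s)
  [
    ["s3cmd", "-c", s[:old_config], "sync", "s3://#{s[:old_bucket]}/", "#{s[:local_dir]}/"],
    ["s3cmd", "-c", s[:new_config], "sync", "#{s[:local_dir]}/", "s3://#{s[:new_bucket]}/"],
    ["discourse", "remap", s[:old_host], s[:new_host]],
    ["rake", "posts:rebake"],
  ]
end

# Print each command as a shell-safe line; nothing is executed here.
plan_lines = migration_plan(settings).map { |argv| Shellwords.join(argv) }
puts plan_lines
```

Printing rather than executing keeps the sketch safe to run anywhere; each line can then be pasted and checked one at a time.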

Does that seem right?

If you use the same CDN for the old and new bucket, you might save having to do the rebake, but getting that timing just right seems a bit tricky (you can’t change the CDN origin until the data is in the new bucket, but you’d need to somehow make sure that nothing got uploaded to the old bucket during the sync process?). Maybe just say that it’s possible.

Maybe using the official AWS CLI is better for a guide?

Use DbHelper.remap here.

Not necessary

Use the same CDN and just change the CDN source in the CDN panel, or use a new CDN and remap with DbHelper.remap. No rebake needed either way.
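For the "new CDN" case, a minimal sketch of what that call might look like from the Rails console. Both host names are hypothetical placeholders. As far as I understand it, `DbHelper.remap(from, to)` replaces the first string with the second across the text columns of the database, so it is worth taking a backup first. The guard just lets the snippet run harmlessly outside Discourse.

```ruby
# Hypothetical placeholders; substitute the real old and new hosts.
old_host = "old-bucket.example.com"
new_host = "new-cdn.example.com"

# Textual form of the call, shown when we are not inside Discourse.
remap_call = "DbHelper.remap(#{old_host.inspect}, #{new_host.inspect})"

if defined?(DbHelper)
  # Inside the Discourse Rails console: perform the replacement.
  DbHelper.remap(old_host, new_host)
else
  puts "Not in a Discourse console; would run: #{remap_call}"
end
```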

2 Likes

Ah. OK. I’ll look at that. . . Is it possible to make the AWS CLI work with non-AWS buckets?

Hey Rafael. I’m getting close. My current version of this howto points to the AWS CLI and gsutil to sync from old to local and from local to new (I just link to those tools and provide a command line with the bucket names filled in as placeholders). It then uses DbHelper to update the tables. For my moderate-sized site, it runs pretty fast. Awesome.

The one issue I have now is that the old configuration didn’t have an s3_cdn_url, so those images are still linked directly to the bucket (not the CDN) in posts. Rebaking doesn’t help. New uploads are properly linked to the CDN. You can’t fix this by setting DISCOURSE_S3_ENDPOINT: '' because that has no effect, so after restoring the database, I had to clear the SiteSetting. That’s not so bad, but it took me a bunch of rebuilds to figure that out.

The old configuration didn’t have an S3 CDN defined. I can fix this by rebaking all of the 1250 posts that have the bucket URL/hostname in them, but this causes all of those images to be downloaded and optimized (the old server is running 2.7.0.beta5, so I thought that it would have some recent re-optimizations already done?). This slams (load average 10-20) my poor 2GB server (with Postgres and Redis on RDS and ElastiCache) for quite a while. I guess I might need a bigger EC2 instance for this site anyway, but I’m still a bit surprised that this rebake brings the server to its knees (500 errors in the UX).

Should I instead contrive to do a replacement from bucket-hostname to CDN-url in cooked in those posts?
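To make the idea concrete, here is a string-level sketch of that replacement on a single piece of cooked HTML. It does not touch the database. The bucket host, CDN host and upload path are all invented for illustration.

```ruby
# Hypothetical placeholders for the bucket host and the CDN host.
bucket_host = "my-bucket.s3.amazonaws.com"
cdn_host    = "cdn.example.com"

# Swap every literal occurrence of the bucket host for the CDN host.
# A String pattern in gsub is matched literally, so the dots are not wildcards.
def bucket_to_cdn(cooked, bucket_host, cdn_host)
  cooked.gsub(bucket_host, cdn_host)
end

# Invented sample of cooked HTML with a direct bucket link.
sample  = '<p><img src="//my-bucket.s3.amazonaws.com/original/1X/abc.png"></p>'
updated = bucket_to_cdn(sample, bucket_host, cdn_host)
puts updated
```

Whether it is safe to write the result back to the posts without a rebake is exactly the open question here, so treat this as an illustration of the substitution only.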

@pfaffman Thanks for leading me here.
But my problem is compounded in the last 2 steps.

Current Problem: Some of my images are missing, with small icons displayed in their place. If I hover the pointer over them, they show the ‘olds3bucket’ address. But when I click on them, they display correctly at full size, and the ‘news3bucket’ path is shown in the URL bar.

  1. First, you said to sync the old bucket’s data to the new bucket, which I’ve already done successfully.
  2. Then fill in the new bucket’s settings in the Discourse web UI, which I’ve already been using for a year.
  3. Now you say to ‘remap’ the old bucket URL to the new one, and then rebake. Here is the problem. When I do this (or even if I just ‘rebake’, or use ‘Rebuild HTML’ from the post settings menu, with or without doing the ‘remap’ step first), the images which were showing just as an icon disappear completely. Only white space is shown in their place. So I revert/restore immediately.

Thanks once again.
(I’ve a very small website, and likewise have…).

rclone is a good tool which can sync to multiple backends. We currently use it for backups.

Hey Jay,

Sorry to bump this but have you made any more headway on the guide? Thanks!