How to control this mess of shifting uploads from S3 to local

After 4-5 years of usage, I finally decided to move my uploads from my AWS S3 bucket back to my local server for my very small local website.
Having limited knowledge myself, I handed this job over to a friend for a very reasonable amount. He configured the site for local uploads, but somehow around half of the 3000 images ended up broken. My friend didn’t charge me anything and asked me to revert the site to the backup (which was created on 11/Apr/2025, before I handed control over to him).

Anyway, I got lazy for around a month and didn’t revert. Then I decided to fix things with the help of the Discourse helper bot and ChatGPT, and created another copy of my old website locally on my Ubuntu laptop.

I succeeded in creating an instance of my original website on my laptop by just adding a “t.” in front of my original domain name. This instance (the staging site) works fully for me, but only has activity up to 11/Apr/2025.

My production website, on the other hand, has all the data up to date, but many hundreds of posts are missing their images.

Mind that I tried many rake tasks for migrating the uploads or for reconnecting the missing images, without any success.

After banging my head against this for nearly a month, my conclusion is this: the raw posts, as seen from the Ruby console, are the same in staging and in production, but the cooked posts differ.
That is, my production website’s database is perhaps missing some connection to the actual physical images lying on the server.

I’ve also noticed that without that connection, those ‘orphaned’ images get auto-pruned from the server. Thankfully, I can rsync them again from staging or from my S3 bucket to my production server.

Finally, the problem, more or less in ChatGPT’s words: the staging server has the final cooked versions, which don’t depend on the (short) raw URLs. Production, which is missing the final cooked URLs to the images, can’t resolve the correct image URLs and is falling back to ‘transparent’ placeholders.

ChatGPT is suggesting that I copy the cooked version of the posts from staging to production, which doesn’t seem like a very good idea to me.

Exact wording from ChatGPT as to where we stand:

  • In both staging and production, post.raw is identical and contains upload://... references.
  • In staging, images show up, but querying Post.find(12849).uploads gives no results — meaning there are no uploads or post_uploads table entries for these files even in staging.
  • So images are displaying in staging purely because the cooked HTML from before migration contains the full /uploads/default/original/... links.
  • But since production was rebaked post-migration, the same raw content now fails to resolve, defaulting to the transparent.png placeholder.
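
To see which short references a given post actually depends on, the `upload://...` tokens can be pulled out of the raw text. This is only a minimal sketch in plain Ruby: `short_upload_refs`, the regular expression, and the sample text are hypothetical illustrations, not part of Discourse.

```ruby
# Hypothetical helper, not a Discourse API: collects the distinct
# upload:// short references that appear in a post's raw text.
SHORT_URL_PATTERN = %r{upload://([a-zA-Z0-9]+)(\.[a-zA-Z0-9]+)?}

def short_upload_refs(raw)
  # scan returns [key, extension] pairs; extension is nil when absent,
  # and nil interpolates as an empty string.
  raw.scan(SHORT_URL_PATTERN).map { |key, ext| "upload://#{key}#{ext}" }.uniq
end

# Made-up sample text standing in for a real post's raw content.
sample_raw = <<~RAW
  Here is a photo: ![garden|690x388](upload://abc123XYZ.jpeg)
  The same photo again: ![garden](upload://abc123XYZ.jpeg)
  A line with no image at all.
RAW

puts short_upload_refs(sample_raw)
```

In the Rails console the same helper could be given `Post.find(12849).raw` instead of the sample text, so the list can be compared between staging and production.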

:white_check_mark: Upload Files Still Exist on Disk

All image files (including those for the missing uploads) are still present under /var/www/discourse/public/uploads/default/original/ on both staging and production. But Discourse can’t resolve them anymore since their uploads entries are missing.
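
One way to check the claim above is to list the files on disk and compare them with the URLs the database knows about. The sketch below is plain Ruby under stated assumptions: `files_without_records` is a hypothetical helper, and it assumes that local upload URLs are stored as site-relative paths such as `/uploads/default/original/...`, which should be verified on the actual site first.

```ruby
require "set"

# Hypothetical helper, not a Discourse API: returns the site-relative
# paths of files under <public_dir>/uploads that are absent from known_urls.
# Assumes known_urls holds paths like "/uploads/default/original/...".
def files_without_records(public_dir, known_urls)
  root = public_dir.chomp("/")
  known = known_urls.to_set

  Dir.glob(File.join(root, "uploads", "**", "*"))
     .select { |path| File.file?(path) }        # skip directories
     .map { |path| path.delete_prefix(root) }   # make site-relative
     .reject { |rel| known.include?(rel) }
     .sort
end

# Possible use from the Rails console (read-only, nothing is modified):
#   files_without_records("/var/www/discourse/public", Upload.pluck(:url))
```

The result will also include any other files kept under the uploads directory, so it is a starting point for inspection rather than a list of files that are safe to act on.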

The easy way to do it was, and may still be, to turn on the hidden setting that includes S3 uploads in backups, make a backup, and then restore it on a server that does not have S3 configured (I would do it on a fresh server to avoid breaking the old one if something goes wrong). But it sounds like the production site is broken too, so that likely won’t help at all.

If you’ve mucked up the Uploads table so that it has multiple S3 paths in it, the job is much harder.

Rather than ChatGPT, I’d recommend https://ask.discourse.com/, which at least knows about Discourse, but probably still won’t be much help.

I would look at Upload.pluck(:url) in the Rails console and see what’s there.
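
With a few thousand rows, the raw list is hard to read, so a rough tally by URL shape can help. This is only a sketch: `classify_upload_url` and `summarize_upload_urls` are hypothetical helpers, the matching rules (a leading `/uploads/` for local files, `amazonaws.com` for S3) are assumptions to adjust for your site, and the sample URLs are invented.

```ruby
# Hypothetical classifier; the rules below are assumptions, not Discourse logic.
def classify_upload_url(url)
  if url.start_with?("/uploads/")
    :local
  elsif url.include?("amazonaws.com")
    :s3
  else
    :other
  end
end

# Counts how many URLs fall into each category.
def summarize_upload_urls(urls)
  urls.group_by { |url| classify_upload_url(url) }.transform_values(&:count)
end

# Invented sample data standing in for the output of Upload.pluck(:url).
sample_urls = [
  "/uploads/default/original/1X/aaa.png",
  "//example-bucket.s3.amazonaws.com/original/1X/bbb.jpg",
  "/uploads/default/original/2X/ccc.jpg"
]

p summarize_upload_urls(sample_urls)
```

If the `:s3` group turns out to contain more than one distinct bucket or path prefix, that would be the "multiple S3 paths" situation described above.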
