Something I can't explain results in duplicate copies of images

I’m working with a hosted site that is missing some images.

One of the missing images, taken from a former revision of a post, points to https://aws1.discourse-cdn.com/business7/uploads/harness/original/2X/1/16d12d54b3b4ae7aabc8a93417570bce0984e3c9.png

It appears that the image is in S3 but not in the Uploads table. Previously I have been able to re-add such an image in a new post so that a new upload record gets created, which fixes all of the posts using that image when they are rebaked.
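
For context, the rebake step I mean is roughly the following from the Rails console, once an upload record exists again. This is only a sketch: matching on the cooked HTML is just my assumption about the simplest way to find the affected posts, and the hard-coded sha1 is the one from the URL above.

    # rough sketch: find posts whose cooked HTML still references the image
    # by its sha1, then rebake them so they pick up the restored upload
    sha1 = "16d12d54b3b4ae7aabc8a93417570bce0984e3c9"
    Post.where("cooked LIKE ?", "%#{sha1}%").find_each do |post|
      post.rebake!
    end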

I first tried pasting in the URL, hoping that Discourse would download the image and create an upload record, but it didn’t, so I uploaded it via the browser. The new image is different, though, and gets this URL: https://aws1.discourse-cdn.com/business7/uploads/harness/original/2X/d/d59e3eccc6d9d038b6fae8910e787e851c8714de.png (there is a chance that I reversed which image was which).

Also, there is the question of how the records went missing from the Uploads table in the first place. I’m working with another formerly-hosted site which is also missing a bunch of images from the Uploads table. This might deserve a separate topic, but it may be related. For a bunch of those images, I was able to find them in S3, download them, and create a new Upload record like this:

    require "net/http"
    require "fileutils"

    sha1 = Upload.sha1_from_short_url(short_url)
    extension = short_url.split(".").last
    upload = Upload.find_by(sha1: sha1)
    prefix = "url for the s3 bucket"
    if !upload
      # try to find it in s3: originals are stored under the first two
      # characters of the sha1, then the sha1 itself
      one = sha1[0]
      two = sha1[1]
      url_link = "#{prefix}/#{one}/#{two}/#{sha1}.#{extension}"
      puts "URL: #{url_link}"
      url = URI.parse(url_link)
      # download the original from the bucket into a local temp file
      filename = "/tmp/#{File.basename(url.path)}"
      dirname = File.dirname(filename)
      FileUtils.mkdir_p(dirname) unless File.directory?(dirname)
      Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == "https") do |http|
        resp = http.get(url.path)
        File.open(filename, "wb") { |file| file.write(resp.body) }
      end
      # make upload for file
      ...
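
For the "make upload for file" step at the end, what I do is roughly the following with Discourse's UploadCreator. Treat it as a sketch rather than my exact code; using Discourse.system_user as the owner is an assumption, and you may want different options for your site.

    # rough sketch of the "make upload for file" step: hand the downloaded
    # file to UploadCreator so a proper Upload record is created
    # (Discourse.system_user is an assumption - use whichever user should own it)
    File.open(filename, "rb") do |file|
      new_upload = UploadCreator.new(file, File.basename(filename)).create_for(Discourse.system_user.id)
      puts new_upload.errors.full_messages if new_upload.errors.present?
    end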

It looks like this method will fix about 25% of the affected posts on the other site.

I don’t quite understand how the title of this topic relates to the rest of it.

The original you linked is a 14 KB PNG. An image that small won’t trigger the client-side image optimization routine.

Then I’m all the more confused: how is the same image generating two different SHA1s/paths?

I downloaded the image from the one bucket and uploaded it via the browser, and that generated a second copy of the same image with a different filename. The two files are different; if the browser didn’t change it on upload, then I don’t know what else could have changed it.
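
For what it’s worth, a quick way to confirm the two copies really differ at the byte level is to compare SHA1 digests of the downloads (the /tmp paths below are just placeholders for wherever the files were saved):

    require "digest"

    # compare the two downloaded copies by their SHA1 digests; Discourse names
    # originals after the upload's sha1, so these should match the URLs above
    old_copy = Digest::SHA1.file("/tmp/16d12d54b3b4ae7aabc8a93417570bce0984e3c9.png").hexdigest
    new_copy = Digest::SHA1.file("/tmp/d59e3eccc6d9d038b6fae8910e787e851c8714de.png").hexdigest
    puts old_copy
    puts new_copy
    puts old_copy == new_copy ? "identical bytes" : "the files really are different"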

Oh. Perhaps the image in the bucket was changed after the original was uploaded, so it no longer matches the original. In that case what would need to happen is to download the original image rather than . . . no, "original" is already in the path.