Something I can't explain results in duplicate copies of images

I’m working with a hosted site that is missing some images.

One of the missing images, taken from a former revision of a post, points to https://aws1.discourse-cdn.com/business7/uploads/harness/original/2X/1/16d12d54b3b4ae7aabc8a93417570bce0984e3c9.png

It appears that the image is in S3 but not in the Uploads table. Previously I have been able to re-add such an image in a new post so that a new upload record gets created, which fixes all of the posts using that image when they are rebaked.
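
For context, the rebake step I mean is roughly the following from the Rails console, once an upload record exists again. This is only a sketch: matching on the cooked HTML is just my assumption about the simplest way to find the affected posts, and the hard-coded sha1 is the one from the URL above.

    # rough sketch: find posts whose cooked HTML still references the image
    # by its sha1, then rebake them so they pick up the restored upload
    sha1 = "16d12d54b3b4ae7aabc8a93417570bce0984e3c9"
    Post.where("cooked LIKE ?", "%#{sha1}%").find_each do |post|
      post.rebake!
    end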

I first tried pasting in the URL, hoping that Discourse would download the image and create an upload record, but it didn’t, so I uploaded it via the browser. The new image is different, though, and gets this URL: https://aws1.discourse-cdn.com/business7/uploads/harness/original/2X/d/d59e3eccc6d9d038b6fae8910e787e851c8714de.png (there is a chance that I reversed which image was which).

Also, there is the question of how the records went missing from the Uploads table in the first place. I’m working with another formerly-hosted site which is also missing a bunch of images from the Uploads table. This might deserve a separate topic, but it may be related. For a bunch of those images, I was able to find them in S3, download them, and create a new Upload record like this:

    require "net/http"
    require "fileutils"

    sha1 = Upload.sha1_from_short_url(short_url)
    extension = short_url.split(".").last
    upload = Upload.find_by(sha1: sha1)
    prefix = "url for the s3 bucket"
    if !upload
      # try to find it in s3: originals are stored under the first two
      # characters of the sha1, then the sha1 itself
      one = sha1[0]
      two = sha1[1]
      url_link = "#{prefix}/#{one}/#{two}/#{sha1}.#{extension}"
      puts "URL: #{url_link}"
      url = URI.parse(url_link)
      # download the original from the bucket into a local temp file
      filename = "/tmp/#{File.basename(url.path)}"
      dirname = File.dirname(filename)
      FileUtils.mkdir_p(dirname) unless File.directory?(dirname)
      Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == "https") do |http|
        resp = http.get(url.path)
        File.open(filename, "wb") { |file| file.write(resp.body) }
      end
      # make upload for file
      ...
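
For the "make upload for file" step at the end, what I do is roughly the following with Discourse's UploadCreator. Treat it as a sketch rather than my exact code; using Discourse.system_user as the owner is an assumption, and you may want different options for your site.

    # rough sketch of the "make upload for file" step: hand the downloaded
    # file to UploadCreator so a proper Upload record is created
    # (Discourse.system_user is an assumption - use whichever user should own it)
    File.open(filename, "rb") do |file|
      new_upload = UploadCreator.new(file, File.basename(filename)).create_for(Discourse.system_user.id)
      puts new_upload.errors.full_messages if new_upload.errors.present?
    end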

It looks like this method will fix about 25% of the affected posts on the other site.

I don’t quite understand how the title of this topic relates to the rest of it.

The original you linked is a 14 KB PNG. An image that small won’t trigger the client-side image optimization routine.

Then I’m all the more confused: how is the same image generating two different SHA1s/paths?

I downloaded the image from the one bucket and uploaded it via the browser, and that generated a second copy of the same image with a different filename. The two files are different; if the browser didn’t change it on upload, then I don’t know what else could have changed it.
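
For what it’s worth, a quick way to confirm the two copies really differ at the byte level is to compare SHA1 digests of the downloads (the /tmp paths below are just placeholders for wherever the files were saved):

    require "digest"

    # compare the two downloaded copies by their SHA1 digests; Discourse names
    # originals after the upload's sha1, so these should match the URLs above
    old_copy = Digest::SHA1.file("/tmp/16d12d54b3b4ae7aabc8a93417570bce0984e3c9.png").hexdigest
    new_copy = Digest::SHA1.file("/tmp/d59e3eccc6d9d038b6fae8910e787e851c8714de.png").hexdigest
    puts old_copy
    puts new_copy
    puts old_copy == new_copy ? "identical bytes" : "the files really are different"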

Oh. Perhaps the image in the bucket was changed after the original was uploaded, so it no longer matches the original. In that case what would need to happen is to download the original image rather than . . . no, "original" is already in the path.