How to find any missing images?

Sure. Here’s the raw text:

"It looks like \"Fung-Wong\", \"Mario\", or simply \"Typhoon #16\" will be [making landfall in Japan on Thursday](http://www.jma.go.jp/jp/typh/1416l.html):\n\n![Typhoon 16](/uploads/default/35/4608d96d1b27846f.png)"

And here’s the cooked text:

"<p>It looks like “Fung-Wong”, “Mario”, or simply “Typhoon <span class=\"hashtag\">#16</span>” will be <a href=\"http://www.jma.go.jp/jp/typh/1416l.html\" rel=\"nofollow noopener\">making landfall in Japan on Thursday</a>:</p>\n<p><div class=\"lightbox-wrapper\"><a class=\"lightbox\" href=\"/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png\" title=\"4608d96d1b27846f.png\"><img src=\"/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png\" alt=\"Typhoon 16\" width=\"602\" height=\"500\"><div class=\"meta\">\n<svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#far-image\"></use></svg><span class=\"filename\">4608d96d1b27846f.png</span><span class=\"informations\">800×664</span><svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\"><use xlink:href=\"#discourse-expand\"></use></svg>\n</div></a></div></p>"

These are kind of hard to read when they’re compressed on a single line, so here’s the prettified raw text:

"It looks like \"Fung-Wong\", \"Mario\", or simply
\"Typhoon #16\" will be [making landfall in Japan on Thursday]
(http://www.jma.go.jp/jp/typh/1416l.html):\n\n
![Typhoon 16](/uploads/default/35/4608d96d1b27846f.png)"

And here’s the prettified cooked text:

"<p>
  It looks like “Fung-Wong”, “Mario”, or simply
  “Typhoon <span class=\"hashtag\">#16</span>” will be
  <a href=\"http://www.jma.go.jp/jp/typh/1416l.html\" rel=\"nofollow noopener\">
    making landfall in Japan on Thursday
  </a>:
 </p>\n
 <p>
   <div class=\"lightbox-wrapper\">
     <a class=\"lightbox\" href=\"/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png\" title=\"4608d96d1b27846f.png\">
       <img src=\"/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png\" alt=\"Typhoon 16\" width=\"602\" height=\"500\">
       <div class=\"meta\">\n
         <svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\">
           <use xlink:href=\"#far-image\"></use>
         </svg>
         <span class=\"filename\">4608d96d1b27846f.png</span>
         <span class=\"informations\">800×664</span>
         <svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\">
           <use xlink:href=\"#discourse-expand\"></use>
         </svg>\n
       </div>
     </a>
   </div>
 </p>"

For what it’s worth, there are quite a few (well over a hundred) posts like this on my site:

[1] pry(main)> Post.where("raw ~* :regex AND cooked !~* :regex", regex: '/uploads/default/[0-9]+/').count
=> 135

You shouldn’t have different URL formats in the raw and cooked contents. Can you please try to rebake the above post? You can do so via Rebuild HTML in the post menu or with the post.rebake! command. Are there any “uploads”-related post custom fields on this post? You can read all custom fields with the post.custom_fields command.

Here are all the other custom fields on that particular post (before running the Rebuild HTML command):

  id: 43,
  user_id: 1,
  topic_id: 36,
  post_number: 3,
  created_at: Mon, 22 Sep 2014 05:05:16 UTC +00:00,
  updated_at: Mon, 22 Sep 2014 05:11:22 UTC +00:00,
  reply_to_post_number: nil,
  reply_count: 0,
  quote_count: 0,
  deleted_at: nil,
  off_topic_count: 0,
  like_count: 0,
  incoming_link_count: 0,
  bookmark_count: 0,
  avg_time: 58,
  score: 1.2,
  reads: 6,
  post_type: 1,
  sort_order: 3,
  last_editor_id: -1,
  hidden: false,
  hidden_reason_id: nil,
  notify_moderators_count: 0,
  spam_count: 0,
  illegal_count: 0,
  inappropriate_count: 0,
  last_version_at: Mon, 22 Sep 2014 05:11:22 UTC +00:00,
  user_deleted: false,
  reply_to_user_id: nil,
  percent_rank: 0.585365853658537,
  notify_user_count: 0,
  like_score: 0,
  deleted_by_id: nil,
  edit_reason: "downloaded local copies of images",
  word_count: 34,
  version: 2,
  cook_method: 1,
  wiki: false,
  baked_at: Sun, 14 Apr 2019 09:28:00 UTC +00:00,
  baked_version: 2,
  hidden_at: nil,
  self_edits: 2,
  reply_quoted: false,
  via_email: false,
  raw_email: nil,
  public_version: 2,
  action_code: nil,
  image_url: "/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png",
  locked_by_id: nil

I don’t see an uploads field, but perhaps image_url is what you are looking for? Its value—before running the Rebuild HTML command—was:

/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png

Running the Rebuild HTML command appears to have changed the value of the image_url field to:

https://{{SITE FQDN}}/uploads/default/35/4608d96d1b27846f.png

All of the URLs in the cooked text appear to have been updated, as well:

"<p>
  It looks like “Fung-Wong”, “Mario”, or simply
  “Typhoon <span class=\"hashtag\">#16</span>” will be
  <a href=\"http://www.jma.go.jp/jp/typh/1416l.html\" rel=\"nofollow noopener\">
    making landfall in Japan on Thursday
  </a>:
 </p>\n
 <p>
   <div class=\"lightbox-wrapper\">
     <a class=\"lightbox\" href=\"https://{{SITE FQDN}}/uploads/default/35/4608d96d1b27846f.png\">
       <img src=\"https://{{SITE FQDN}}/uploads/default/35/4608d96d1b27846f.png\" alt=\"Typhoon 16\" width=\"602\" height=\"500\">
       <div class=\"meta\">\n
         <svg class=\"fa d-icon d-icon-far-image svg-icon\" aria-hidden=\"true\">
           <use xlink:href=\"#far-image\"></use>
         </svg>
         <span class=\"filename\">4608d96d1b27846f.png</span>
         <span class=\"informations\">800×664</span>
         <svg class=\"fa d-icon d-icon-discourse-expand svg-icon\" aria-hidden=\"true\">
           <use xlink:href=\"#discourse-expand\"></use>
         </svg>\n
       </div>
     </a>
   </div>
 </p>"

What’s the relationship between 4608d96d1b27846f.png and 01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png? They have the same dimensions and look identical at a glance, but they are clearly different files:

$ diff /var/discourse/shared/standalone/uploads/default/35/4608d96d1b27846f.png /var/discourse/shared/standalone/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png
Binary files /var/discourse/shared/standalone/uploads/default/35/4608d96d1b27846f.png and /var/discourse/shared/standalone/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png differ

$ ls -l /var/discourse/shared/standalone/uploads/default/35/4608d96d1b27846f.png
-rw-r--r-- 1 chris www-data 150319 Jan 19 01:14 /var/discourse/shared/standalone/uploads/default/35/4608d96d1b27846f.png

$ ls -l /var/discourse/shared/standalone/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png
-rw-r--r-- 1 chris chris 95005 Jul  3 15:25 /var/discourse/shared/standalone/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png
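
The diff and ls checks above can also be done from Ruby. This is a minimal sketch using only the standard library; the helper names are hypothetical. The last helper rests on an assumption the thread does not confirm: that the 40-character file name in the new scheme is the SHA-1 hex digest of the stored file.

```ruby
require "digest/sha1"

# Summarize a file by byte size and SHA-1 so two copies can be compared.
def fingerprint(path)
  { size: File.size(path), sha1: Digest::SHA1.file(path).hexdigest }
end

# True only when both files have the same size and the same digest.
def same_content?(path_a, path_b)
  fingerprint(path_a) == fingerprint(path_b)
end

# ASSUMPTION: new-scheme files are named after their own SHA-1 digest.
# Returns true when the base name (without extension) equals that digest.
def named_after_sha1?(path)
  File.basename(path, ".*") == Digest::SHA1.file(path).hexdigest
end
```

Calling same_content? on the two paths from the diff command would be expected to return false, since their sizes already differ (150319 versus 95005 bytes).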

And, of course, the million-dollar question remains: How should I go about migrating /uploads/default/35/4608d96d1b27846f.png to the new upload scheme?


It looks like your uploads were not migrated to the new scheme properly. Setting SiteSetting.migrate_to_new_scheme = true should take care of this situation by itself. I’m not sure why that didn’t happen in your case. Please check the number of uploads that have not been migrated to the new scheme. Run the commands below to find out.

Upload.by_users.where("url NOT LIKE '%/original/_X/%' AND url LIKE '%/uploads/default%'").count
SiteSetting.migrate_to_new_scheme = true
Jobs::MigrateUploadScheme.new.execute(nil)
Upload.by_users.where("url NOT LIKE '%/original/_X/%' AND url LIKE '%/uploads/default%'").count
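
To make the filter in that query easier to read, here is a rough plain-Ruby equivalent of its two LIKE conditions. It is only a sketch: old_scheme_url? is a hypothetical helper, not part of Discourse, and it mirrors the SQL wildcards (% for any run of characters, _ for exactly one character) rather than any official definition of the schemes.

```ruby
# Mirrors: url NOT LIKE '%/original/_X/%' AND url LIKE '%/uploads/default%'
# The SQL "_" wildcard matches exactly one character, hence the "." here
# (with the m flag so that "." also matches a newline, as "_" would).
NEW_SCHEME_SEGMENT = %r{/original/.X/}m

def old_scheme_url?(url)
  url.include?("/uploads/default") && !url.match?(NEW_SCHEME_SEGMENT)
end

old_url = "/uploads/default/35/4608d96d1b27846f.png"
new_url = "/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png"

puts old_scheme_url?(old_url) # true: no /original/_X/ segment
puts old_scheme_url?(new_url) # false: already in the new layout
```

Note that the query counts rows in the uploads table, so it says nothing about URLs that are merely written into a post's raw or cooked text.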

No, these are not custom fields. You can get the custom field values with the post.custom_fields command.


Oh, sorry! I completely misunderstood what you meant by custom fields.

I could be wrong, but it doesn’t look like that particular post has any:

[1] pry(main)> Post.find_by(:id => 43).custom_fields
=> {}

Well, this is interesting…

[2] pry(main)> Upload.by_users.where("url NOT LIKE '%/original/_X/%' AND url LIKE '%/uploads/default%'").count
=> 0
[3] pry(main)> Post.find_by(:id => 43).image_url
=> "https://{{SITE FQDN}}/uploads/default/35/4608d96d1b27846f.png"

It would appear that there are no results matching the query you provided. Is this expected behavior?

Also:

Any idea what the answer to this question might be?


It looks like your uploads have already been migrated to the new scheme, but the posts were not remapped to the new scheme URLs properly. Those posts are in a messed-up state now. Can I have the credentials for your site in a PM, so I can investigate further when I am free?


Ah, that’s what I was afraid of. Unfortunately, this is a private installation and I’m not sure whether I have permission to give (root) access to the server to an outside party. I don’t suppose there’s anything else I can do to troubleshoot the issue?

I’ve had missing images since around the end of April, but I only started looking into it tonight.

rake posts:missing_uploads
26766 post uploads are missing.

22693 uploads are missing.
22683 of 22693 are old scheme uploads.
9352 of 535188 posts are affected.

We’re on stable… considering the latest posts in this thread I’m not sure what to do next here.

Edit: I chose one particular gif file and found it to be absent from the uploads directory on the live server; it is, however, present in the tombstone directory on my backup server. I scp’d this file from my backup server (discourse/shared/standalone/uploads/tombstone/default/39/ee8670816301d4c4.gif) to the matching tombstone directory on the live server and then re-ran the above rake task as a test.

The image is now showing on the post in question and the overall numbers have now dropped to:

26750 post uploads are missing.

22692 uploads are missing.
22682 of 22692 are old scheme uploads.
9336 of 535190 posts are affected.
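
To compare two runs like the ones above without doing the subtraction by hand, the output can be parsed and diffed. This is a small sketch that assumes the task keeps printing exactly the four lines shown in this post; the constant and helper names are hypothetical.

```ruby
# One pattern per line of the `rake posts:missing_uploads` summary.
STAT_PATTERNS = {
  post_uploads_missing: /^(\d+) post uploads are missing\./,
  uploads_missing:      /^(\d+) uploads are missing\./,
  old_scheme_missing:   /^(\d+) of \d+ are old scheme uploads\./,
  posts_affected:       /^(\d+) of \d+ posts are affected\./
}.freeze

# Extract the leading number of each summary line (nil if a line is absent).
def parse_missing_uploads(output)
  STAT_PATTERNS.to_h { |key, pattern| [key, output[pattern, 1]&.to_i] }
end

# Change per statistic between two parsed runs (negative means fewer missing).
def stats_delta(before, after)
  before.to_h { |key, value| [key, after[key] - value] }
end
```

Applied to the two runs quoted above, the deltas come out as 16 fewer missing post uploads, 1 fewer missing upload, 1 fewer old scheme upload, and 16 fewer affected posts.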

It seems tombstone is 138MB on the live server and 9.5GB on the backup server, so I will rsync this directory across and re-run the rake task in the hope that it further reduces the reported counts.


@kansaichris it looks like you have only 3 posts with missing post uploads. In that case, you should manually edit the raw of each post to use the correct upload URL.

@skl you have lots of old scheme uploads. After copying the tombstone uploads from the backup server, you should run the commands below to migrate them to the new scheme.

rake posts:missing_uploads    # make sure the `rsync` copied the files
rake uploads:recover          # if there are any missing uploads after rsync
SiteSetting.migrate_to_new_scheme = true
Jobs::MigrateUploadScheme.new.execute(nil)
Upload.by_users.where("url NOT LIKE '%/original/_X/%' AND url LIKE '%/uploads/default%'").count
# make sure the count is 0
rake posts:missing_uploads    # check the stat again

Thank you for the help, @vinothkannans. I followed your instructions (took about 12 hours to execute) and the numbers have dropped somewhat:

22614 post uploads are missing.
19830 uploads are missing.
19821 of 19830 are old scheme uploads.
7339 of 535224 posts are affected.

As there are still missing uploads, I looked outside of the tombstone folder and found uploads/default on the live server is showing 22,885 empty directories (on the backup server it is 10 empty dirs). There is also a +10GB size difference on backup so I’m going to rsync uploads/default now from backup to live and then execute your instructions again.
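
For anyone wanting to reproduce the empty-directory count, one way to do it in Ruby is sketched below. It uses only the standard library; empty_directories is a hypothetical helper, and the path in the usage comment is simply the uploads directory mentioned earlier in this thread.

```ruby
require "find"

# Walk the tree under root and return every directory that has no entries.
# Note: root itself is included in the result if it is empty.
def empty_directories(root)
  Find.find(root).select { |path| File.directory?(path) && Dir.empty?(path) }
end

# Usage (path taken from earlier in the thread, adjust as needed):
# puts empty_directories("/var/discourse/shared/standalone/uploads/default").size
```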

Edit: rake posts:missing_uploads seems to be a CPU-bound single-threaded task that has been running for over 30 hours so I’ve rescaled the server onto a dedicated CPU instance temporarily. Images appear to have returned for the time being, albeit in the old scheme, so presumably some Discourse update had caused the original deletion in the first place.


Hmm…if there really are only 3 posts with missing post uploads, why do there appear to be 135 posts whose raw text uses the old upload scheme even though the cooked text uses the new upload scheme?

[1] pry(main)> Post.where("raw ~* :regex AND cooked !~* :regex", regex: '/uploads/default/[0-9]+/').count
=> 135

That’s because of the upload URL scheme mismatch between the raw and cooked columns. The posts:missing_uploads rake task checks uploads against the cooked column only. You will have to fix those mismatched upload URLs somehow. I’m unable to help you without looking into the DB.
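
The mismatch being described can be illustrated on plain strings. The sketch below applies the same regular expression as the Post.where query above to a raw/cooked pair; scheme_mismatch? is a hypothetical helper, and the i flag stands in for the case-insensitive ~* operator.

```ruby
# Same pattern as in the Post.where query: an old scheme upload path
# has a numeric directory directly under /uploads/default/.
OLD_SCHEME = %r{/uploads/default/[0-9]+/}i

# True when raw still uses the old scheme but cooked no longer does.
def scheme_mismatch?(raw, cooked)
  raw.match?(OLD_SCHEME) && !cooked.match?(OLD_SCHEME)
end

raw    = "![Typhoon 16](/uploads/default/35/4608d96d1b27846f.png)"
cooked = '<img src="/uploads/default/original/2X/0/01bb9fb7e29c2b65fd663cdc58705d1720f8fea7.png">'

puts scheme_mismatch?(raw, cooked) # true: the two columns disagree
```

A check that reads only the cooked column would see nothing wrong with this pair, which is consistent with the low count reported by the rake task.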

:crossed_fingers:


Ah, okay, I didn’t realize the posts:missing_uploads task only checks the cooked column—that would definitely explain the discrepancy. :+1:

Is it fair to say that the migration process started by setting SiteSetting.migrate_to_new_scheme equal to true only checks the value of the cooked column, as well?


The new scheme migration should replace URLs in both raw and cooked, but that didn’t happen in your case.

I think so. You can check the last_updated_at column of the affected posts too.


The task aborted with an error and shows the same error on each attempt:

[2019-07-26T09:18:56.829375 #572]  WARN -- : Badly formed IFD: undefined method `map' for nil:NilClass
....rake aborted!
ArgumentError: negative length -2 given
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/exifr-1.3.6/lib/exifr/jpeg.rb:89:in `readframe'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/exifr-1.3.6/lib/exifr/jpeg.rb:116:in `examine'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/exifr-1.3.6/lib/exifr/jpeg.rb:34:in `block in initialize'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/exifr-1.3.6/lib/exifr/jpeg.rb:34:in `open'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/exifr-1.3.6/lib/exifr/jpeg.rb:34:in `initialize'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/discourse_image_optim-0.26.2/lib/image_optim/worker/jhead.rb:40:in `new'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/discourse_image_optim-0.26.2/lib/image_optim/worker/jhead.rb:40:in `oriented?'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/discourse_image_optim-0.26.2/lib/image_optim/worker/jhead.rb:27:in `optimize'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/discourse_image_optim-0.26.2/lib/image_optim.rb:122:in `block (5 levels) in optimize_image'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/discourse_image_optim-0.26.2/lib/image_optim/handler.rb:41:in `process'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/discourse_image_optim-0.26.2/lib/image_optim.rb:122:in `block (4 levels) in optimize_image'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/discourse_image_optim-0.26.2/lib/image_optim.rb:120:in `each'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/discourse_image_optim-0.26.2/lib/image_optim.rb:120:in `block (3 levels) in optimize_image'
/var/www/discourse/vendor/bundle/ruby/2.6.0/gems/discourse_image_optim-0.26.2/lib/image_optim.rb:247:in `block in with_timeout'
Tasks: TOP => posts:missing_uploads

Are you on latest or some older version?

Latest stable “2.3.2 +4”

You’ll need to be on latest beta most likely.

If you’re self hosting there isn’t much reason to be on stable. It’s actually harder to support.

The client has specifically requested that they remain on the stable release. They were adamant about this when I inherited the project.