Rebaking posts does not fix broken locally downloaded linked images

Not entirely sure this is a bug, still…

My situation is the following:

  • I imported an old SMF2 website into Discourse. Some of the posts had a lot of linked images, and to avoid losing them in the future if the links break, I enabled the “download remote images to local” feature in the Discourse settings.
  • A few of those downloads failed because the NASA servers were not available at the moment of image retrieval. This left the baked version of the post with broken links to the images (i.e. showing the “chain” square symbol), while the raw version of the post still shows the link to the original source.
  • The problem with the source server was temporary: if I edit a post with broken images, I can clearly see them in the WYSIWYG area on the right-hand side of the composer, and I can also open the images by copy/pasting their URLs into a different browser tab, where they load perfectly.
  • If I save and rebake the post, the problem is not solved, and the links remain broken.

This is one of the broken images

So I thought there must be a reason why the rebake was failing to retrieve linked images that are now available, i.e. a place where this “broken link” information was stored, and I found it in the table post_custom_fields.
All the pictures the rebake process was unable to retrieve were present in records in this table, specifically records having column name = ‘broken_images’.
Once I deleted these records manually from the PostgreSQL database and then rebaked those posts, I finally got the images back (although sometimes more than one try was necessary).
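For anyone wanting to see what that workaround amounts to, here is a minimal sketch of the delete step. It is only an illustration, not Discourse code: it runs against an in-memory SQLite table rather than the real PostgreSQL database, and the `post_id` and `value` columns, the sample rows, and the helper name `clear_broken_image_flags` are assumptions made for the example. Only the table name post_custom_fields and the name = ‘broken_images’ condition come from the description above.

```python
import sqlite3


def clear_broken_image_flags(conn, post_ids):
    """Delete the 'broken_images' custom fields of the given posts.

    Hypothetical helper: returns the number of rows deleted.
    """
    post_ids = list(post_ids)
    if not post_ids:
        return 0
    # One "?" placeholder per post id keeps the query parameterised.
    placeholders = ",".join("?" for _ in post_ids)
    cur = conn.execute(
        "DELETE FROM post_custom_fields "
        f"WHERE name = 'broken_images' AND post_id IN ({placeholders})",
        post_ids,
    )
    conn.commit()
    return cur.rowcount


def demo():
    # Simplified stand-in for the real table (the schema is an assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE post_custom_fields ("
        "id INTEGER PRIMARY KEY, post_id INTEGER, name TEXT, value TEXT)"
    )
    conn.executemany(
        "INSERT INTO post_custom_fields (post_id, name, value) VALUES (?, ?, ?)",
        [
            (1, "broken_images", '["https://example.com/a.jpg"]'),
            (1, "other_field", "keep me"),
            (2, "broken_images", '["https://example.com/b.jpg"]'),
        ],
    )
    # Clear the flag only for post 1; post 2 and unrelated fields stay.
    deleted = clear_broken_image_flags(conn, [1])
    remaining = conn.execute(
        "SELECT post_id, name FROM post_custom_fields ORDER BY id"
    ).fetchall()
    return deleted, remaining
```

Restricting the delete to specific post ids keeps the change as narrow as possible. The affected posts still need to be rebaked afterwards for the images to be fetched again.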

This is a good workaround, but risky, as I don’t feel safe manually hacking the database.

Wouldn’t it make a lot more sense if the rebake process deleted all the entries where name = ‘broken_images’ related to the post being rebaked?

Or am I missing something obvious?

I agree with this. Hard-caching the fact that an image is broken forever is a bit too strict here; at a minimum we should expire that information after N days.

I think rebakes should try again for sure.

@vinothkannans what are your thoughts here?

I agree, a rebake should retry downloading all broken images.

I think expiring that information won’t be needed once both the “retry on rebake” and “retry broken images N times in X hours” features are implemented. We can extend the X hours to 1, 10, 24, 72, 168.
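To make the “retry N times in X hours” idea concrete, here is a rough sketch of how such a schedule could be checked. This is not how Discourse implements it: the function name `should_retry`, the `force` flag standing in for a manual rebake, and the reading of the hour values as a wait before each successive retry are all assumptions for illustration. Only the values 1, 10, 24, 72, 168 come from the post above.

```python
from datetime import datetime, timedelta

# Hours to wait before each successive automatic retry (values from the thread).
RETRY_AFTER_HOURS = [1, 10, 24, 72, 168]


def should_retry(attempts, last_attempt, now, force=False):
    """Decide whether a broken image download should be attempted again.

    attempts     -- number of failed download attempts so far
    last_attempt -- datetime of the most recent failed attempt
    now          -- current datetime
    force        -- True for a manual rebake, which always retries (assumption)
    """
    if force or attempts <= 0:
        return True
    if attempts > len(RETRY_AFTER_HOURS):
        # Automatic schedule exhausted; only a forced retry can help now.
        return False
    wait = timedelta(hours=RETRY_AFTER_HOURS[attempts - 1])
    return now - last_attempt >= wait
```

With this shape, automatic retries back off from one hour to one week and then stop, while a manual rebake can still trigger a new attempt at any time.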

Is re-downloading broken images going to be a job we can trigger manually?

Will there be an upper limit on the number of attempts?

In future, you will need to rebake (“Rebuild HTML”) the post to retry downloading broken images. There won’t be any upper limit.

This is already done, as per the commit below:

https://github.com/discourse/discourse/commit/2b006c0429e25d25b9cae6bf95fe9d98f3f8b55d
