Download remote images to local silently failing?

Hi everybody!
I'm back to ask for support with the download of remote images, after this

Let me outline my problem first. I have to make sure that the images linked in about 1400 posts in my forum are correctly downloaded and imported into Discourse. The website where these images are hosted is going to close soon and, obviously, I don’t want to lose the embedded images in my posts.
I own the website that hosts the images, so I can read the Apache logs to check whether Discourse is really pulling the images from the server (spoiler: it is, by process Redis).
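Since the hosting server's Apache logs are available, one way to compare browser requests with the ones made during a rebake is to tally status codes per user agent for the image path. This is only a rough sketch: it assumes the default "combined" log format, and `tally_image_requests` and the `access.log` path are hypothetical names, not part of Discourse or Apache.

```ruby
# Matches one line of Apache's default "combined" log format.
# Assumption: the server does not use a custom LogFormat.
COMBINED_LINE = /\A(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)[^"]*" (?<status>\d{3}) (?<bytes>\S+) "(?<referer>[^"]*)" "(?<agent>[^"]*)"/

# Counts requests whose path starts with path_prefix, grouped by
# [status code, user agent]. Lines that do not parse are skipped.
def tally_image_requests(lines, path_prefix)
  counts = Hash.new(0)
  lines.each do |line|
    m = COMBINED_LINE.match(line)
    next unless m && m[:path].start_with?(path_prefix)
    counts[[m[:status], m[:agent]]] += 1
  end
  counts
end

# Usage (hypothetical log path):
#   tally_image_requests(File.foreach("access.log"), "/attachments_tapatalk/")
```

If the requests made during the rebake show a different status code than the ones coming from a browser, that would point at the hosting server rather than at Discourse.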

Now, the remote image download feature normally works, but for this specific subset of images, all coming from the same website, it has never worked and still fails silently whenever I try to rebake the posts (more below).

No errors are generated in the Discourse logs and no “broken_images” record is created in the post_custom_fields table.

Before writing I also searched around on meta without finding any particular hint I could use to understand what goes wrong. I did find this post from @vinothkannans very useful, though, and went carefully through his checklist.

So to answer the questions:

  1. SiteSetting download_remote_images_to_local is disabled. [ENABLED]
  2. Enough disk space not available to download. (In this case you will receive notification about the problem). Also look at the download remote images threshold SiteSetting. [I have 50% of free disk space left]
  3. If the post date is before download_remote_images_max_days_old SiteSetting. [Set to 7200 days, while the images are from posts between 2015 and 2018 - Imported from SMF2]
  4. If the image is from one of the domains of disabled image download domains SiteSetting. [No disabled domains]
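To double-check my answers, the checklist can be written down as a small predicate. This is a sketch of the checklist as quoted above, not Discourse's actual code: `DownloadSettings` and `skip_reasons` are hypothetical names, the disk space check is left out, and the exact-host comparison for disabled domains is my own assumption.

```ruby
require "date"
require "uri"

# Hypothetical container; the field names echo the site settings
# mentioned in the checklist, but this is not a Discourse class.
DownloadSettings = Struct.new(
  :download_remote_images_to_local,
  :download_remote_images_max_days_old,
  :disabled_image_download_domains,
  keyword_init: true
)

# Returns the checklist reasons that would block a download
# (an empty array means none of the listed reasons applies).
def skip_reasons(settings, post_date:, image_url:, today: Date.today)
  reasons = []
  unless settings.download_remote_images_to_local
    reasons << "download_remote_images_to_local is disabled"
  end

  age_in_days = (today - post_date).to_i
  if age_in_days > settings.download_remote_images_max_days_old
    reasons << "post is #{age_in_days} days old, over the max days old limit"
  end

  # Assumption: domains are compared as exact host names.
  host = URI.parse(image_url).host
  if settings.disabled_image_download_domains.include?(host)
    reasons << "#{host} is in the disabled image download domains"
  end
  reasons
end
```

With the setting enabled, a 7200-day limit, no disabled domains and post dates from 2015 onward, this returns an empty list, which matches my reading that none of these four points explains the failure.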

Here are my settings.

In order to trigger the download of the pictures I launch the rake task

rake posts:rebake_match['attachments_tapatalk','string',10]

where “attachments_tapatalk” is part of the URL of all the image links in the posts.
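To confirm which links in a given post that substring actually hits, the raw text can be scanned for matching URLs. A minimal sketch, assuming the post text is available as a plain string; `matching_image_urls` is a hypothetical helper and has nothing to do with how the rake task itself selects posts.

```ruby
# Collects the distinct http(s) URLs in a text that contain `needle`.
# The URL pattern is deliberately simple: it stops at whitespace,
# quotes, angle brackets and closing brackets/parentheses.
def matching_image_urls(text, needle)
  text.scan(%r{https?://[^\s"'<>\]\)]+})
      .select { |url| url.include?(needle) }
      .uniq
end

# Usage:
#   matching_image_urls(post_raw_text, "attachments_tapatalk")
```

Comparing this list before and after a rebake of a single post shows whether any of the remote links were rewritten.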

Once I launch the rebake (with 10 seconds of extra waiting time between posts so as not to overload the image hosting server) I can see that Redis is requesting the images from the hosting server, but then nothing gets updated in the Discourse posts, i.e. no image seems to get imported as a Discourse attachment.

In Sidekiq I can see the number of “Processed” tasks increasing by the expected amount (about 1400).

This is what I see in the logs of the server hosting the images

The second and third images in this topic are, for example, coming from that server and do work. They do work (i.e. they are reachable and available) even linking them here!

http://astronautica.eu/attachments_tapatalk/20180120/2016f256f61b2995a9f73b1f99ddc32b.jpg
http://astronautica.eu/attachments_tapatalk/20180120/d5ab809851818bd2077daab3989b2ab3.jpg

So, I have these unsolved questions:

  • Why can I see the linked images with no problems in the post body, while the local download fails?
  • Where should I look for more information about what the PullHotlinkedImages/PostsProcess jobs do, and what is the exit status/logging for each of them?
  • Is it possible to manually launch a PullHotlinkedImages process somehow, to trigger/force the download of the linked images in a specific topic?

Thanks in advance!


Both the “rebake” and “Rebuild HTML” actions will trigger it. To debug and get more details about this issue, I recommend doing it for a single affected post at a time.

Code: pull_hotlinked_images.rb and process_post.rb

Could you reproduce this issue on try?


Well, yes, meaning that I can restart the rebake_match at will, and it always… fails to download the attachments. Can you suggest a useful test or log file inspection that I can perform?

One thing I noticed is that there is seemingly no PullHotlinkedImages process visible in Sidekiq, just the number of “Processed” tasks increasing by 1 every ten seconds (as expected, since I pass the waiting time parameter when calling the rebake_match task).
