Rebake with rails command or rake task doesn't work, but rebuilding HTML does. Why?

Hi!

I’m trying to repair imported posts that contain BBCode and show broken images.

The images show up in the composer previews, but the final post content contains broken images:

After using the “Rebuild HTML” feature on some posts and seeing that it repaired them on my production forum, I rebaked all my posts using the rake task.

I was surprised to see that the posts with the broken image were NOT repaired.

So I tried rebaking a specific post on my test forum (same backup, same data) with both the rails command and the rake task, and here is the behavior:

  • The images show up for a second, but the post quickly comes back to its initial state with broken images.

However, if I use the “rebuild HTML” feature, it works perfectly, and the pictures don’t come back to broken images. They are even correctly uploaded to the server after a few minutes.

So, can someone explain this phenomenon? Why does a rebake from rails or a rake task behave this way, and what is the difference between rebuilding HTML and a command-line rebake?

Video captures:

  1. from the rails console:

  2. from the rake task:

I’m very intrigued (and still trying to repair my images in all my posts in batch).


An example where I used Rebuild HTML that shows that this post’s embedded images were properly displayed and automatically uploaded to the server (obviously, their original link, leading to casimages, is still here, but it’s the expected behavior), days ago: Frensh Vw Bus CHERIZET 2019 SK - #13 par buggyderby - Vos sorties - VW Camper

4 Likes

I think that rake task marks them for rebake and also triggers rebuilding thumbnails. Have you checked sidekiq to see if stuff is queued?

3 Likes

edit:

Running the rake rebake task on the post schedules a PullHotlinkedImages job 4 minutes later and instantly increases the number of processed jobs by one, but I couldn’t see anything added to the queue tab.

The few posts on which I did a manual HTML rebuild have had their images displayed perfectly for days now (they have been downloaded to my server as well).

4 Likes

I’m afraid I don’t know why it’s working differently from the admin wrench versus the console, but I found this topic with a similar issue, and they got it to work by rebaking using the API:

https://meta.discourse.org/t/some-linked-images-not-displaying-show-as-broken/142177/7

Not sure if that’s any use, but thought I’d share. :slightly_smiling_face:

Edit: I should’ve read one post further. Apparently that’s also unreliable. Sorry, my bad. False alarm.

3 Likes

10 posts were split to a new topic: Cannot PullHotlinkedImages for some domains

Besides Jay who had a glimpse of a possible difference between rake/rails rebake and rebuild HTML, does anyone have another idea?

An official reply about the difference(s) between these tasks would be welcome :slight_smile:

If we can’t figure it out, I’ll go the API route to “rebuild HTML” on my 40000 posts containing potential issues with images… And hope it will work for me :confused: :person_shrugging:

Or if there is any other way to “rebuild HTML” using rails, maybe? :thinking:

Rebuild HTML: post.rebake!(invalidate_oneboxes: true, invalidate_broken_images: true)

Rake posts:rebake: post.rebake!(**opts) where opts is generally empty

For the oneboxes you can try the task posts:refresh_oneboxes, and for the broken images you can try the task posts:invalidate_broken_images. The latter might be the solution to your problem.
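To make the difference concrete, here is a minimal sketch of the two calls as described above. The helper names `rebuild_html_like` and `rake_rebake_like` are hypothetical (they are not part of Discourse); the only assumption is an object that responds to `rebake!` with these keyword arguments.

```ruby
# Option sets taken from the explanation above.
REBUILD_HTML_OPTS = { invalidate_oneboxes: true, invalidate_broken_images: true }.freeze
RAKE_REBAKE_OPTS  = {}.freeze # "generally empty" for rake posts:rebake

# Hypothetical helper: rebake a post with the options "Rebuild HTML" uses.
def rebuild_html_like(post)
  post.rebake!(**REBUILD_HTML_OPTS)
end

# Hypothetical helper: rebake a post with the (empty) options the rake task generally uses.
def rake_rebake_like(post)
  post.rebake!(**RAKE_REBAKE_OPTS)
end
```

In a rails console this could be tried on a single post with something like `rebuild_html_like(Post.find(123))`, where 123 is a placeholder post id.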

8 Likes

After a test on a few posts, it seems to work like a charm!
I’ll try it on thousands of posts and see how it goes! Thank you very much!

5 Likes

So, here’s where I am:

After running post.rebake!(invalidate_broken_images: true) on all of my 40000 posts that contain the string [img], it worked for a lot of images… But far from all of them, even for images hosted on the same external image hosting service.
For example, I have thousands of “working” casimages links (they point to valid images and display in the composer preview on edit) that were broken in the cooked version of the posts. Many of them were properly displayed and uploaded to the server thanks to my script, but for a lot of others it simply didn’t work, and I don’t know why.

Post.where('raw LIKE ?', '%[img]%').find_each do |p|
    p.rebake!(invalidate_broken_images: true)
end

I also have image links from other image hosts: some were uploaded, and some were not.

I failed to see any difference between these posts and image links. They all had working images, and the fact that they used the same image host puzzled me.

I tried the operation multiple times and the results were inconsistent, regardless of the external hosting service… Some images were uploaded, some weren’t. It looked kind of random.

It reminds me a bit of the issue that @Amethi encountered: Some linked images not displaying/show as broken - #8 by Amethi where it worked on only some images, without any explanation.


:information_source: I’ll talk only about casimages here, though my imported forum used various other image hosts.

So, I thought that maybe casimages temporarily blacklisted my IP when I tried to retrieve too many images from their servers. That could explain both the fact that it didn’t work for all images and the randomness of the success in uploading the images to my server.
There were even cases where the Rebuild HTML option worked, but only at first: the images were displayed instead of a broken image icon while they were still hosted on the external hosting service, but when the Sidekiq job that pulls external images was triggered, it broke the images.
The same happened with rails scripts using rebake!(invalidate_broken_images: true)
:weary:

So, I’m currently trying a slower approach, where I wait 5 seconds between each of my rails rebake! commands:

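# Build the scope once so the count and the loop cover the same posts
# (the original count used lower(raw), the loop did not).
posts = Post.where('raw LIKE ?', '%[img]https:%')
total = posts.count
i = 0
posts.find_each do |p|
    p.rebake!(invalidate_broken_images: true)
    i += 1
    # \r rewrites the same console line with the current progress
    print "#{i}/#{total}\r"
    # pause between posts to go easy on the remote image hosts
    sleep(5)
end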

I’ll see in ~60 hours if it went better…

I’d like to understand the fundamentals of my issue here and why a “normal” rebake can’t upload an image on the server (if I’m not temporarily blacklisted by casimages).

Note that this time, the certificate of casimages’s server seems OK :smile:

I also don’t understand what invalidate_broken_images really does. I’m not very familiar with Discourse’s code.

I looked at the code for occurrences of invalidate_broken_images and saw these files:

Why is it searching specifically for the <img string? My posts are from an imported phpBB and the raw version contains only [img] BBCode, not <img> tags; so how would it have an effect on my posts (and it did, see my previous message)? :thinking:

I also don’t really understand the difference between these two methods (?):

It seems that rebake sets the default arguments to false, and that rebake! sets the default argument to true.

How are these two related (I’m aware of the purpose of the ! character in Ruby, by the way), and why are they in different files?

My goal is only to understand why my external images are sometimes uploaded and sometimes not, and whether I can find a reliable way to upload them properly and automatically, even if it means uploading one image every hour. :sweat_smile:
I’ve been on this for almost two weeks and it’s driving me (and the people whose server I migrated) crazy. :woozy_face:

Also, there is nothing in Discourse’s log, apart from multiple “Sidekiq is consuming too much memory (using: 592.25M)” messages. Note that I’m working on Ubuntu via WSL on Windows 10, but I intend to use a working solution (if I find one…) on our VPS.

1 Like

You can see what it does further down in there, at line 716. It deletes those images so that it will try to download them again (at first glance, anyway).

1 Like

Thank you for this explanation. :slight_smile:


So, I’m almost 55 hours into rebaking my 40000 posts containing [img], with a 5-second delay between each iteration of my rails script.

From what I see, it works much better than before. Most valid images (I exclude Imageshack and its erratic behavior) seem to be uploaded to my forum flawlessly, at first glance at least, but I’ll have a deeper look to be 100% sure. What is 100% sure is that the results are way, way better and more consistent.

So I suspect the issue I encountered (and maybe the issue from @Amethi), which was randomness in the remote image downloading with invalidate_broken_images, was related to some kind of rate limit from various image hosting providers…? :thinking: The weird thing is that I didn’t notice any issue with my other imported forums… :face_with_raised_eyebrow:


That said, if the results are satisfactory enough and the delay really improves the remote image downloading, I’ll use the same method on my production forum, but I’ll increase the time between each rebaked post from 5 seconds to 10 or 15 seconds (or maybe even more; we’re not in a hurry, these are all fairly old posts, and the VPS has much lower specs than my own computer).
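For planning purposes, here is a rough sketch of how long such a run takes at different delays. `estimated_hours` is a hypothetical helper, and it only counts the sleep between posts, not the time each rebake itself takes, so the real duration will be somewhat longer.

```ruby
# Lower bound on total run time in hours: only the pause between posts
# is counted, not the time spent in each rebake.
def estimated_hours(post_count, delay_seconds)
  (post_count * delay_seconds / 3600.0).round(1)
end

[5, 10, 15].each do |delay|
  puts "#{delay} s delay: about #{estimated_hours(40_000, delay)} hours"
end
# prints:
# 5 s delay: about 55.6 hours
# 10 s delay: about 111.1 hours
# 15 s delay: about 166.7 hours
```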

I don’t want to jump to conclusions, but the solution to my initial problem could be to apply the solution proposed by Richard AND add a delay between each post rebake.
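Putting both parts together, a reusable version of the script could look like the sketch below. `rebake_slowly` is a hypothetical name, the `sleeper` parameter is only there so the pause can be skipped in a dry run, and rescuing errors per post is my own assumption (so that one failing post doesn’t stop a multi-day run), not something the original script did.

```ruby
# Rebake each post with invalidate_broken_images: true, pausing between posts.
# `posts` is anything that yields objects responding to rebake!.
# `sleeper` is injectable so the pause can be replaced in a dry run.
def rebake_slowly(posts, delay: 5, sleeper: ->(seconds) { sleep(seconds) })
  done = 0
  failed = []
  posts.each do |post|
    begin
      post.rebake!(invalidate_broken_images: true)
    rescue StandardError => e
      # Assumption: record the failure and keep going.
      failed << [post, e.message]
    end
    done += 1
    sleeper.call(delay)
  end
  { done: done, failed: failed }
end
```

In a rails console it could be called as `rebake_slowly(Post.where('raw LIKE ?', '%[img]%').find_each, delay: 10)`, since find_each without a block returns an enumerator.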

3 Likes
