Thumbnail generation & markdown rendering issue

Hi,

Since upgrading to the latest Discourse 2.5.0 version, I have had trouble with thumbnail generation.
It seems thumbnails are now generated by core, and as a result all existing thumbnails were wiped.
I tried various operations to restore them, and I’m going to describe the situations where it doesn’t work.

Note: there is probably a good reason for this change, but it would be really welcome not to introduce sudden behavior changes that break functionality without giving us a way to prepare, such as an upgrade guide and/or a way to opt in to the change, please. :confounded:

Context

  • Discourse 2.5.0 beta4 (faeb5793ba)
  • Topic List Preview plugin 4.4.0
  • WP-Discourse posts published as is (full HTML) in Discourse topics (first message).

Here is an example of a post’s content (formatted for readability):

Content

Image HTML as code for convenience:

<img
        width="150"
        height="84"
        src="https://zuzu.reviews/wp-content/uploads/2020/05/HiZERO-VS-BISSEL-VS-שואב-אלחוטי-VS-שואב-רובוטי-VS-מגב-ודלי-VS-מטאטא-VS-ספונגה-חשמלית-150x84.jpg"
        class="attachment-thumbnail size-thumbnail"
        alt=""
        srcset="
            https://zuzu.reviews/wp-content/uploads/2020/05/HiZERO-VS-BISSEL-VS-שואב-אלחוטי-VS-שואב-רובוטי-VS-מגב-ודלי-VS-מטאטא-VS-ספונגה-חשמלית-150x84.jpg     150w,
            https://zuzu.reviews/wp-content/uploads/2020/05/HiZERO-VS-BISSEL-VS-שואב-אלחוטי-VS-שואב-רובוטי-VS-מגב-ודלי-VS-מטאטא-VS-ספונגה-חשמלית-300x169.jpg    300w,
            https://zuzu.reviews/wp-content/uploads/2020/05/HiZERO-VS-BISSEL-VS-שואב-אלחוטי-VS-שואב-רובוטי-VS-מגב-ודלי-VS-מטאטא-VS-ספונגה-חשמלית-1200x675.jpg  1200w,
            https://zuzu.reviews/wp-content/uploads/2020/05/HiZERO-VS-BISSEL-VS-שואב-אלחוטי-VS-שואב-רובוטי-VS-מגב-ודלי-VS-מטאטא-VS-ספונגה-חשמלית-1536x864.jpg  1536w,
            https://zuzu.reviews/wp-content/uploads/2020/05/HiZERO-VS-BISSEL-VS-שואב-אלחוטי-VS-שואב-רובוטי-VS-מגב-ודלי-VS-מטאטא-VS-ספונגה-חשמלית-2048x1152.jpg 2048w,
            https://zuzu.reviews/wp-content/uploads/2020/05/HiZERO-VS-BISSEL-VS-שואב-אלחוטי-VS-שואב-רובוטי-VS-מגב-ודלי-VS-מטאטא-VS-ספונגה-חשמלית-788x443.jpg    788w,
            https://zuzu.reviews/wp-content/uploads/2020/05/HiZERO-VS-BISSEL-VS-שואב-אלחוטי-VS-שואב-רובוטי-VS-מגב-ודלי-VS-מטאטא-VS-ספונגה-חשמלית.jpg           1280w
        "
        sizes="(max-width: 150px) 100vw, 150px"
    />
<div data-wp>
    <a
        href="https://www.banggood.com/Xiaomi-Redmi-Router-AC2100-2033Mbps-2_4G-5G-Dual-Band-Wireless-Router-6High-Gain-Antennas-128MB-OpenWRT-WiFi-Router-p-1614038.html"
        target="_blank"
        ><img src="https://zuzu.deals/wp-content/uploads/2020/01/5e3128b4e5da7-150x150.jpg"/>
    </a>
    <div>
        <div data-buy>
            <a
                href="https://www.banggood.com/Xiaomi-Redmi-Router-AC2100-2033Mbps-2_4G-5G-Dual-Band-Wireless-Router-6High-Gain-Antennas-128MB-OpenWRT-WiFi-Router-p-1614038.html"
                target="_blank">קנייה
            </a>
            <span data-clipboard-text="BG38b2ac" data-coupon>BG38b2ac</span><i></i>
        </div>
        <div data-price>$43.99</div>
    </div>
</div>
<hr />
<p>
    <small>
        &nbsp;פורסם ב:&nbsp;<a href="https://zuzu.deals/%d7%a7%d7%95%d7%a4%d7%95%d7%9f-%d7%91%d7%9c%d7%a2%d7%93%d7%99-%d7%a8%d7%90%d7%95%d7%98%d7%a8-%d7%97%d7%96%d7%a7-%d7%95%d7%97%d7%93%d7%a9-%d7%a9%d7%9c-%d7%a9%d7%99%d7%90%d7%95%d7%9e%d7%99-xiaomi-re-2/"></a>
    </small>
</p>
<br />
<p>נעים להכיר!</p>

Before Discourse update

TLP always worked well with our external images, whether on topic creation or editing.
However, we had the download remote images to local option disabled because of a markdown issue.

Markdown rendering issue

This is not the main issue, and it still happens after the Discourse upgrade, but here is some explanation.
When Discourse downloads an image and replaces its HTML with markdown syntax, in our context this results in:

[...]<a href="<link_here>" target="_blank">![|150x150](upload://l0iarnA6SPVAyJN5l7pnQxZnPvE.jpeg)</a>[...]

Discourse is unable to render the image


To fix the issue, you need at least an empty line above:

[...]<a href="<link_here>" target="_blank">

![|150x150](upload://l0iarnA6SPVAyJN5l7pnQxZnPvE.jpeg)</a>[...]

Would it be possible to allow rendering a markdown image surrounded by HTML, please?

After Discourse update

Settings

From this, I was told to rebake all posts to get the images downloaded. And this is where it gets weird.

  1. rake posts:rebake did not have much effect (at least not on the first message of a topic, but it did trigger a lot of PullHotlinkedImages jobs).
    1.1. Looking at some topics, I thought that image HTML with class or srcset attributes was the cause, so I tried to normalize all the images with the following code (I don’t know Ruby). It helped for some topics.
    1.2. However, due to the markdown issue, I had to add newlines to fix it. At least for those topics, thumbnails worked.
Code
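# Reduce every <img ... /> in first posts to its src attribute only.
Post.where(post_number: 1)
    .where("raw LIKE '%<img%/>%'")
    .find_each do |post|
      post.raw = post.raw.gsub(/<img[^>]+(src="[^"]+")[^>]+\/>/, '<img \1 />')
      post.save!(validate: true)
      post.rebake!
    end

# Put a blank line before each uploaded markdown image so it renders inside HTML.
Post.where(post_number: 1)
    .where("raw LIKE '%upload://%'")
    .find_each do |post|
      post.raw = post.raw.gsub(/(!\[.*upload:\/\/.*\))/, "\n\n\\1")
      post.save!(validate: false)
      post.rebake!
    end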
  2. Manually editing an ignored topic (without changing content, just saving) works most of the time. The image is downloaded.
    2.1. Some topics are ignored even after an edit. I can see PullHotlinkedImages is triggered but no images are downloaded. (like with this html <img src="https://zuzu.reviews/wp-content/uploads/2020/05/HiZERO-VS-BISSEL-VS-שואב-אלחוטי-VS-שואב-רובוטי-VS-מגב-ודלי-VS-מטאטא-VS-ספונגה-חשמלית-150x84.jpg" />, link)
  3. After normalizing, and seeing that an edit can help to download the image, I tried running rake posts:rebake multiple times, without effect.
  4. Then I tried the rails console with code similar to the above, but with a specific topic id and only post.rebake!, without effect.
  5. Not all images from a topic are always downloaded…
  6. @Canapin pointed me to Download remote images from older posts?; I am trying to rebake all posts now. It did not help, unfortunately.
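
The two rewrites from item 1 can be tried on plain strings before touching any posts. This is only a minimal sketch: it reuses the two regular expressions from the code above, while the method names normalize_img_tags and separate_upload_images and the example.com URLs are hypothetical and not part of Discourse.

```ruby
# Hypothetical dry-run helpers; they only operate on strings, not on posts.

# Same pattern as in the console code above: keep only the src attribute.
IMG_TAG = /<img[^>]+(src="[^"]+")[^>]+\/>/
# Same pattern as in the console code above: a markdown image pointing at an upload.
UPLOAD_IMAGE = /(!\[.*upload:\/\/.*\))/

# Strip width, height, class, srcset, etc. from self-closing <img ... /> tags.
def normalize_img_tags(raw)
  raw.gsub(IMG_TAG, '<img \1 />')
end

# Insert a blank line before each uploaded markdown image.
def separate_upload_images(raw)
  raw.gsub(UPLOAD_IMAGE, "\n\n\\1")
end

# Sample inputs (made up for illustration).
html = '<img width="150" height="84" src="https://example.com/a-150x84.jpg" class="size-thumbnail" alt="" />'
puts normalize_img_tags(html)

markdown = '<a href="https://example.com" target="_blank">![|150x150](upload://l0iarnA6SPVAyJN5l7pnQxZnPvE.jpeg)</a>'
puts separate_upload_images(markdown)
```

Note that the second helper is not idempotent: running it twice on the same text adds another blank line each time, so it is worth checking the output on a few posts first.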

It’s just crazy that some images work and others don’t. I don’t think it’s a criteria issue: the image settings are set high. I really don’t understand the logic behind it; it seems random.

Currently, we still have a lot of missing thumbnails. Most of them can likely be fixed manually by editing/saving, but that’s not feasible. I’m doing this for a client, and I have already lost a lot of time trying to fix the issue.

I don’t mind having uploaded images as thumbnails, but:

  • Can you tell me if there are specific reasons why images won’t download? Are there settings which can help? Do we need something? How can I debug this?
  • Is there a way to force the download through the console, similar to editing/saving?
  • Can you allow markdown image rendering when surrounded by HTML?

Hopefully I was precise enough in describing the problem.

Thanks in advance for any help and solution.

2 Likes

Hi @Arkshine, sorry to hear you’re having issues here. Unfortunately we can’t guarantee compatibility with all third-party plugins, especially where they override core behaviour beyond the scope of our plugin APIs. Going forward, TLP should be able to reduce the number of core overrides it uses, and hopefully stability will improve.

Note that we now also have our official solution to displaying thumbnails, which is implemented entirely using our supported plugin APIs.

You are correct that core thumbnails only work for local images. Ultimately, TLP was pulling images and storing them locally as well, so the difference now is that we’re doing everything in a consistent way using the pull_hotlinked_images job.

It sounds like we have a couple issues to resolve here. I think it would be best to split them out so that we don’t miss anything. Reading through your post, I see two things:

Pulling remote images shouldn’t result in invalid html/markdown, so we will try and get this fixed. Please go ahead and open a #bug topic, mention me, and we’ll look into it.

Again, special characters shouldn’t break the images here. Please open another #bug topic and we’ll take a look.

4 Likes

Thanks for the response, @david.

About the markdown issue and the links with special characters: will do. Thanks.


But, ultimately, my main issue is still making Discourse force-download the images. The markdown issue can be solved from the console, and not all links have special characters. We still have a lot of topics without thumbnails. This is what I’m asking for help with.

For example, this one doesn’t have special characters. And unless I hit Save Edit, the image is not downloaded, neither by rebaking from the console nor by using the Rebuild HTML link.

Is there some rails command or something else which can force Discourse to download the image the same way as you would Save Edit?


I’ve actually tried to use the component, but it did not work well. I will give it another go once the thumbnail issues are solved.

Is it an affiliate site?

That is strange. I have one idea - are these posts created by the system user? Or Discobot?

Most of them are created by a normal admin user. These topics are created through the API from WP Discourse.

For example this one:

@Arkshine was there supposed to be a link there?

Sorry, I was preparing an example while testing things, but I hit the button unintentionally.

That said, I started executing this to see if it would help (sorry if it hurts your eyes!)

Post
  .joins(:topic)
  .where(post_number: 1)
  .where('topics.visible = true')
  .where('topics.deleted_at IS NULL')
  .each do |post|
    post.baked_version = nil
    post.save!(validate: false)
    post.rebake!
  end

It took some time. I’ve seen quite a lot of Jobs::PullHotlinkedImages.
But it doesn’t seem to have helped much.

If you check a specific topic, for example, this topic.

From the console, I can see it triggers Jobs::ProcessPost but not Jobs::PullHotlinkedImages

[106] pry(main)> Post.update_all(baked_version: nil)
=> 38808

[107] pry(main)> Post.where(post_number: 1, topic_id: 64215).each do |post| post.rebake!; end
=> [#<Post:0x0000557fe01f2fd8
  id: 79717,
  user_id: 3,
  topic_id: 64215,
  post_number: 1,
  raw:
   "<div data-wp><a href=\"https://www.mooki.co.il/gaming/hbilvt-giiming-mwtlmvt/mvwb-giiming-khvl-sparkfox-wvlhn-giiming-mqcvei-lumi-whvr-2\" target=\"_blank\"><img src=\"https://zuzu.deals/wp-content/uploads/2020/05/5ebcf97155cd2-150x150.png\" /></a><div><div data-buy><a href=\"https://www.mooki.co.il/gaming/hbilvt-giiming-mwtlmvt/mvwb-giiming-khvl-sparkfox-wvlhn-giiming-mqcvei-lumi-whvr-2\" target=\"_blank\">קנייה</a><span data-clipboard-text=\"GLA679\" data-coupon>GLA679</span><i></i></div><div data-price>₪679 <span data-old-price>₪1378</span></div></div></div><hr /><p><small>&nbsp;פורסם ב:&nbsp;<a href=\"https://zuzu.deals/%d7%91%d7%9c%d7%a2%d7%93%d7%99-%d7%95%d7%91%d7%9e%d7%97%d7%99%d7%a8-%d7%97%d7%98%d7%99%d7%a4%d7%94-%d7%9e%d7%95%d7%a9%d7%91-%d7%92%d7%99%d7%99%d7%9e%d7%99%d7%a0%d7%92-%d7%90%d7%93%d7%95%d7%9d-spark/\"></a></small></p><br><p><img src=\"https://www.mooki.co.il/pub/media/catalog/product/cache/0f831c1845fc143d00d6d1ebc49f446a/_/s/_sparkfox_k1_5_.png\" /></p>\n<p style=\"text-align: center;\">בין אם אתם גיימרים ובין אם אתם פשוט עובדים ויושבים כל היום והגב כבר זועק לכיסא טוב יותר, הנה לכם עוד מבצע בלעדי במחיר חטיפה!<br />\nכיסא גיימינג מפנק, אוזניות גיימינג ומשלוח מהיר בחינם, עם אחריות יבואן רשמי &#8211; רק ב679₪!!!</p>\n<p style=\"text-align: center;\">השתמשו בקופה בקופון הבלעדי &#8211; <strong>GLA679</strong></p>\n<div> <img src=\"https://zuzu.deals/wp-content/uploads/2020/05/90902-801-09_2_1.jpg\" /></div>\n<div>\n<h3 style=\"text-align: center;\">מושב גיימינג מקצועי SPARKFOX GC60P</h3>\n</div>\n<div>מושב גיימינג בעל עיצוב מיוחד למשחקי מחשב לנוחות מקסימאלית למשתמש</div>\n<div>\n<ul>\n<li>מושב בעל משענת גב גבוהה</li>\n<li>נוחות המקסימאלית למשך זמן משחק ארוך</li>\n<li>זוג כריות לתמיכה בצוואר ובגב התחתון</li>\n<li>סוג חומר: ספוג יצוק</li>\n<li>סוג מסגרת: מתכת</li>\n<li>חומר: עור עם סיבי פחם</li>\n<li>משענות ידיים: מתכווננות מעלה / מטה</li>\n<li>סוג מנגנון: פרפר</li>\n<li>סוג הרמה: הידראולית Class4</li>\n<li>טווח משענת גב: 90°-180°</li>\n<li>סוג 
בסיס: ניילון</li>\n<li>חומר גלגל: ניילון</li>\n<li>יכולת נשיאה: עד 150 ק”ג</li>\n<li>אחריות: שנה</li>\n</ul>\n<div><strong>מידות</strong></div>\n<div>\n<ul>\n<li>רוחב: 67 ס&quot;מ</li>\n<li>עומק: 67 ס&quot;מ</li>\n<li>גובה משתנה: 124-132 ס&quot;מ</li>\n</ul>\n<h3></h3>\n<p><img src=\"https://zuzu.deals/wp-content/uploads/2020/05/90902-802-08_3_1.jpg\" /></p>\n<h3 style=\"text-align: center;\">אוזניות גיימינג SPARKFOX K1</h3>\n<div>אוזניות גיימינג בעיצוב מיוחד לנוחות מקסימלית לשמע ודיבור וביטול רעשי רקע</div>\n<div>\n<ul>\n<li>ניתנות לשימוש ברוב הקונסולות הקיימות בשוק</li>\n<li>שמע וניהול שיחות בטלפונים ובמחשבים ניידים</li>\n<li>ווסת עוצמת השמע הינו בכבל של האוזנייה- לגישה נוחה</li>\n<li>שמע מעולה ממנהלי התקנים גדולים של 50 מ&quot;מ</li>\n<li>בקרי עוצמת הקול וההשתקה</li>\n<li>כוסות אוזניים מרופדות גדולות לנוחות מרבית</li>\n<li>קשת האוזנייה מתכווננת להתאמה מושלמת לראשכם</li>\n<li>מתחבר ישירות ליציאת בקרי 3.5 מ&quot;מ</li>\n</ul>\n</div>\n<div>מצורף מתאם מיוחד לחיבור האוזניות למחשב נייח ע&quot;י מפצל 3.5 מ&quot;מ ל 2 יציאות 3.5 מ&quot;מ</div>\n</div>\n</div>\n<p>&nbsp;</p>\n<div data-custom-html=\"\"></div>",
  cooked:
   "<div data-wp=\"\">\n<a href=\"https://www.mooki.co.il/gaming/hbilvt-giiming-mwtlmvt/mvwb-giiming-khvl-sparkfox-wvlhn-giiming-mqcvei-lumi-whvr-2\" target=\"_blank\"><img src=\"https://zuzu.deals/wp-content/uploads/2020/05/5ebcf97155cd2-150x150.png\"></a><div>\n<div data-buy=\"\">\n<a href=\"https://www.mooki.co.il/gaming/hbilvt-giiming-mwtlmvt/mvwb-giiming-khvl-sparkfox-wvlhn-giiming-mqcvei-lumi-whvr-2\" target=\"_blank\">קנייה</a><span data-clipboard-text=\"GLA679\" data-coupon=\"\">GLA679</span><i></i>\n</div>\n<div data-price=\"\">₪679 <span data-old-price=\"\">₪1378</span>\n</div>\n</div>\n</div><hr><p><small> פורסם ב: <a href=\"https://zuzu.deals/%d7%91%d7%9c%d7%a2%d7%93%d7%99-%d7%95%d7%91%d7%9e%d7%97%d7%99%d7%a8-%d7%97%d7%98%d7%99%d7%a4%d7%94-%d7%9e%d7%95%d7%a9%d7%91-%d7%92%d7%99%d7%99%d7%9e%d7%99%d7%a0%d7%92-%d7%90%d7%93%d7%95%d7%9d-spark/\"></a></small></p><br><p><img src=\"https://www.mooki.co.il/pub/media/catalog/product/cache/0f831c1845fc143d00d6d1ebc49f446a/_/s/_sparkfox_k1_5_.png\"></p>\n<p>בין אם אתם גיימרים ובין אם אתם פשוט עובדים ויושבים כל היום והגב כבר זועק לכיסא טוב יותר, הנה לכם עוד מבצע בלעדי במחיר חטיפה!<br>\nכיסא גיימינג מפנק, אוזניות גיימינג ומשלוח מהיר בחינם, עם אחריות יבואן רשמי – רק ב679₪!!!</p>\n<p>השתמשו בקופה בקופון הבלעדי – <strong>GLA679</strong></p>\n<div> <img src=\"https://zuzu.deals/wp-content/uploads/2020/05/90902-801-09_2_1.jpg\">\n</div>\n<div>\n<h3>מושב גיימינג מקצועי SPARKFOX GC60P</h3>\n</div>\n<div>מושב גיימינג בעל עיצוב מיוחד למשחקי מחשב לנוחות מקסימאלית למשתמש</div>\n<div>\n<ul>\n<li>מושב בעל משענת גב גבוהה</li>\n<li>נוחות המקסימאלית למשך זמן משחק ארוך</li>\n<li>זוג כריות לתמיכה בצוואר ובגב התחתון</li>\n<li>סוג חומר: ספוג יצוק</li>\n<li>סוג מסגרת: מתכת</li>\n<li>חומר: עור עם סיבי פחם</li>\n<li>משענות ידיים: מתכווננות מעלה / מטה</li>\n<li>סוג מנגנון: פרפר</li>\n<li>סוג הרמה: הידראולית Class4</li>\n<li>טווח משענת גב: 90°-180°</li>\n<li>סוג בסיס: ניילון</li>\n<li>חומר גלגל: ניילון</li>\n<li>יכולת נשיאה: עד 150 
ק”ג</li>\n<li>אחריות: שנה</li>\n</ul>\n<div><strong>מידות</strong></div>\n<div>\n<ul>\n<li>רוחב: 67 ס\"מ</li>\n<li>עומק: 67 ס\"מ</li>\n<li>גובה משתנה: 124-132 ס\"מ</li>\n</ul>\n<h3></h3>\n<p><img src=\"https://zuzu.deals/wp-content/uploads/2020/05/90902-802-08_3_1.jpg\"></p>\n<h3>אוזניות גיימינג SPARKFOX K1</h3>\n<div>אוזניות גיימינג בעיצוב מיוחד לנוחות מקסימלית לשמע ודיבור וביטול רעשי רקע</div>\n<div>\n<ul>\n<li>ניתנות לשימוש ברוב הקונסולות הקיימות בשוק</li>\n<li>שמע וניהול שיחות בטלפונים ובמחשבים ניידים</li>\n<li>ווסת עוצמת השמע הינו בכבל של האוזנייה- לגישה נוחה</li>\n<li>שמע מעולה ממנהלי התקנים גדולים של 50 מ\"מ</li>\n<li>בקרי עוצמת הקול וההשתקה</li>\n<li>כוסות אוזניים מרופדות גדולות לנוחות מרבית</li>\n<li>קשת האוזנייה מתכווננת להתאמה מושלמת לראשכם</li>\n<li>מתחבר ישירות ליציאת בקרי 3.5 מ\"מ</li>\n</ul>\n</div>\n<div>מצורף מתאם מיוחד לחיבור האוזניות למחשב נייח ע\"י מפצל 3.5 מ\"מ ל 2 יציאות 3.5 מ\"מ</div>\n</div>\n</div>\n<p> </p>\n<div data-custom-html=\"\"></div>",
  created_at: Thu, 14 May 2020 07:55:31 UTC +00:00,
  updated_at: Tue, 26 May 2020 14:56:16 UTC +00:00,
  reply_to_post_number: nil,
  reply_count: 0,
  quote_count: 0,
  deleted_at: nil,
  off_topic_count: 0,
  like_count: 0,
  incoming_link_count: 2,
  bookmark_count: 0,
  score: 10.8,
  reads: 4,
  post_type: 1,
  sort_order: 1,
  last_editor_id: -1,
  hidden: false,
  hidden_reason_id: nil,
  notify_moderators_count: 0,
  spam_count: 0,
  illegal_count: 0,
  inappropriate_count: 0,
  last_version_at: Thu, 14 May 2020 09:19:26 UTC +00:00,
  user_deleted: false,
  reply_to_user_id: nil,
  percent_rank: 0.0,
  notify_user_count: 0,
  like_score: 0,
  deleted_by_id: nil,
  edit_reason: nil,
  word_count: 939,
  version: 3,
  cook_method: 1,
  wiki: false,
  baked_at: Tue, 26 May 2020 16:59:49 UTC +00:00,
  baked_version: 2,
  hidden_at: nil,
  self_edits: 2,
  reply_quoted: false,
  via_email: false,
  raw_email: nil,
  public_version: 3,
  action_code: nil,
  image_url: "https://zuzu.deals/wp-content/uploads/2020/05/5ebcf97155cd2-150x150.png",
  locked_by_id: nil,
  image_upload_id: nil>]

If I Save Edit from composer:

I see well

Then I can see the image is downloaded:

Ah I see. The problem here is

Because we have this logic before scheduling pull hotlinked images:

This is designed to avoid an infinite loop of pull_hotlinked_image jobs being scheduled. But maybe we need to improve this logic. Can you check the revision history for one of these posts and see why it was last edited by the system?
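
To make the idea concrete, here is a minimal sketch of the kind of guard described, not the actual Discourse source. It assumes the check looks at the post’s last editor and treats the system user (id -1, as in last_editor_id: -1 in the console dump above) as a reason to skip scheduling; the method name schedule_image_pull? is hypothetical.

```ruby
# Hypothetical helper, not part of Discourse.
# The system user has id -1, which matches last_editor_id: -1 in the dump above.
SYSTEM_USER_ID = -1

# Returns true when a pull of hotlinked images should be scheduled.
# Skipping system edits is what prevents a loop: the pull job itself edits the
# post as the system user, which would otherwise schedule another pull.
def schedule_image_pull?(last_editor_id)
  last_editor_id != SYSTEM_USER_ID
end

puts schedule_image_pull?(3)   # last edited by a regular user
puts schedule_image_pull?(-1)  # last edited by the system user
```

Under this assumption, any system edit (not only an image replacement) leaves the post in a state where rebaking alone never schedules the download, while a manual Save Edit changes the last editor and lets the job run.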

2 Likes

For this topic:

  1. image
  2. image (category change)
  3. image (Me Save Edit only)
  4. image (system replacing image with markdown)

So, I guess it’s due to the category change?

1 Like

Yes that would explain it! As a short term solution you can manually run the pull hotlinked images job by doing something like:

Jobs.enqueue_in(10, :pull_hotlinked_images, post_id: post.id)

We need a better solution here though. I will add it to my todo list
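
For many posts, the same call can be wrapped in a loop. The following is only a sketch under stated assumptions: the helper name enqueue_image_pulls and the staggered delays are illustrative choices, and the small Jobs stand-in exists only so the snippet runs outside a Discourse console (inside the console the real Jobs module is used instead).

```ruby
# Stand-in for Discourse's Jobs module so this sketch also runs outside a
# Discourse rails console. It only records what would have been enqueued.
unless defined?(Jobs)
  module Jobs
    def self.enqueued
      @enqueued ||= []
    end

    def self.enqueue_in(delay, job_name, opts = {})
      enqueued << [delay, job_name, opts]
    end
  end
end

# Hypothetical helper: enqueue one pull_hotlinked_images job per post id,
# spaced `spacing` seconds apart so the jobs do not all start at once.
# Returns the number of jobs enqueued.
def enqueue_image_pulls(post_ids, spacing: 10)
  post_ids.each_with_index do |post_id, index|
    Jobs.enqueue_in(spacing * (index + 1), :pull_hotlinked_images, post_id: post_id)
  end
  post_ids.size
end

puts enqueue_image_pulls([79717])
```

In a rails console the ids could come from a query such as Post.where(post_number: 1).pluck(:id), matching the scope used earlier in this topic.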

5 Likes

Amazing. Let me try that!

EDIT:

Just to confirm: the command helped a lot. Now, most of the images are back. Thanks again!

There are still edge cases not processed, like images with special characters, or topics where images are marked as broken but aren’t (a Rebuild HTML fixes them, then a Save Edit downloads them). If #9890 is merged and a fix for markdown is made, it would likely fix everything.

2 Likes

I just merged this fix for system-user edits:

I think we’ve spun out all the other issues into separate topics, but if we missed anything please feel free to open another one @Arkshine

3 Likes

BTW I just removed this setting. We no longer check post age when pulling hotlinked images

cc @merefield

6 Likes

This topic was automatically closed after 28 hours. New replies are no longer allowed.