Download images for oneboxes as well, if download images is set

(Gerhard Schlager) #21

The mixed content warnings were bugging me, so I looked at the code in order to fix this. Unfortunately there’s more than one reason why images within oneboxes aren’t downloaded. :frowning:

I’ll at least try to document my findings:

  1. As @zogstrip mentioned images inside oneboxes should be ignored. But they aren’t since the doc.css(".onebox-result img") doesn’t do anything. Is the onebox-result class still used? I saw only onebox and onebox-body when I included Github and whitelisted oneboxes.

  2. Discourse downloads all the images embedded in oneboxes, but many of the embedded images have no recognizable file extension.:

  • Gravatar used at Stackoverflow:
  • Avatar used at Github:
  • Discourse looks only at the file extension and since it doesn’t find a known image extension, it stores it as simple upload, not as image.
  1. If we are lucky and the image embedded in the onebox has a valid image file extension, we run into the last problem. Image URLs are only replaced within the post’s raw text, but the onebox images aren’t there. They are stored in cooked.

Links to Amazon cause a TLS mixed content warning
(Dan Dascalescu) #22

I have a similar problem in this post. It links to

which unfortunately redirects to the http:// version (they’re sorry for destroying the padlock on this Meta page…). The image is, though, available via https:

Would it be possible for the Onebox code to check if the image is also available via HTTPS, and if so, change the URL to HTTPS? No local downloading necessary.

(Ryan Mulligan) #23

Another possible solution would be to add a feature to GitHub - discourse/onebox: A gem for turning URLs into website previews to have a second whitelist like onebox/whitelisted_generic_onebox.rb at 34cd201b72b3c6fc8058267796424f245bda0f45 · discourse/onebox · GitHub that supports only sites that have HTTPS properly configured.

Would a PR for that feature be welcome?

(Pat David) #24

I notice that this still hasn’t been addressed (at least as of v1.7.0.beta7 +27).

Are there still any plans for local-caching assets that may be called by a onebox?

Right now the behavior is fine while browsing the topics until you stumble across a post that has mixed-content due to a onebox. The bigger problem is that once you’ve hit that post all subsequent browsing in that session will cease to show the green padlock.

(Ryan Mulligan) #25

That’s interesting that the tab maintains the mixed content warning!

I think there are actually plans to make this worse, because

:sparkles: New in Discourse 1.7 beta! All links will now be oneboxed and prettified, if you paste them on a line by themselves:

I asked if there were any plans to fix it:

any corresponding improvements to the mixed content SSL warnings caused by oneboxing?

and the response was

unfortunately we do not control all websites on the Internet :wink:

(Sam Saffron) #26

We should proxy all one box content locally where possible, I totally support a PR to improve this. Further more I support a PR for a site setting that disables unsafe embeds (it should be default on)

(Vinoth Kannan) #27

@kam44 is sponsoring me to contribute in this #pr-welcome. And he suggested to have a site setting download onebox images to local to enable this feature and store the thumbnail of the image instead of original.

How the proxy should work? Please explain me little bit more.

(Sam Saffron) #28

Sorry I meant download locally, used the term proxy very loosely

I thought @techAPJ was looking at this, so @techapj can you give @vinothkannans an all clear?

(Arpit Jalan) #29

Sure, go ahead @vinothkannans. It’s not on my to-do list. :slight_smile:

(Vinoth Kannan) #30

Created the pull request

Google photos images not downloaded
(Vinoth Kannan) #31

Onebox images are inside cooked column. Not in raw. Should I add another line here like below @sam?

next if QueuedPost.where("cooked LIKE '%#{upload.sha1}%'").exists?

Or can I replace it with raw's line?

(Sam Saffron) #32

interesting, perhaps this should not be checking raw in the first place and only check cooked?

(Vinoth Kannan) #34

Am in big confusion. Actually we are not skipping downloaded images in scheduled job clean_up_uploads sql query. Then how it is working? Is it working really?

(Régis Hanol) #35

That’s because of this method :wink:

Every time we cook a post, we synchronise records of all the uploads in the PostUpload table.
If an upload exists and isn’t in the PostUpload table, then it’s probably an orphan :wink:

(Vinoth Kannan) #36

Thank you @zogstrip It solves my confusion.

Can you please tell what will happen if a image tag exist in post.cooked but not in post.raw. Because downloaded onebox images will be in cooked post only. I see in below code we using only post.raw to check.

@doc = Nokogiri::HTML::fragment(analyzer.cook(post.raw, @cooking_options))

(Régis Hanol) #37

No we’re not :wink:

We’re cooking the post via the analyzer.cook method.

(Vinoth Kannan) #38

Then don’t merge my PR. Else all downloaded onebox images will become orphans :sob:.

Then there is no way to protect downloaded onebox images?

(Régis Hanol) #39

Have you tried just removing this line? This should get you 40% there.

(Vinoth Kannan) #40

This method will remove the onebox images from @doc only. And @doc have analyzer cooked html.

Now the problem is I am downloading remote onebox images to local and replacing the url in img tag from post.cooked. Not in post.raw since raw post will only have the remote onebox url (not images). So post.cooked have new local image url. But you are generating @doc using analyzer.cook(post.raw, @cooking_options) method. Looks like you are recooking here (not using post.cooked where I done changes). New cooked html will not have image urls which I replaced. Am I right?

(Régis Hanol) #41

Yeah, that’s why this feature is hard. It’s an egg and chicken problem.

You’ll have to add another step to the CookedPostProcessor that will look for oneboxed images in the cooked version of the post, download them locally, replace their URL in the cooked post and ensure the reverse index (ie. PostUpload) is kept up to date.