Download images for oneboxes as well, if download images is set

(mountain) #16

If local-caching is not available right now for this, I think you’re right. Or perhaps disable all media and allow text-only.

As for an admin-user who has SSL but insists on disabling local caching for all images (OneBox or otherwise), that is out of purview.

EDIT: If Jeff is saying all OneBox content is written on the fly per pageload, then what is actually shown for clients that do not have javascript set?

(Kane York) #17

I just noticed an easy solution for part of this. Imgur supports https, and it’s big enough that rewriting all links to be https is probably worth the effort.

(mountain) #18

Yes, agreed. That is in relation to the idea of supporting any other OneBox sites that have https available. I know there’s also a few places that list all official https-carrying sites so browsers know to auto-ask for https first. Hooking into that may help if a user links media from a https site yet accidentally used http by mistake for whatever reason.

(Allen - Watchman Monitoring) #19

I love the idea of rewriting known URLs to https… but it seems to me that importing all the myriad of images, scripts, etc, which might be referenced in a onebox’d URL is a lot of stuff being imported.

(Erik Swanson) #20

Would it be reasonable/doable to add a knob that disables oneboxing in all cases except where it can be guaranteed to not cause a mixed-content warning?

(Gerhard Schlager) #21

The mixed content warnings were bugging me, so I looked at the code in order to fix this. Unfortunately there’s more than one reason why images within oneboxes aren’t downloaded. :frowning:

I’ll at least try to document my findings:

  1. As @zogstrip mentioned images inside oneboxes should be ignored. But they aren’t since the doc.css(".onebox-result img") doesn’t do anything. Is the onebox-result class still used? I saw only onebox and onebox-body when I included Github and whitelisted oneboxes.

  2. Discourse downloads all the images embedded in oneboxes, but many of the embedded images have no recognizable file extension.:

  • Gravatar used at Stackoverflow:
  • Avatar used at Github:
  • Discourse looks only at the file extension and since it doesn’t find a known image extension, it stores it as simple upload, not as image.
  1. If we are lucky and the image embedded in the onebox has a valid image file extension, we run into the last problem. Image URLs are only replaced within the post’s raw text, but the onebox images aren’t there. They are stored in cooked.

Links to Amazon cause a TLS mixed content warning
(Dan Dascalescu) #22

I have a similar problem in this post. It links to

which unfortunately redirects to the http:// version (they’re sorry for destroying the padlock on this Meta page…). The image is, though, available via https:

Would it be possible for the Onebox code to check if the image is also available via HTTPS, and if so, change the URL to HTTPS? No local downloading necessary.

(Ryan Mulligan) #23

Another possible solution would be to add a feature to GitHub - discourse/onebox: A gem for turning URLs into website previews to have a second whitelist like onebox/whitelisted_generic_onebox.rb at 34cd201b72b3c6fc8058267796424f245bda0f45 · discourse/onebox · GitHub that supports only sites that have HTTPS properly configured.

Would a PR for that feature be welcome?

(Pat David) #24

I notice that this still hasn’t been addressed (at least as of v1.7.0.beta7 +27).

Are there still any plans for local-caching assets that may be called by a onebox?

Right now the behavior is fine while browsing the topics until you stumble across a post that has mixed-content due to a onebox. The bigger problem is that once you’ve hit that post all subsequent browsing in that session will cease to show the green padlock.

(Ryan Mulligan) #25

That’s interesting that the tab maintains the mixed content warning!

I think there are actually plans to make this worse, because

:sparkles: New in Discourse 1.7 beta! All links will now be oneboxed and prettified, if you paste them on a line by themselves:

I asked if there were any plans to fix it:

any corresponding improvements to the mixed content SSL warnings caused by oneboxing?

and the response was

unfortunately we do not control all websites on the Internet :wink:

(Sam Saffron) #26

We should proxy all one box content locally where possible, I totally support a PR to improve this. Further more I support a PR for a site setting that disables unsafe embeds (it should be default on)

(Vinoth Kannan) #27

@kam44 is sponsoring me to contribute in this #pr-welcome. And he suggested to have a site setting download onebox images to local to enable this feature and store the thumbnail of the image instead of original.

How the proxy should work? Please explain me little bit more.

(Sam Saffron) #28

Sorry I meant download locally, used the term proxy very loosely

I thought @techAPJ was looking at this, so @techapj can you give @vinothkannans an all clear?

(Arpit Jalan) #29

Sure, go ahead @vinothkannans. It’s not on my to-do list. :slight_smile:

(Vinoth Kannan) #30

Created the pull request

Google photos images not downloaded
(Vinoth Kannan) #31

Onebox images are inside cooked column. Not in raw. Should I add another line here like below @sam?

next if QueuedPost.where("cooked LIKE '%#{upload.sha1}%'").exists?

Or can I replace it with raw's line?

(Sam Saffron) #32

interesting, perhaps this should not be checking raw in the first place and only check cooked?

(Vinoth Kannan) #34

Am in big confusion. Actually we are not skipping downloaded images in scheduled job clean_up_uploads sql query. Then how it is working? Is it working really?

(Régis Hanol) #35

That’s because of this method :wink:

Every time we cook a post, we synchronise records of all the uploads in the PostUpload table.
If an upload exists and isn’t in the PostUpload table, then it’s probably an orphan :wink:

(Vinoth Kannan) #36

Thank you @zogstrip It solves my confusion.

Can you please tell what will happen if a image tag exist in post.cooked but not in post.raw. Because downloaded onebox images will be in cooked post only. I see in below code we using only post.raw to check.

@doc = Nokogiri::HTML::fragment(analyzer.cook(post.raw, @cooking_options))