Downloading of .webp images in oneboxes is broken

It looks like downloading images in oneboxes is broken. Maybe a recent regression due to changes in onebox or secure media. @vinothkannans Could you take a look please?

Example: Oneboxing https://www.samsung.com/us/mobile/galaxy-z-flip doesn’t show an image because the image is loaded over HTTP.

The problem is that when we download the onebox image from the URL “http://image-us.samsung.com/SamsungUS/home/samsung-logo-191-1.jpg”, the server returns the file in “.webp” format (samsung-logo-191-1.webp). We are unable to download it because the “.webp” file format is not whitelisted in our authorized_extensions site setting.
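
To make the failure mode concrete, here is a minimal Python sketch of an extension whitelist check. It is not Discourse's actual code: the function names and the pipe-delimited example value "jpg|jpeg|png|gif" are assumptions for illustration only.

```python
import os


def extension_of(filename):
    """Return the lowercase extension of a filename, without the dot."""
    return os.path.splitext(filename)[1].lstrip(".").lower()


def is_authorized(filename, authorized_extensions):
    """Check a filename against a pipe-delimited list of allowed extensions.

    The pipe-delimited format and the example values used below are
    assumptions for this sketch, not a copy of any real site's setting.
    """
    allowed = {
        ext.strip().lower()
        for ext in authorized_extensions.split("|")
        if ext.strip()
    }
    return extension_of(filename) in allowed


# Hypothetical setting value, for illustration only.
EXAMPLE_SETTING = "jpg|jpeg|png|gif"

print(is_authorized("samsung-logo-191-1.jpg", EXAMPLE_SETTING))
print(is_authorized("samsung-logo-191-1.webp", EXAMPLE_SETTING))
```

The URL ends in .jpg, but the check runs against what the server actually returned, so the .webp file is rejected until "webp" is added to the list.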

The favicon is not downloaded since it’s a “.ico” file. Should we allow “ico” files in oneboxes by default?

I wonder if webp only comes because of the use of the Chrome user agent :thinking:

I don’t think so. It’s returning a webp file even when I manually download it in Firefox.

On my version of Firefox (77.0.1 (64-bit) for macOS), the Accept header in the request for the above URL is:

text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8

(i.e., the browser is asking for image/webp, if available)

Safari, on the other hand, has an Accept header of:

text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

and a jpeg is returned by the Samsung webserver. The Onebox displays correctly (with the image) in Safari.
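
The difference between the two headers can be checked mechanically. Below is a small Python sketch that parses an Accept header into media types and q-values and tests whether image/webp is listed explicitly. The function names are made up for this example, and the parser is deliberately simplified (it ignores parameters other than q).

```python
def parse_accept(header):
    """Parse an Accept header value into a list of (media_type, q) pairs."""
    entries = []
    for part in header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        media_type = pieces[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when it is not given
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        entries.append((media_type, q))
    return entries


def explicitly_accepts(header, media_type):
    """True if media_type is named in the header with a non-zero q-value."""
    return any(
        name == media_type.lower() and q > 0
        for name, q in parse_accept(header)
    )


FIREFOX_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8"
SAFARI_ACCEPT = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

print(explicitly_accepts(FIREFOX_ACCEPT, "image/webp"))
print(explicitly_accepts(SAFARI_ACCEPT, "image/webp"))
```

Note that the */* entry in the Safari header still technically permits any type at q=0.8; the sketch only looks at whether webp is named explicitly, which is the difference between the two browsers here.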

Samsung is using Akamai Image Manager to serve these files.

If I pass an Accept header of image/jpeg,image/gif it returns an image/jpeg, as expected.

wget --header="Accept: image/jpeg,image/gif" http://image-us.samsung.com/SamsungUS/home/samsung-logo-191-1.jpg

--2020-11-13 11:43:58--  http://image-us.samsung.com/SamsungUS/home/samsung-logo-191-1.jpg
Resolving image-us.samsung.com (image-us.samsung.com)... 23.217.144.69
Connecting to image-us.samsung.com (image-us.samsung.com)|23.217.144.69|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 83094 (81K) [image/jpeg]
Saving to: ‘samsung-logo-191-1.jpg.6’

Discourse tries to fetch the image by calling FileHelper.download, which calls FinalDestination.get, which ultimately requests the file with a User-Agent of:

Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36

So, if we request the same image with the same Accept header as above, but adding in that user agent, we get:

wget --header="Accept: image/jpeg,image/gif" --header="User-Agent: Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36" http://image-us.samsung.com/SamsungUS/home/samsung-logo-191-1.jpg

--2020-11-13 11:52:50--  http://image-us.samsung.com/SamsungUS/home/samsung-logo-191-1.jpg
Resolving image-us.samsung.com (image-us.samsung.com)... 23.217.144.69
Connecting to image-us.samsung.com (image-us.samsung.com)|23.217.144.69|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 45540 (44K) [image/webp]
Saving to: ‘samsung-logo-191-1.jpg.10’

And it returned an image/webp. It appears that they’re ignoring the Accept header and using their own logic to determine the best file type based on the user agent.

Apple hasn’t supported webp historically, so let’s try a Safari user agent:

wget --header="Accept: image/jpeg,image/gif" --header="User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15" http://image-us.samsung.com/SamsungUS/home/samsung-logo-191-1.jpg

--2020-11-13 12:27:02--  http://image-us.samsung.com/SamsungUS/home/samsung-logo-191-1.jpg
Resolving image-us.samsung.com (image-us.samsung.com)... 23.217.144.69
Connecting to image-us.samsung.com (image-us.samsung.com)|23.217.144.69|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 83094 (81K) [image/jpeg]
Saving to: ‘samsung-logo-191-1.jpg.16’

Success!
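
For anyone who wants to script these experiments rather than run wget by hand, here is a minimal Python sketch using only the standard library. The helper names (build_image_request, fetch_content_type) are made up for illustration, and this is not how FileHelper.download is implemented; it only sends the same two headers as the wget commands.

```python
import urllib.request

IMAGE_URL = "http://image-us.samsung.com/SamsungUS/home/samsung-logo-191-1.jpg"

# The two user agent strings used in the wget experiments.
CHROME_58_UA = (
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
)
SAFARI_14_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/14.0 Safari/605.1.15"
)


def build_image_request(url, user_agent, accept="image/jpeg,image/gif"):
    """Build (but do not send) a GET request with the given headers."""
    return urllib.request.Request(
        url,
        headers={"Accept": accept, "User-Agent": user_agent},
    )


def fetch_content_type(request, timeout=10):
    """Send the request and return the Content-Type the server reports."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type()


if __name__ == "__main__":
    # Requires network access; results depend on the server's current behavior.
    for label, ua in (("Chrome 58", CHROME_58_UA), ("Safari 14", SAFARI_14_UA)):
        print(label, fetch_content_type(build_image_request(IMAGE_URL, ua)))
```

If the server still behaves as in the transcripts, the Chrome 58 request would be expected to report image/webp and the Safari 14 request image/jpeg, but that depends entirely on the remote side.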

I think you can make a pretty strong argument that the webserver should always respect the Accept header and not try to outsmart it. On the other hand, someone could argue that if you present a specific user agent string, you should be prepared to act like that user agent (although user agent spoofing isn’t uncommon and has a variety of uses).
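
To illustrate the two positions, here is a toy Python model of both strategies. It is purely hypothetical: the actual server-side logic is not known, and the "Chrome/" substring test is just a guess that happens to be consistent with the wget results in this post.

```python
def negotiate_by_accept(accept, available=("image/webp", "image/jpeg")):
    """Pick the first available type that the Accept header names explicitly.

    Simplified: q-values and wildcards are ignored, and JPEG is the fallback.
    """
    accepted = [item.split(";")[0].strip().lower() for item in accept.split(",")]
    for candidate in available:
        if candidate in accepted:
            return candidate
    return "image/jpeg"


def negotiate_by_user_agent(user_agent):
    """Hypothetical UA sniffing: assume anything claiming Chrome wants webp."""
    return "image/webp" if "Chrome/" in user_agent else "image/jpeg"


CHROME_58_UA = (
    "Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
)
SAFARI_14_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/14.0 Safari/605.1.15"
)
ACCEPT = "image/jpeg,image/gif"

# Respecting Accept gives JPEG regardless of who is asking.
print(negotiate_by_accept(ACCEPT))
# UA sniffing gives different answers for the same Accept header.
print(negotiate_by_user_agent(CHROME_58_UA))
print(negotiate_by_user_agent(SAFARI_14_UA))
```

With the Accept header from the experiments, the first strategy always yields image/jpeg, while the sniffing model yields image/webp for the Chrome 58 string and image/jpeg for the Safari 14 string, matching what wget saw.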

As it happens, Apple has added webp support to iOS 14 (released recently) and macOS Big Sur (released yesterday), so we should probably support it by adding it as a default allowed extension and by configuring ImageMagick to handle it (which will introduce a dependency on libwebp or similar) so we can optimize, resize, etc.

That is terrible behavior on Akamai’s part, yikes.

Is it trivial to change our pretend user agent? Regardless, we should not be pretending to be Chrome 58, which was released in 2017.

@eviltrout should have full context on why Discourse pretends to be a browser and does not use a special User Agent just for Discourse. My vague recollection is that if we do not pretend, some sites refuse to let us download images and oneboxes. We could:

  1. See if things have changed enough, and try again with our own user agent
  2. Upgrade our pretend user agent to the latest Chrome
  3. Change to a version of Safari on macOS that has no support for webp

Possibly, we can allow for “transparent” webp -> jpeg translation after file download, so we can still proxy incompatible images if the site operator does not fully support webp. (This could be handy with some other esoteric image formats.)
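
A transparent translation step would first need to notice that the downloaded bytes are not what the URL's extension promised. Here is a minimal Python sketch that sniffs the format from the file's leading bytes; the function names and the default list of authorized formats are assumptions for illustration, and the conversion itself (e.g. via ImageMagick) is left out.

```python
def sniff_image_format(data):
    """Guess an image format from the leading bytes, or return None."""
    if data[:3] == b"\xff\xd8\xff":
        return "jpeg"
    if data[:8] == b"\x89PNG\r\n\x1a\n":
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def needs_conversion(data, authorized=("jpeg", "png", "gif")):
    """True if the bytes are a recognized image in a non-authorized format.

    The default `authorized` tuple is a hypothetical example, not a real
    site's configuration.
    """
    detected = sniff_image_format(data)
    return detected is not None and detected not in authorized


# Fabricated header bytes for demonstration; not real image files.
fake_webp = b"RIFF" + (36).to_bytes(4, "little") + b"WEBPVP8 " + b"\x00" * 16
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16

print(sniff_image_format(fake_webp), needs_conversion(fake_webp))
print(sniff_image_format(fake_jpeg), needs_conversion(fake_jpeg))
```

Sniffing the bytes rather than trusting the file name matters here because the URL in this topic ends in .jpg even when the body is webp.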

Or we just move on and support webp by default and add it to all our defaults? (I wonder if it is too early)

The answer is that, unfortunately, many sites don’t serve up the correct content unless the request looks like it comes from a browser. I suspect that if we change this you will find most sources work fine with it, but one or two will be odd.

Onebox is often a big game of whack-a-mole. Fix one bug, another pops up elsewhere.

Now that Apple is caving and adding support to Safari and iOS, it seems like the correct time to do this.
