I was just about to post a new topic, but I assume this is the same bug (or a related one) because of the # in the URL.
Onebox removes everything after /en/ from the URL.
Just an extra case for consideration, I suppose, unless it's completely unrelated.
This issue is not related to the # in the URL. I've added it to my list to look into.
I looked into this issue, and it happens because the Overwatch forum returns different headers for a HEAD request than for a GET. A HEAD request on a topic URL gets a 302 redirect to /forums/, which in turn redirects to /forums/en/:
➜ ~ curl -I https://us.battle.net/forums/en/overwatch/topic/20758326009
HTTP/1.1 302 Found
Date: Fri, 04 Aug 2017 13:57:32 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
Retry-After: 600
Set-Cookie: login.cookies=1; Domain=battle.net; Path=/
Location: https://us.battle.net/forums/
Content-Language: en-US
➜ ~ curl -I https://us.battle.net/forums/
HTTP/1.1 302 Found
Date: Fri, 04 Aug 2017 13:57:47 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
Retry-After: 600
Set-Cookie: login.cookies=1; Domain=battle.net; Path=/
Location: https://us.battle.net/forums/en/
➜ ~ curl -I https://us.battle.net/forums/en/
HTTP/1.1 200 OK
Date: Fri, 04 Aug 2017 13:58:08 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
Retry-After: 600
Set-Cookie: login.cookies=1; Domain=battle.net; Path=/
Content-Language: en-US
Content-Length: 100225
Content-Type: application/xhtml+xml;charset=UTF-8
Since Onebox makes a HEAD request to resolve the URL before actually oneboxing it, the final URL differs from the one provided.
Changing the FinalDestination library to make a GET request (instead of HEAD) fixes this issue, but I am not sure that is the correct approach here, since we don't need the response body in the FinalDestination library. What do you think @eviltrout?
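To make the failure concrete, here is a minimal Python sketch of a resolver that follows Location headers, run once with HEAD and once with GET. The fake server is hypothetical and only mirrors the curl -I output above; the assumption that a GET on the topic URL returns 200 directly is inferred from the fact that switching to GET fixes the onebox. The names resolve and fake_request are illustrative, not FinalDestination's actual API.

```python
from urllib.parse import urljoin

# Hypothetical stand-in for the forum server, modeled on the curl -I output.
# Maps (method, url) -> (status, headers).
# Assumption: GET on the topic URL answers 200 without redirecting.
TOPIC = "https://us.battle.net/forums/en/overwatch/topic/20758326009"
FAKE_RESPONSES = {
    ("HEAD", TOPIC): (302, {"Location": "https://us.battle.net/forums/"}),
    ("HEAD", "https://us.battle.net/forums/"): (
        302,
        {"Location": "https://us.battle.net/forums/en/"},
    ),
    ("HEAD", "https://us.battle.net/forums/en/"): (200, {}),
    ("GET", TOPIC): (200, {}),
}


def fake_request(method, url):
    """Return (status, headers) for a request against the fake server."""
    return FAKE_RESPONSES.get((method, url), (404, {}))


def resolve(url, method, request=fake_request, max_redirects=5):
    """Follow 3xx Location headers and return the final URL, or None on failure."""
    for _ in range(max_redirects + 1):
        status, headers = request(method, url)
        if 300 <= status < 400 and "Location" in headers:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, headers["Location"])
            continue
        return url if status == 200 else None
    return None  # gave up: too many redirects


print(resolve(TOPIC, "HEAD"))  # https://us.battle.net/forums/en/
print(resolve(TOPIC, "GET"))   # https://us.battle.net/forums/en/overwatch/topic/20758326009
```

With HEAD, the resolver lands on the forum index, which is exactly the "everything after /en/ is removed" symptom; with GET it keeps the topic URL.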
This actually seems like a bug on Blizzard's side to me. The HEAD request is redirecting without remembering what the initial URL was supposed to be.
I have seen other sites fail with a HEAD request, but usually they were small sites, not ones as big as Blizzard.
I don't think we should stop using HEAD, since it's much more efficient, but perhaps we could add a blacklist of sites that use GET instead of HEAD?
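A per-host blacklist could look roughly like this minimal Python sketch. GET_ONLY_HOSTS and request_method_for are hypothetical names, and the battle.net entry is just an example; the sketch only shows choosing the method, not how FinalDestination would store or load such a list.

```python
from urllib.parse import urlparse

# Hypothetical blacklist: hosts whose HEAD responses can't be trusted,
# so the resolver goes straight to GET. The entry is only an example.
GET_ONLY_HOSTS = {"battle.net"}


def request_method_for(url, get_only_hosts=GET_ONLY_HOSTS):
    """Use HEAD by default, GET if the host or a parent domain is blacklisted."""
    host = (urlparse(url).hostname or "").lower()
    for domain in get_only_hosts:
        # Match the domain itself or any subdomain (e.g. us.battle.net),
        # but not look-alikes such as notbattle.net.
        if host == domain or host.endswith("." + domain):
            return "GET"
    return "HEAD"


print(request_method_for("https://us.battle.net/forums/en/"))  # GET
print(request_method_for("https://github.com/discourse"))      # HEAD
```

Matching on "." + domain keeps subdomains covered without accidentally catching unrelated hosts that merely end in the same letters.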
A blacklist is probably the way to go. Like @eviltrout, I've seen sites block HEAD, too. I think some naive site owners assume HEAD is only used by useless bots.
If HEAD is blocked, it will try GET afterwards. The only problem here is when the site returns a HEAD response that is wrong.
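The HEAD-then-GET fallback, and its blind spot, can be sketched in Python like this. It is not FinalDestination's code: request_with_fallback, the two fake servers, and the set of status codes treated as "HEAD is blocked" are all assumptions for illustration.

```python
# Status codes treated as "this server refuses HEAD". This set is an
# assumption for illustration, not the list any particular library uses.
HEAD_BLOCKED_STATUSES = {403, 405, 501}


def request_with_fallback(url, request):
    """Try HEAD first; if the server appears to block it, retry with GET.

    `request(method, url)` is a hypothetical callable returning
    (status, headers). Returns (method_used, status, headers).
    """
    status, headers = request("HEAD", url)
    if status in HEAD_BLOCKED_STATUSES:
        status, headers = request("GET", url)
        return "GET", status, headers
    # A HEAD response that "succeeds" but is wrong (like the 302 in the
    # curl output) passes straight through; the fallback cannot detect it.
    return "HEAD", status, headers


def blocks_head(method, url):
    # Hypothetical server that rejects HEAD outright but serves GET.
    return (405, {}) if method == "HEAD" else (200, {})


def misleading_head(method, url):
    # Hypothetical server whose HEAD redirects somewhere else.
    if method == "HEAD":
        return 302, {"Location": "https://example.com/"}
    return 200, {}


print(request_with_fallback("https://example.com/a", blocks_head))
# ('GET', 200, {})
print(request_with_fallback("https://example.com/a", misleading_head))
# ('HEAD', 302, {'Location': 'https://example.com/'})
```

The second call shows why the Blizzard case slips through: nothing about a plain 302 signals that HEAD was mishandled, so only something like a blacklist or an unconditional GET would catch it.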
Should be fixed by:
https://github.com/discourse/discourse/commit/6cd8203686ac7c2584523394944fdbe905a5a21a
Also, I fixed images not showing in (this will work after the next onebox bump):
https://github.com/discourse/onebox/commit/2c8f4c2d3e235cb26f344cb2d5a9e49ec824d1eb