Oneboxing of sites with hash (#) in URL not working


(David Taylor) #1

I’m pretty sure this used to onebox nicely with a highlighted line

https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb#L227

Works fine without the line number


Can't onebox Overwatch Forums topic
(Sam Saffron) #2

Yeah I recall this used to work @techAPJ can you have a quick look?


(David Taylor) #3

Aha - GitHub serves a “Canonical” URL on pages with specified line numbers.

<link rel="canonical" href="https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb" data-pjax-transient="">
So now that canonical URLs are prioritised, it's broken :(

(Sam Saffron) #4

Interesting… well we got to special case github here, its one of the only places that uses a # to get magic behavior in onebox.

What I don’t understand though is why it is a complete fail.


(Arpit Jalan) #5

Wow, this was a tricky issue to debug. This is not related to “prefer canonical URL” change.

Also, this issue is not specific to GitHub only, but all the sites that uses # magic. For example:

This regression was introduced eleven days ago in this commit. Specifically in this code:

URI(URI.escape(url)) if url

Escaping the URL converts/encodes # in link to %23 and some sites does not handle that well, GitHub and Ars Technica included.

Wikipedia handle this just well, for example:

https://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters

@eviltrout should we remove the URL escaping here? Or add a new case for sites with 404 response?


(Jeff Atwood) #6

# should not be encoded on the right hand side of the URL; that’s not correct. I think what’s happening is we are mixing logic for URL paths and querystring stuff.

Operative words beng “the optional last part of a URL


(Robin Ward) #7

Yeah it looks to me like URI.escape is not being smart here. I would not remove escaping altogether as it can lead to security issues, but it’s worth coming up with a failing test and fixing the escaping to work properly.

Note that # should not be encoded as jeff points out, but < and > and quotes definitely should be.


(Arpit Jalan) #8

A post was split to a new topic: Can’t onebox Overwatch Forums topic


(Arpit Jalan) #9

Fixed via:


Demo:


(Arpit Jalan) #10