Oneboxing of sites with hash (#) in URL not working

I’m pretty sure this used to onebox nicely with a highlighted line

https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb#L227

https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb#L227

Works fine without the line number

https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb

4 Likes

Yeah I recall this used to work @techAPJ can you have a quick look?

Aha - GitHub serves a “Canonical” URL on pages with specified line numbers.

<link rel="canonical" href="https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb" data-pjax-transient="">
So now that canonical URLs are prioritised, it's broken :(
3 Likes

Interesting… well we got to special case github here, its one of the only places that uses a # to get magic behavior in onebox.

What I don’t understand though is why it is a complete fail.

6 Likes

Wow, this was a tricky issue to debug. This is not related to “prefer canonical URL” change.

Also, this issue is not specific to GitHub only, but all the sites that uses # magic. For example:

This regression was introduced eleven days ago in this commit. Specifically in this code:

URI(URI.escape(url)) if url

Escaping the URL converts/encodes # in link to %23 and some sites does not handle that well, GitHub and Ars Technica included.

Wikipedia handle this just well, for example:

https://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters

@eviltrout should we remove the URL escaping here? Or add a new case for sites with 404 response?

7 Likes

# should not be encoded on the right hand side of the URL; that’s not correct. I think what’s happening is we are mixing logic for URL paths and querystring stuff.

Operative words beng “the optional last part of a URL

4 Likes

Yeah it looks to me like URI.escape is not being smart here. I would not remove escaping altogether as it can lead to security issues, but it’s worth coming up with a failing test and fixing the escaping to work properly.

Note that # should not be encoded as jeff points out, but < and > and quotes definitely should be.

5 Likes

A post was split to a new topic: Can’t onebox Overwatch Forums topic

Fixed via:

https://github.com/discourse/discourse/commit/1fe553873caaafb0649d2c96872683d4e5007e50


Demo:

https://github.com/discourse/discourse/blob/1fe553873caaafb0649d2c96872683d4e5007e50/lib/final_destination.rb#L15

4 Likes