david
(David Taylor)
July 25, 2017, 6:46pm
1
I’m pretty sure this used to onebox nicely with a highlighted line
https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb#L227
https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb#L227
Works fine without the line number
https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb
4 Likes
sam
(Sam Saffron)
July 25, 2017, 6:50pm
2
Yeah I recall this used to work @techAPJ can you have a quick look?
david
(David Taylor)
July 25, 2017, 6:51pm
3
Aha - GitHub serves a “Canonical” URL on pages with specified line numbers.
<link rel="canonical" href="https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb" data-pjax-transient="">
So now that canonical URLs are prioritised, it's broken :(
3 Likes
sam
(Sam Saffron)
July 25, 2017, 7:13pm
4
Interesting… well we got to special case github here, its one of the only places that uses a #
to get magic behavior in onebox.
What I don’t understand though is why it is a complete fail.
6 Likes
techAPJ
(Arpit Jalan)
July 26, 2017, 8:51am
5
Wow, this was a tricky issue to debug. This is not related to “prefer canonical URL” change.
Also, this issue is not specific to GitHub only, but all the sites that uses # magic. For example:
This regression was introduced eleven days ago in this commit . Specifically in this code :
URI(URI.escape(url)) if url
Escaping the URL converts/encodes #
in link to %23
and some sites does not handle that well, GitHub and Ars Technica included.
Wikipedia handle this just well, for example:
https://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters
Percent-encoding, also known as URL encoding, is a method to encode arbitrary data in a Uniform Resource Identifier (URI) using only the limited US-ASCII characters legal within a URI.
Although it is known as URL encoding, it is, in fact, used more generally within the main Uniform Resource Identifier (URI) set, which includes both Uniform Resource Locator (URL) and Uniform Resource Name (URN). As such, it is also used in the preparation of data of the application/x-www-form-urlencoded media typ...
@eviltrout should we remove the URL escaping here? Or add a new case for sites with 404
response?
7 Likes
#
should not be encoded on the right hand side of the URL; that’s not correct. I think what’s happening is we are mixing logic for URL paths and querystring stuff.
In computer hypertext, a fragment identifier is a string of characters that refers to a resource that is subordinate to another, primary resource. The primary resource is identified by a Uniform Resource Identifier (URI), and the fragment identifier points to the subordinate resource.
The fragment identifier introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document. The generic syntax is specified in RFC 3986. The...
Operative words beng “the optional last part of a URL ”
4 Likes
eviltrout
(Robin Ward)
July 26, 2017, 2:17pm
7
Yeah it looks to me like URI.escape
is not being smart here. I would not remove escaping altogether as it can lead to security issues, but it’s worth coming up with a failing test and fixing the escaping to work properly.
Note that #
should not be encoded as jeff points out, but <
and >
and quotes definitely should be.
5 Likes
techAPJ
(Arpit Jalan)
Split this topic
July 29, 2017, 1:04pm
8
techAPJ
(Arpit Jalan)
July 29, 2017, 2:09pm
9
4 Likes