david
(David Taylor)
July 25, 2017, 6:46pm
1
I’m pretty sure this used to onebox nicely with a highlighted line
next if SiteSetting.tagging_enabled? && i[:tags].present? && (topic_tags & i[:tags]).count == 0
next if (i[:filter] == 'mute') || (!(post.is_first_post?) && i[:filter] == 'follow')
message = slack_message(post, i[:channel])
if !(SiteSetting.slack_access_token.empty?)
response = nil
uri = ""
record = ::PluginStore.get(DiscourseSlack::PLUGIN_NAME, "topic_#{post.topic.id}_#{i[:channel]}")
if (record.present? && ((Time.now.to_i - record[:ts].split('.')[0].to_i) / 60) < 5 && record[:message][:attachments].length < 5)
attachments = record[:message][:attachments]
attachments.concat message[:attachments]
uri = URI("https://slack.com/api/chat.update" +
"?token=#{SiteSetting.slack_access_token}" +
"&username=#{CGI::escape(record[:message][:username])}" +
"&text=#{CGI::escape(record[:message][:text])}" +
"&channel=#{record[:channel]}" +
"&attachments=#{CGI::escape(attachments.to_json)}" +
"&ts=#{record[:ts]}"
https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb#L227
Works fine without the line number
module DiscourseSlack
class Slack
KEY_PREFIX = 'category_'.freeze
def self.filter_to_present(filter)
I18n.t("slack.command.present.#{filter}")
end
def self.filter_to_past(filter)
I18n.t("slack.command.past.#{filter}")
end
def self.excerpt(post, max_length = SiteSetting.slack_discourse_excerpt_length)
doc = Nokogiri::HTML.fragment(post.excerpt(max_length,
remap_emoji: true,
keep_onebox_source: true
))
SlackMessageFormatter.format(doc.to_html)
end
This file has been truncated. show original
4 Likes
sam
(Sam Saffron)
July 25, 2017, 6:50pm
2
Yeah I recall this used to work @techAPJ can you have a quick look?
david
(David Taylor)
July 25, 2017, 6:51pm
3
Aha - GitHub serves a “Canonical” URL on pages with specified line numbers.
<link rel="canonical" href="https://github.com/discourse/discourse-slack-official/blob/master/lib/discourse_slack/slack.rb" data-pjax-transient="">
So now that canonical URLs are prioritised, it's broken :(
3 Likes
sam
(Sam Saffron)
July 25, 2017, 7:13pm
4
Interesting… well we got to special case github here, its one of the only places that uses a # to get magic behavior in onebox.
What I don’t understand though is why it is a complete fail.
6 Likes
techAPJ
(Arpit Jalan)
July 26, 2017, 8:51am
5
Wow, this was a tricky issue to debug. This is not related to “prefer canonical URL” change.
Also, this issue is not specific to GitHub only, but all the sites that uses # magic. For example:
This regression was introduced eleven days ago in this commit . Specifically in this code :
URI(URI.escape(url)) if url
Escaping the URL converts/encodes # in link to %23 and some sites does not handle that well, GitHub and Ars Technica included.
Wikipedia handle this just well, for example:
https://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters
URL encoding, officially known as percent-encoding[according to whom?], is a method to encode arbitrary data in a uniform resource identifier (URI) using only the US-ASCII characters legal within a URI. Percent-encoding is used to ensure special characters do not interfere with the URI's structure and interpretation. Special characters are replaced with a percent sign (%) followed by two hexadecimal digits representing the character's byte value. For example, a space is commonly encoded as %20:
@eviltrout should we remove the URL escaping here? Or add a new case for sites with 404 response?
7 Likes
# should not be encoded on the right hand side of the URL; that’s not correct. I think what’s happening is we are mixing logic for URL paths and querystring stuff.
In computer hypertext, a fragment identifier is a string of characters that refers to a resource that is subordinate to another, primary resource. The primary resource is identified by a Uniform Resource Identifier (URI), and the fragment identifier points to the subordinate resource.
The fragment identifier introduced by a hash mark # is the optional last part of a URL for a document. It is typically used to identify a portion of that document. The generic syntax is specified in RFC 3986. The...
Operative words beng “the optional last part of a URL ”
4 Likes
eviltrout
(Robin Ward)
July 26, 2017, 2:17pm
7
Yeah it looks to me like URI.escape is not being smart here. I would not remove escaping altogether as it can lead to security issues, but it’s worth coming up with a failing test and fixing the escaping to work properly.
Note that # should not be encoded as jeff points out, but < and > and quotes definitely should be.
5 Likes
techAPJ
(Arpit Jalan)
Split this topic
July 29, 2017, 1:04pm
8
techAPJ
(Arpit Jalan)
July 29, 2017, 2:09pm
9
Fixed via:
https://github.com/discourse/discourse/commit/1fe553873caaafb0649d2c96872683d4e5007e50
Demo:
# Determine the final endpoint for a Web URI, following redirects
class FinalDestination
attr_reader :status
attr_reader :cookie
def initialize(url, opts = nil)
@uri =
begin
URI(URI.escape(CGI.unescapeHTML(url), Regexp.new("[^#{URI::PATTERN::UNRESERVED}#{URI::PATTERN::RESERVED}#]"))) if url
rescue URI::InvalidURIError
end
@opts = opts || {}
@opts[:max_redirects] ||= 5
@opts[:lookup_ip] ||= lambda do |host|
begin
IPSocket::getaddress(host)
rescue SocketError
nil
4 Likes