Onebox breaks if there's Chinese text in URL

kiki · August 3, 2017, 7:48am

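I added two log lines to discourse/lib/final_destination.rb:

  def initialize(url, opts = nil)
    @url = url
    # debug: the URL exactly as it was passed in
    Rails.logger.info("=============================1  #{@url}")
    @uri =
      begin
        URI(escape_url) if @url
      rescue URI::InvalidURIError
        # an unparseable URL leaves @uri as nil
      end
    # debug: the URI built from escape_url
    Rails.logger.info("=============================2  #{@uri.to_s}")
    # ... rest of initialize unchanged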

Log output:

I, [2017-08-03T15:45:31.697669 #112637]  INFO -- : =============================1  https://domain/%E6%B5%8B%E8%AF%95
I, [2017-08-03T15:45:31.757180 #112637]  INFO -- : =============================2  https://domain/%25E6%25B5%258B%25E8%25AF%2595

mpalmer · August 3, 2017, 7:51am

What exactly breaks? Do you have an exception and backtrace?

kiki · August 3, 2017, 7:52am

The input URL is https://domain/%E6%B5%8B%E8%AF%95

The URL I get in the onebox turns out to be: https://domain/%25E6%25B5%258B%25E8%25AF%2595

I've added logs in discourse/lib/final_destination.rb (as shown in the post above) and found that it escapes the input into the wrong URL.

They are different, but I expected them to be the same.

mpalmer · August 3, 2017, 7:57am

Well, yes, of course they’re going to be different; if you’re escaping a URL, any % characters are always going to be converted to %25. That’s what URI escaping does.
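To illustrate the point above: percent-encoding a value that is already percent-encoded turns each `%` into `%25`. The minimal Ruby sketch below uses a hypothetical `percent_encode` helper (not Discourse's `escape_url`) that encodes every byte outside the RFC 3986 "unreserved" set, and it reproduces the path seen in the log lines above.

```ruby
# Hypothetical helper for illustration; not Discourse's escape_url.
# Bytes in the RFC 3986 "unreserved" set pass through unchanged; every other
# byte (including '%' and each byte of a multi-byte UTF-8 character) becomes %XX.
UNRESERVED = [*"A".."Z", *"a".."z", *"0".."9", "-", ".", "_", "~"].map(&:ord).freeze

def percent_encode(segment)
  segment.each_byte.map do |byte|
    UNRESERVED.include?(byte) ? byte.chr : format("%%%02X", byte)
  end.join
end

once  = percent_encode("测试") # the path segment from the report ("test" in Chinese)
twice = percent_encode(once)   # escaping the already-escaped value a second time

puts once  # prints %E6%B5%8B%E8%AF%95
puts twice # prints %25E6%25B5%258B%25E8%25AF%2595
```

Note that the sketch is applied to a single path segment, not a whole URL, since it would also encode `:` and `/`.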

kiki · August 3, 2017, 8:19am

It’s not escaped by me. It’s escaped by discourse/lib/final_destination.rb.

It did not work this way before.

The scenario is:

I wrote a onebox plugin to handle URLs like https://domain/%E6%B5%8B%E8%AF%95
The user enters the URL https://domain/%E6%B5%8B%E8%AF%95
The onebox receives @url=https://domain/%E6%B5%8B%E8%AF%95 and does its work

Previously, @url was exactly the URL the user entered, https://domain/%E6%B5%8B%E8%AF%95.

But now it has become https://domain/%25E6%25B5%258B%25E8%25AF%2595

mpalmer · August 3, 2017, 8:23am

The change was made in https://github.com/discourse/discourse/commit/b534778f46ac310d9b59afa6f5390fced267f2f0. There’s no bug reference on the commit. @tgxworld, do you recall the purpose of that commit?

Stranik · August 3, 2017, 8:24am

I noticed this with other languages.
Or is this another scenario?

https://en.wikipedia.org/wiki/Free_software

Russian version

https://ru.wikipedia.org/wiki/%D0%A1%D0%B2%D0%BE%D0%B1%D0%BE%D0%B4%D0%BD%D0%BE%D0%B5_%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%BD%D0%BE%D0%B5_%D0%BE%D0%B1%D0%B5%D1%81%D0%BF%D0%B5%D1%87%D0%B5%D0%BD%D0%B8%D0%B5

ru.wikipedia.org

Free software (Свободное программное обеспечение)

Free software (abbreviated СПО in Russian; also software libre or libre software), also called "free soft", is software whose users have the rights ("freedoms") to install, run, freely use, study, distribute, and modify (improve) it without restriction, as well as to distribute copies and the results of their modifications. If the software is subject to exclusive rights, these freedoms are granted through free licenses. Like free-of-charge...

mpalmer · August 3, 2017, 8:29am

There’s nothing language-specific here. Any URL data that can’t be represented in the constrained set of characters permitted in URLs gets percent-escaped. In both cases here, the data is UTF-8 coded character data, that’s possibly undergoing a double round of encoding. Unfortunately, because no backtrace has been provided, it’s impossible to see where the data’s coming from, merely that something is going on.
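A sketch of the point above, using Ruby's standard `URI.decode_www_form_component` (which also turns `+` into a space; that does not affect these examples): Chinese and Cyrillic text both reduce to UTF-8 bytes written as `%XX`, and a value that still contains `%XX` triplets after one round of decoding was probably encoded twice. The `double_encoded?` check is a hypothetical heuristic, not something Discourse does; a URL can legitimately contain an encoded `%`.

```ruby
require "uri"

# Remove one layer of percent-encoding; the result is a UTF-8 string.
def decode_once(str)
  URI.decode_www_form_component(str)
end

# Hypothetical heuristic: if decoding once still leaves %XX triplets,
# the input was probably percent-encoded twice.
def double_encoded?(str)
  decode_once(str).match?(/%\h{2}/)
end

chinese  = "%E6%B5%8B%E8%AF%95"
cyrillic = "%D0%A1%D0%B2%D0%BE%D0%B1%D0%BE%D0%B4%D0%BD%D0%BE%D0%B5"
doubled  = "%25E6%25B5%258B%25E8%25AF%2595"

puts decode_once(chinese)      # prints 测试
puts decode_once(cyrillic)     # prints Свободное
puts double_encoded?(chinese)  # prints false
puts double_encoded?(doubled)  # prints true
```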

sam · August 11, 2017, 5:16pm

@tgxworld looks like there is a regression here:

[Onebox: ru.wikipedia.org, "Free software" (Свободное программное обеспечение)]

It used to work as a onebox in 1.8 and no longer does.

tgxworld · August 22, 2017, 8:06am

Oops, not sure how I missed this reply. That commit was fixing an error that was flooding our logs:

[1] pry(main)> URI("https://eviltrout.com?s=180&#038;d=mm&#038;r=g")
URI::InvalidURIError: bad URI(is not URI?): https://eviltrout.com?s=180&#038;d=mm&#038;r=g
from /home/tgxworld/.rbenv/versions/2.4.1/lib/ruby/2.4.0/uri/rfc3986_parser.rb:67:in `split'
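The URL in that error contains `&#038;`, an HTML numeric character reference for `&`. Its `#` starts a fragment, and the second `&#038;` puts another `#` inside that fragment, which RFC 3986 does not allow, so `URI()` raises. A minimal sketch, assuming the only problem is decimal character references left over from HTML; the `decode_numeric_entities` helper is hypothetical and is not what the Discourse commit does.

```ruby
require "uri"

# Hypothetical helper: replace decimal HTML character references such as
# "&#038;" with the character they stand for ("&"). Hex forms like "&#x26;"
# are not handled in this sketch.
def decode_numeric_entities(str)
  str.gsub(/&#(\d+);/) { Regexp.last_match(1).to_i.chr(Encoding::UTF_8) }
end

raw = "https://eviltrout.com?s=180&#038;d=mm&#038;r=g"

begin
  URI(raw)
rescue URI::InvalidURIError => e
  puts "raw URL rejected: #{e.class}"
end

uri = URI(decode_numeric_entities(raw))
puts uri.host  # prints eviltrout.com
puts uri.query # prints s=180&d=mm&r=g
```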

tgxworld · August 22, 2017, 8:15am

Hmm, I’m not sure it actually worked in 1.8, because FinalDestination was introduced before the 1.8 release and the URL wouldn’t have been resolved at all.

[2] pry(main)> URI("https://ru.wikipedia.org/wiki/Свободное_программное_обеспечение2")
URI::InvalidURIError: URI must be ascii only "https://ru.wikipedia.org/wiki/\u0421\u0432\u043E\u0431\u043E\u0434\u043D\u043E\u0435_\u043F\u0440\u043E\u0433\u0440\u0430\u043C\u043C\u043D\u043E\u0435_\u043E\u0431\u0435\u0441\u043F\u0435\u0447\u0435\u043D\u0438\u04352"
from /home/tgxworld/.rbenv/versions/2.4.1/lib/ruby/2.4.0/uri/rfc3986_parser.rb:21:in `split'

tgxworld · August 22, 2017, 8:34am

Hmm, OK, I see what is happening here. FinalDestination is given

https://ru.wikipedia.org/wiki/%D0%A1%D0%B2%D0%BE%D0%B1%D0%BE%D0%B4%D0%BD%D0%BE%D0%B5_%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%BD%D0%BE%D0%B5_%D0%BE%D0%B1%D0%B5%D1%81%D0%BF%D0%B5%D1%87%D0%B5%D0%BD%D0%B8%D0%B5

instead of

https://ru.wikipedia.org/wiki/Свободное_программное_обеспечение

In the first case we end up escaping the % signs of a URL that is already escaped. I’ll need to figure out which format is given to FinalDestination, because the URL passed to it is sometimes not escaped.
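One way to cope with input that is sometimes escaped and sometimes not is to make escaping idempotent: encode only what cannot appear literally in a URL (non-ASCII characters, spaces, and a `%` that does not begin a `%XX` triplet) and leave existing `%XX` sequences alone. This is a hypothetical `normalize_url` sketch, not necessarily what the eventual fix does; it assumes that a `%` followed by two hex digits is always an existing escape.

```ruby
require "uri"

# Matches what must be escaped: any non-ASCII character, an ASCII space,
# or a '%' that is not already the start of a %XX escape.
NEEDS_ESCAPE = /[^[:ascii:]]| |%(?!\h{2})/

# Hypothetical helper: percent-encode only the matches, byte by byte (UTF-8),
# so running it on an already-escaped URL leaves the URL unchanged.
def normalize_url(url)
  url.gsub(NEEDS_ESCAPE) do |match|
    match.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

escaped   = "https://domain/%E6%B5%8B%E8%AF%95"
unescaped = "https://domain/测试"

puts normalize_url(escaped)   # prints https://domain/%E6%B5%8B%E8%AF%95
puts normalize_url(unescaped) # prints https://domain/%E6%B5%8B%E8%AF%95

# The raw Cyrillic URL now parses instead of raising "URI must be ascii only".
puts URI(normalize_url("https://ru.wikipedia.org/wiki/Свободное_программное_обеспечение")).path
```

Both the encoded and the unencoded Wikipedia URLs from this thread normalize to the same string, so FinalDestination-style code would no longer need to know which format it was given.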

tgxworld · September 26, 2017, 10:36am

Fixed in

https://github.com/discourse/discourse/commit/367fb1c524cff06a33c7a4144cd13a270a9f3489

tgxworld · September 27, 2017, 2:58am

Yay, encoded URLs are being properly oneboxed again.

[Onebox: ru.wikipedia.org, "Free software" (Свободное программное обеспечение)]

[Onebox: ru.wikipedia.org, "Free software" (Свободное программное обеспечение)]

Related topics

Topic	Category	Replies	Views	Activity
Oneboxing changes link URL?	Support	6	897	July 24, 2023
Oneboxing of sites with hash (#) in URL not working	Bug	9	1868	July 29, 2017
One Box url encoding of already encoded comma	Bug	4	1095	June 27, 2019
Oneboxing PDF: rewrite URL escaped characters to their original value	Feature	0	372	October 31, 2021
Onebox problem with i18n urls	Feature	8	910	September 4, 2019