Onebox breaks if there's Chinese text in URL


  def initialize(url, opts = nil)
    @url = url
    Rails.logger.info("=============================1  #{@url}")
    @uri =
      begin
        URI(escape_url) if @url
      rescue URI::InvalidURIError
      end
    Rails.logger.info("=============================2  #{@uri.to_s}")

Output logs:

I, [2017-08-03T15:45:31.697669 #112637]  INFO -- : =============================1  https://domain/%E6%B5%8B%E8%AF%95
I, [2017-08-03T15:45:31.757180 #112637]  INFO -- : =============================2  https://domain/%25E6%25B5%258B%25E8%25AF%2595

What exactly breaks? Do you have an exception and backtrace?

The input URL is https://domain/%E6%B5%8B%E8%AF%95

The URL I got in Onebox turns out to be: https://domain/%25E6%25B5%258B%25E8%25AF%2595

I’ve added logs in discourse/lib/final_destination.rb (as shown in the above post) and found that it escapes the URL to the wrong value.

The two URLs are different, but I expected them to be the same.

Well, yes, of course they’re going to be different; if you’re escaping a URL, any % characters are always going to be converted to %25, that’s what URI escaping does.
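
To make the double round of escaping concrete, here is a minimal sketch in plain Ruby. It uses `URI.encode_www_form_component` from the standard library purely as a stand-in for "an escaping function"; it is not the code path Discourse uses, and the variable names are only for illustration.

```ruby
require "uri"

# The raw (unescaped) path segment from the report.
raw = "测试"

# First round: UTF-8 bytes become %XX sequences.
once = URI.encode_www_form_component(raw)

# Second round: every "%" from the first round is itself escaped to "%25".
twice = URI.encode_www_form_component(once)

puts once   # %E6%B5%8B%E8%AF%95
puts twice  # %25E6%25B5%258B%25E8%25AF%2595
```

The second value has the same shape as the URL in the second log line, which is what you would expect if an already-escaped URL is escaped again.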

It’s not escaped by me. It’s escaped by discourse/lib/final_destination.rb.

It did not work this way before.

The scenario is:

  1. I wrote a Onebox plugin to handle URLs like https://domain/%E6%B5%8B%E8%AF%95

  2. The user inputs the URL: https://domain/%E6%B5%8B%E8%AF%95

  3. The Onebox gets @url=https://domain/%E6%B5%8B%E8%AF%95 and does the work

In the old days, @url was exactly the URL the user entered, https://domain/%E6%B5%8B%E8%AF%95.

But now it turns out to be https://domain/%25E6%25B5%258B%25E8%25AF%2595

The change was made in a commit that has no bug reference. @tgxworld, do you recall the purpose of that commit?

I noticed this with other languages.
Or is this another scenario?

Russian version


There’s nothing language-specific here. Any URL data that can’t be represented in the constrained set of characters permitted in URLs gets percent-escaped. In both cases here, the data is UTF-8 coded character data, that’s possibly undergoing a double round of encoding. Unfortunately, because no backtrace has been provided, it’s impossible to see where the data’s coming from, merely that something is going on.

@tgxworld looks like there is a regression here:

Used to work as a Onebox in 1.8 and now no longer works.


Oops, not sure how I missed the reply, but I was fixing an error that was flooding our logs.

[1] pry(main)> URI("")
URI::InvalidURIError: bad URI(is not URI?):
from /home/tgxworld/.rbenv/versions/2.4.1/lib/ruby/2.4.0/uri/rfc3986_parser.rb:67:in `split'

Hmm, I’m not sure if it actually worked in 1.8, because FinalDestination was introduced before the 1.8 release and the URL wouldn’t have been resolved at all.

[2] pry(main)> URI("Свободное_программное_обеспечение2")
URI::InvalidURIError: URI must be ascii only "\u0421\u0432\u043E\u0431\u043E\u0434\u043D\u043E\u0435_\u043F\u0440\u043E\u0433\u0440\u0430\u043C\u043C\u043D\u043E\u0435_\u043E\u0431\u0435\u0441\u043F\u0435\u0447\u0435\u043D\u0438\u04352"
from /home/tgxworld/.rbenv/versions/2.4.1/lib/ruby/2.4.0/uri/rfc3986_parser.rb:21:in `split'

Hmm OK I see what is happening here. FinalDestination is given

instead of Свободное_программное_обеспечение

For the first case we end up escaping the % sign… Hmm, I will need to figure out the format given to FinalDestination, because the URL passed to it is sometimes not escaped.
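
One way to cope with input that is sometimes escaped and sometimes not is to escape only the characters that are not already printable ASCII, so existing `%XX` sequences are left alone. The sketch below is only an illustration of that idea. `escape_once` is a hypothetical helper name, not the method in FinalDestination and not necessarily how the actual fix works.

```ruby
require "uri"

# Hypothetical helper (an assumption for illustration, not Discourse code):
# percent-encode only characters outside printable ASCII, byte by byte,
# and leave everything else, including existing "%XX" escapes, untouched.
def escape_once(url)
  url.gsub(/[^\x21-\x7E]/) do |char|
    char.bytes.map { |byte| format("%%%02X", byte) }.join
  end
end

already_escaped = "https://domain/%E6%B5%8B%E8%AF%95"
raw_unicode     = "https://domain/测试"

# Both inputs end up as the same URL, and URI() accepts the result.
puts escape_once(already_escaped)
puts escape_once(raw_unicode)
puts URI(escape_once(raw_unicode)).path
```

The trade-off is that a literal % that is not part of an escape sequence is passed through unchanged, so this approach assumes that any % in the input already begins a valid escape.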


Fixed in


Yay encoded URLs are being properly oneboxed again.