Tags do not work with Cyrillic

http://***.ru/tags/земля
500 Internal Server Error

Encoding::UndefinedConversionError ("\xD1" from ASCII-8BIT to UTF-8) /var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activesupport-4.2.9/lib/active_support/core_ext/object/json.rb:34:in `encode'

It fails the same way in safe mode.

Ooooooh, this is going to get messy.

In theory, URLs can only contain 7-bit ASCII characters. In practice, everyone’s pretty much decided that percent-encoded UTF-8 is fine, which is why, if you look in the “env” for the log message, you’ll see something like

REQUEST_URI /tags/%D0%B7%D0%B5%D0%BC%D0%BB%D1%8F/notifications

(Which is the UTF-8 encoding for “земля”)
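To see that mapping for yourself, here is a minimal sketch using Ruby's standard `uri` library. The variable names are just for illustration, and this is not code from Discourse itself.

```ruby
require "uri"

# The percent-encoded tag segment from the REQUEST_URI above.
encoded = "%D0%B7%D0%B5%D0%BC%D0%BB%D1%8F"

# decode_www_form_component labels its result as UTF-8 by default.
decoded = URI.decode_www_form_component(encoded)

# Going the other way reproduces the percent-encoded form.
round_trip = URI.encode_www_form_component(decoded)
```

Each Cyrillic letter takes two bytes in UTF-8, so the five-letter tag becomes ten percent-encoded bytes.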

So, the problem is occurring because the contents of params are being encoded as ASCII-8BIT, and while most everything manages to figure out what’s going on and roll with it, the JSON encoding of the result, containing as it does an ASCII-8BIT string whose individual bytes don’t translate into valid UTF-8 codepoints, explodes.
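A rough reproduction of that failure in plain Ruby, outside Rails, looks like this. It is only a sketch: the byte string is built by hand to stand in for the param, and relabelling with `force_encoding` is one common way to handle it, not necessarily what the commit below does.

```ruby
# The UTF-8 bytes of the tag, labelled as binary (ASCII-8BIT) the way the
# param is described as arriving. Built by hand here as a stand-in.
raw = "\xD0\xB7\xD0\xB5\xD0\xBC\xD0\xBB\xD1\x8F".b

# Transcoding binary to UTF-8 fails on any byte above 0x7F, because
# ASCII-8BIT defines no characters for those bytes.
error = nil
begin
  raw.encode("UTF-8")
rescue Encoding::UndefinedConversionError => e
  error = e
end

# Relabelling the same bytes as UTF-8 involves no conversion, so it succeeds.
tag = raw.dup.force_encoding("UTF-8")
```

The bytes never change. Only the encoding label attached to the string does, which is why the relabelled copy is safe to hand to a JSON encoder.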

Luckily, the fix is relatively straightforward:

https://github.com/discourse/discourse/commit/7ee861f4571f6e7259e631d0404e0d958501dcf0


Thank you, greatly appreciated.

Apparently, something else is wrong. The form of the error has changed.

How did you cause that to happen? I can’t make it happen on a local instance with that tag created. The JSON always renders for me.

I’m just trying to click on any tag: http://toxu.ru

http://toxu.ru/tags/земля
http://toxu.ru/tags/faq - works

http://toxu.ru/tags/земля/l/latest.json?order=default&ascending=false&filter=tags/земля/l/latest

http://toxu.ru/tags/faq/l/latest.json?order=default&ascending=false&filter=tags/faq/l/latest - works

It must be the additional complexity of the data you’ve got that’s causing the bug to appear, whilst my trivial example DB doesn’t trip it up.

Since I can’t reproduce it myself, I’ve had to make a somewhat speculative attempt at a fix in https://github.com/discourse/discourse/commit/67882ec37da6dac2ec0ce69e110014a6fe11882c; please let me know whether it works for you. I think we’re going to have to do a larger and more comprehensive fix for this, along the lines of this initializer; playing whack-a-mole with parameter encodings one-by-one seems like a good way to go prematurely bald.
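For a sense of what a comprehensive fix could look like, here is a sketch that walks a whole params structure instead of patching one parameter at a time. `utf8_params` is a hypothetical helper written for this post; it is not the initializer mentioned above and not anything in Discourse or Rails.

```ruby
# Hypothetical helper: recursively relabel binary strings as UTF-8 when
# their bytes form valid UTF-8, and leave everything else untouched.
def utf8_params(value)
  case value
  when String
    return value unless value.encoding == Encoding::ASCII_8BIT
    relabelled = value.dup.force_encoding(Encoding::UTF_8)
    # Keep the original if the bytes are not valid UTF-8 (e.g. real binary data).
    relabelled.valid_encoding? ? relabelled : value
  when Array
    value.map { |item| utf8_params(item) }
  when Hash
    value.each_with_object({}) { |(key, item), out| out[key] = utf8_params(item) }
  else
    value
  end
end

# Example input: the key names are made up for illustration.
params = {
  "tag_id" => "\xD0\xB7\xD0\xB5\xD0\xBC\xD0\xBB\xD1\x8F".b,
  "filters" => ["faq".b, "\xFF".b],
  "page" => 1
}
clean = utf8_params(params)
```

Strings that are not valid UTF-8 stay binary in this sketch, so a caller would still need to decide whether to reject them or pass them through.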


Now it works! Thank you!

Generic discussion of parsed parameter encoding is over here.