Tags do not work with Cyrillic

(Evgeny) #1

500 Internal Server Error

Encoding::UndefinedConversionError ("\xD1" from ASCII-8BIT to UTF-8) /var/www/discourse/vendor/bundle/ruby/2.3.0/gems/activesupport-4.2.9/lib/active_support/core_ext/object/json.rb:34:in `encode'

It fails the same way with safe_mode.

(Matt Palmer) #2

Ooooooh, this is going to get messy.

In theory, URLs can only contain 7-bit ASCII characters. In practice, everyone’s pretty much decided that percent-encoded UTF-8 is :ok_hand:, which is why, if you look in the “env” for the log message, you’ll see something like

REQUEST_URI /tags/%D0%B7%D0%B5%D0%BC%D0%BB%D1%8F/notifications

(Which is the percent-encoded UTF-8 for “земля”)

So, the problem is occurring because the contents of params are being encoded as ASCII-8BIT. Most everything manages to figure out what’s going on and roll with it, but the JSON encoding of the result explodes, because it contains an ASCII-8BIT string whose individual bytes don’t translate into valid UTF-8 codepoints.
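To make the failure mode concrete, here is a minimal plain-Ruby sketch of it. It is not the actual Discourse fix: `retag_as_utf8` is a hypothetical helper, and the binary-tagged string is simulated with `String#b` rather than taken from a real request.

```ruby
require "uri"

# Hypothetical helper (not from Discourse): re-tag a binary-tagged string
# as UTF-8 without touching its bytes. Returns nil if the bytes are not
# valid UTF-8.
def retag_as_utf8(value)
  candidate = value.dup.force_encoding(Encoding::UTF_8)
  candidate.valid_encoding? ? candidate : nil
end

# Percent-decode the tag segment from the REQUEST_URI above.
decoded = URI.decode_www_form_component("%D0%B7%D0%B5%D0%BC%D0%BB%D1%8F")

# Simulate a param whose bytes are correct but whose encoding tag is
# ASCII-8BIT (String#b returns a binary-tagged copy).
binary = decoded.b

# Transcoding from ASCII-8BIT to UTF-8 raises on the first byte >= 0x80,
# which is the same class of error as in the report.
transcode_error =
  begin
    binary.encode(Encoding::UTF_8)
    nil
  rescue Encoding::UndefinedConversionError => e
    e
  end

# Re-tagging instead of transcoding leaves the bytes alone and succeeds.
fixed = retag_as_utf8(binary)
```

The point of the sketch is the difference between `encode`, which tries to convert each byte, and `force_encoding`, which only changes the label on bytes that are already UTF-8.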

Luckily, the fix is relatively straightforward:

(Evgeny) #3

Thank you, greatly appreciated.

(Jeff Atwood) #4

(Matt Palmer) #5

(Evgeny) #6

Apparently, something else is wrong. The form of the error has changed.

(Matt Palmer) #7

How did you cause that to happen? I can’t make it happen on a local instance with that tag created. The JSON always renders for me.

(Evgeny) #8

I’m just trying to click on any tag: http://toxu.ru

http://toxu.ru/tags/faq - works


http://toxu.ru/tags/faq/l/latest.json?order=default&ascending=false&filter=tags/faq/l/latest - works

(Matt Palmer) #9

It must be the additional complexity of the data you’ve got that’s causing the bug to appear, whilst my trivial example DB doesn’t trip it up.

Since I can’t reproduce it myself, I’ve had to take a bit of a speculative bug fix attempt in https://github.com/discourse/discourse/commit/67882ec37da6dac2ec0ce69e110014a6fe11882c; please let me know if it does/doesn’t work for you. I think we’re going to have to do a larger and more comprehensive fix for this, along the lines of this initializer; playing whack-a-mole with parameter encodings one-by-one seems like a good way to go prematurely bald.
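As a rough illustration of what fixing every parameter in one place could look like, here is a plain-Ruby sketch that walks a nested params-like structure. It is an assumption about the general shape only, not the initializer referred to above: `deep_retag_utf8` and the sample hash are hypothetical, and no Rails or Discourse API is used.

```ruby
# Hypothetical helper: recursively re-tag binary strings as UTF-8 when
# their bytes form valid UTF-8. Anything else is returned unchanged.
def deep_retag_utf8(value)
  case value
  when String
    return value unless value.encoding == Encoding::ASCII_8BIT
    candidate = value.dup.force_encoding(Encoding::UTF_8)
    candidate.valid_encoding? ? candidate : value
  when Array
    value.map { |item| deep_retag_utf8(item) }
  when Hash
    value.each_with_object({}) do |(key, item), out|
      out[key] = deep_retag_utf8(item)
    end
  else
    value
  end
end

# Sample input: the bytes of the Cyrillic tag, tagged as binary.
params = {
  "tag_id"  => "\xD0\xB7\xD0\xB5\xD0\xBC\xD0\xBB\xD1\x8F".b,
  "filters" => ["faq".b],
  "page"    => 1
}

clean = deep_retag_utf8(params)
```

Strings whose bytes are not valid UTF-8 are deliberately left as they were, so genuinely binary data is not mislabelled.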

(Evgeny) #10

Now it works! Thank you!

(Matt Palmer) #11

Generic discussion of parsed parameter encoding is over here.