Tags does not work with Cyrillic

Ooooooh, this is going to get messy.

In theory, URLs can only contain 7-bit ASCII characters. In practice, everyone’s pretty much decided that percent-encoded UTF-8 is :ok_hand:, which is why, if you look in the “env” for the log message, you’ll see something like

REQUEST_URI /tags/%D0%B7%D0%B5%D0%BC%D0%BB%D1%8F/notifications

(Which is the UTF-8 encoding for “земля”)

So, the problem is occuring because the contents of params is being encoded as ASCII-8BIT, and while most everything manages to figure out what’s going on and roll with it, the JSON encoding of the result, containing as it does an ASCII-8BIT string whose individual bytes don’t translate into valid UTF-8 codepoints, explodes.

Luckily, the fix is relatively straightforward:

https://github.com/discourse/discourse/commit/7ee861f4571f6e7259e631d0404e0d958501dcf0

8 Likes