The underlying problem which caused this bug is that query parameters, at least, are not being tagged with any particular encoding. Based on some reading, it’s my understanding that modern Rails versions do correctly indicate the encoding of parameters provided in request bodies (by looking at the
Content-Type request header), and experimentation indicates that explicit query parameters (such as
?foo=bar) do get marked as being UTF-8, but as far as I can tell params which come from parsing the URL through route patterns, and format markers, are being marked as
I’ve played a bit of whack-a-mole with the problems reported in the bug topic above, but given that we found three separate instances of potentially needing to force an encoding in one place, I’m thinking the issue is fairly likely to be extremely widespread throughout the codebase.
So, the questions which come to mind are:
- Does this constitute a bug in Rails itself which should be fixed?
- How should this be worked around in Discourse, either until Rails itself gets fixed, or forever?
- How has this not been a problem before? It’s not like Discourse suddenly got its first non-seven-bit-ASCII users in the last week or two…