Problems with non-UTF-8 accented characters in URLs

We’re migrating a Spanish forum and setting up permalinks for all topics.
The existing titles and URLs contain accented characters in ISO 8859-15, escaped with percent-encoding:

forum/showthread.php?96700-Galer%EDa-de-im%E1genes

We’re rewriting them with /forum\/showthread.php\?(\d*).*/thread/\1 but unfortunately we get a server error (with a white page) before the permalink normalization is processed.
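For anyone following along, here is a rough sketch of what the normalization is supposed to do once it actually runs. It is only a Python approximation of the regex above (Discourse applies it in Ruby), and the function name `normalize` is made up for illustration:

```python
import re

# Python rendering of the normalization regex quoted above.
# The dot in "showthread.php" is escaped here so it only matches a literal dot.
PATTERN = re.compile(r"forum/showthread\.php\?(\d*).*")


def normalize(path: str) -> str:
    """Reduce an old thread URL to thread/<id>, dropping the title slug."""
    return PATTERN.sub(r"thread/\1", path)


print(normalize("forum/showthread.php?96700-Galer%EDa-de-im%E1genes"))
# prints: thread/96700
```

Since only the numeric ID is kept, the accented slug would not matter if the request ever reached this step.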

You can see this on Try, for instance:

https://try.discourse.org/forum/showthread.php?96700-Galer%EDa-de-im%E1genes gives a blank page and “bad request”: Rack::QueryParser::InvalidParameterError (invalid byte sequence in UTF-8)

https://try.discourse.org/forum/showthread.php?96700-Galeria-de-imagenes gives the regular “not found” page.
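The difference between the two requests comes down to the raw bytes behind the percent escapes. A small Python check (not Discourse code, just an illustration; `is_valid_utf8` is a helper name invented here) shows that the escaped bytes are not valid UTF-8 but decode fine as ISO 8859-15:

```python
from urllib.parse import unquote_to_bytes

# Percent-decode the query string from the example thread URL into raw bytes.
raw = unquote_to_bytes("96700-Galer%EDa-de-im%E1genes")


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes form a valid UTF-8 sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(raw))         # False: 0xED and 0xE1 are lone lead bytes
print(raw.decode("iso-8859-15"))  # 96700-Galería-de-imágenes
```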

Do you have any tips on how to get around this without too much bespoke nginx tweaking?


Maybe decoding 96700-Galer%EDa-de-im%E1genes as ISO 8859-15 and encoding it as UTF-8 before generating the permalink is a solution? Or are there actually incoming links out there that use the ISO 8859-15 encoded characters in the URL?
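To make the idea concrete, here is a minimal sketch in Python of that re-encoding step. It assumes the old slugs really are ISO 8859-15 throughout, and `reencode` is just a name I picked; it is not anything Discourse provides:

```python
from urllib.parse import quote, unquote


def reencode(slug: str) -> str:
    """Percent-decode as ISO 8859-15, then percent-encode again as UTF-8."""
    # errors="strict" makes undecodable input fail loudly instead of
    # silently inserting replacement characters.
    text = unquote(slug, encoding="iso-8859-15", errors="strict")
    # quote() encodes to UTF-8 by default; "-" is left unescaped.
    return quote(text, safe="-")


print(reencode("96700-Galer%EDa-de-im%E1genes"))
# prints: 96700-Galer%C3%ADa-de-im%C3%A1genes
```

This would only help for URLs you generate yourself during the import; it does nothing for links that already exist elsewhere in the old encoding.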

Mind, I’m not sure if it’s even necessary to use percent encoding for permalinks to work – I haven’t looked at the code. I guess you’ll need to experiment a little bit. I don’t think there’s an easy solution for this. If you find a solution, please post it here. It might be helpful for others.


Yes, there are about 150,000 external links that we have no control over.