Umlaut problem when embedding Discourse on another website

limetti · December 10, 2023, 21:41

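As described here (Embed Discourse comments on another website via Javascript - #453 by limetti), when embedding Discourse into my website, the title is parsed correctly. But because it contains umlauts, titles like “Ich würde” end up as “Ich wÃ¼rde”.

Is this a general problem, a problem with my page, or is there a workaround? Thanks!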

supermathie · December 11, 2023, 05:08

This is a classic "wrong encoding" problem.

As a test case, let's read the raw data of the post with Python (for this example):

In [1]: import urllib.request

In [2]: u = urllib.request.urlopen('https://meta.discourse.org/posts/1418409/raw')

In [3]: r = u.read(); r
Out[3]: b'As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like \xe2\x80\x9cIch w\xc3\xbcrde\xe2\x80\x9d end up in \xe2\x80\x9cIch w\xc3\x83\xc2\xbcrde\xe2\x80\x9d.\n\nIs this a general problem, a problem with my page or any workaround for that? Thanks!'

We get bytes back, but nothing in the bytes themselves says how to decode them. However, one of the response headers tells us to use UTF-8:

In [4]: u.headers['content-type']
Out[4]: 'text/plain; charset=utf-8'

In [5]: r.decode('utf-8')
Out[5]: 'As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like “Ich würde” end up in “Ich wÃ¼rde”.\n\nIs this a general problem, a problem with my page or any workaround for that? Thanks!'

In [6]: print(r.decode('utf-8'))
As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like “Ich würde” end up in “Ich wÃ¼rde”.

Is this a general problem, a problem with my page or any workaround for that? Thanks!

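You'll notice the characters are exactly as you posted them. However, when those bytes are interpreted incorrectly (a common mistake is to read them as ISO-8859-1 instead of UTF-8), you get the following (string shortened for clarity):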

In [7]: snippet = r[220:255]; snippet
Out[7]: b'titles like \xe2\x80\x9cIch w\xc3\xbcrde\xe2\x80\x9d end up'

In [8]: snippet.decode('utf-8')
Out[8]: 'titles like “Ich würde” end up'

In [9]: snippet.decode('iso-8859-1')
Out[9]: 'titles like â\x80\x9cIch wÃ¼rdeâ\x80\x9d end up'

If I print that, my terminal hangs. Amazing.

In summary: whatever you are using to pull the post data out of Discourse is treating it as iso-8859-1 instead of utf-8.
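The same mix-up can be reproduced without fetching anything. This is only a minimal sketch using the title from this thread; the variable names are made up for illustration. It also shows that the damage is reversible as long as the garbled text was produced by a plain ISO-8859-1 misread.

```python
# Illustrative sketch: reproduce the mojibake from this thread, then undo it.
original = "Ich würde"

# The bytes as they travel over the wire when the server sends UTF-8.
wire = original.encode("utf-8")

# The mistake: reading those UTF-8 bytes as ISO-8859-1.
garbled = wire.decode("iso-8859-1")
print(garbled)  # Ich wÃ¼rde

# Undo it: map the characters back to the same bytes, then decode as UTF-8.
repaired = garbled.encode("iso-8859-1").decode("utf-8")
print(repaired)  # Ich würde

# Re-encoding the garbled text as UTF-8 gives the doubled-up bytes
# visible in Out[3] above for the second quoted title.
print(garbled.encode("utf-8"))  # b'Ich w\xc3\x83\xc2\xbcrde'
```

The repair step is a workaround, not a fix; the real fix is to decode with the right charset in the first place.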

(Speculation) Perhaps you are embedding the raw bytes pulled from the Discourse site into a page that is served with the iso-8859-1 code page.
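On the fetching side, one way to avoid guessing is to decode with whatever charset the Content-Type header declares. The sketch below assumes you have the raw body and the header value in hand; `decode_body` and its `default` parameter are hypothetical names, not part of Discourse or urllib. It uses the standard library's `email.message.Message` to parse the header.

```python
from email.message import Message


def decode_body(raw: bytes, content_type: str, default: str = "utf-8") -> str:
    """Decode raw bytes using the charset named in a Content-Type header value.

    Hypothetical helper for illustration; falls back to `default`
    when the header does not declare a charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or default
    return raw.decode(charset)


body = b"Ich w\xc3\xbcrde"
print(decode_body(body, "text/plain; charset=utf-8"))  # Ich würde
```

With a urllib response like `u` above, `u.headers.get_content_charset()` should give the same answer directly, since the headers object is built on the same `Message` class.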

limetti · December 13, 2023, 07:53

Thanks a lot for the hint. Indeed, the UTF-8 meta tag came after the title tag.

It works now!
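For anyone who lands here with the same symptom, a rough way to check a page for this is sketched below. It assumes the page source is available as bytes; `charset_declared_early` is a made-up helper, the regexes are a crude stand-in for a real HTML parser, and the 1024-byte limit reflects the HTML requirement that the encoding declaration sit within the first 1024 bytes of the document.

```python
import re

# Crude patterns, not a real HTML parser; good enough for a quick check.
META_CHARSET = re.compile(rb"<meta\b[^>]*charset\s*=\s*[\"']?\s*([\w-]+)", re.IGNORECASE)
TITLE = re.compile(rb"<title\b", re.IGNORECASE)


def charset_declared_early(html: bytes, limit: int = 1024):
    """Return (charset, ok) for a page given as bytes.

    ok is True when a meta charset declaration exists, starts before
    <title>, and its charset value ends within the first `limit` bytes.
    """
    meta = META_CHARSET.search(html)
    if meta is None:
        return None, False
    title = TITLE.search(html)
    before_title = title is None or meta.start() < title.start()
    within_limit = meta.end() <= limit
    return meta.group(1).decode("ascii").lower(), before_title and within_limit


good = b'<html><head><meta charset="utf-8"><title>Ich w\xc3\xbcrde</title></head></html>'
bad = b'<html><head><title>Ich w\xc3\xbcrde</title><meta charset="utf-8"></head></html>'
print(charset_declared_early(good))  # ('utf-8', True)
print(charset_declared_early(bad))   # ('utf-8', False)
```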

system · January 12, 2024, 07:53

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
