This is a classic “wrong codec” problem.
As a test case, if we read (via Python, in this example) the raw data from your post:
In [1]: import urllib.request
In [2]: u = urllib.request.urlopen('https://meta.discourse.org/posts/1418409/raw')
In [3]: r = u.read(); r
Out[3]: b'As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like \xe2\x80\x9cIch w\xc3\xbcrde\xe2\x80\x9d end up in \xe2\x80\x9cIch w\xc3\x83\xc2\xbcrde\xe2\x80\x9d.\n\nIs this a general problem, a problem with my page or any workaround for that? Thanks!'
We get bytes, but don’t know how to decode that. However, one of the response headers tells us we should use UTF-8:
In [4]: u.headers['content-type']
Out[4]: 'text/plain; charset=utf-8'
In [5]: r.decode('utf-8')
Out[5]: 'As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like “Ich würde” end up in “Ich wÃ¼rde”.\n\nIs this a general problem, a problem with my page or any workaround for that? Thanks!'
In [6]: print(r.decode('utf-8'))
As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like “Ich würde” end up in “Ich wÃ¼rde”.
Is this a general problem, a problem with my page or any workaround for that? Thanks!
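Rather than hard-coding 'utf-8', real code can read the charset straight out of the Content-Type header; `u.headers` is an `email.message.Message`, which has a `get_content_charset()` method for exactly this. A minimal sketch (the `decode_body` helper and the sample bytes are mine, standing in for the live response):

```python
from email.message import Message

def decode_body(raw: bytes, content_type: str) -> str:
    """Decode an HTTP body using the charset declared in its Content-Type."""
    msg = Message()
    msg['Content-Type'] = content_type
    # Fall back to utf-8 if the server declared no charset at all.
    charset = msg.get_content_charset('utf-8')
    return raw.decode(charset)

raw = b'Ich w\xc3\xbcrde'
print(decode_body(raw, 'text/plain; charset=utf-8'))  # -> Ich würde
```

With a live `urllib.request` response you would pass `u.read()` and `u.headers['content-type']` to the same helper.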
You’ll note the characters look exactly as you posted. But when those bytes are decoded with the wrong codec, in particular the common mistake of decoding them as ISO-8859-1 instead of UTF-8, you get (string shortened for clarity below):
In [7]: snippet = r[220:255]; snippet
Out[7]: b'titles like \xe2\x80\x9cIch w\xc3\xbcrde\xe2\x80\x9d end up'
In [8]: snippet.decode('utf-8')
Out[8]: 'titles like “Ich würde” end up'
In [9]: snippet.decode('iso-8859-1')
Out[9]: 'titles like â\x80\x9cIch wÃ¼rdeâ\x80\x9d end up'
If I print that, the stray \x80 and \x9c turn into C1 control characters, which the terminal may render oddly or swallow entirely.
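The full round trip that produces the \xc3\x83\xc2\xbc bytes in your title, and its reversal, can be reproduced in a few lines (a sketch; the variable names are mine):

```python
original = 'Ich würde'

# The Discourse server sends correct UTF-8 bytes...
utf8_bytes = original.encode('utf-8')       # ü becomes b'\xc3\xbc'

# ...but something on the receiving side decodes them as ISO-8859-1,
# turning the two bytes of ü into the two characters Ã and ¼:
mojibake = utf8_bytes.decode('iso-8859-1')  # 'Ich wÃ¼rde'

# Re-encoding that string as UTF-8 yields two bytes per mojibake
# character, matching the \xc3\x83\xc2\xbc seen in the raw post:
double_encoded = mojibake.encode('utf-8')   # b'Ich w\xc3\x83\xc2\xbcrde'

# As long as no bytes were lost, the damage is reversible:
repaired = double_encoded.decode('utf-8').encode('iso-8859-1').decode('utf-8')
print(repaired)  # -> Ich würde
```

The reversal only works while every mojibake character still maps back to a single ISO-8859-1 byte; once the text has been further mangled or truncated, the original is gone.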
To sum up: whatever you’re using to pull the post data out of Discourse is treating it as iso-8859-1 instead of utf-8.
(speculating) Perhaps you’re embedding the raw bytes pulled from a Discourse site into a page that is being served with a charset of iso-8859-1.