Problem with umlauts when embedding Discourse on another website

limetti · December 10, 2023, 9:41pm

As described here (Embed Discourse comments on another website via Javascript - #453 by limetti), when I embed Discourse into my website, the title is parsed correctly. But because it contains umlauts, titles like “Ich würde” end up as “Ich wÃ¼rde”.

Is this a general problem, a problem with my page, or is there a workaround? Thanks!

supermathie · December 11, 2023, 5:08am

This is a classic “wrong codec” problem.

As a test case, if we read the raw data from your post (using Python in this example):

In [1]: import urllib.request

In [2]: u = urllib.request.urlopen('https://meta.discourse.org/posts/1418409/raw')

In [3]: r = u.read(); r
Out[3]: b'As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like \xe2\x80\x9cIch w\xc3\xbcrde\xe2\x80\x9d end up in \xe2\x80\x9cIch w\xc3\x83\xc2\xbcrde\xe2\x80\x9d.\n\nIs this a general problem, a problem with my page or any workaround for that? Thanks!'

We get bytes, but we don't know how to decode them. However, one of the response headers tells us we should use UTF-8:

In [4]: u.headers['content-type']
Out[4]: 'text/plain; charset=utf-8'

In [5]: r.decode('utf-8')
Out[5]: 'As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like “Ich würde” end up in “Ich wÃ¼rde”.\n\nIs this a general problem, a problem with my page or any workaround for that? Thanks!'

In [6]: print(r.decode('utf-8'))
As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like “Ich würde” end up in “Ich wÃ¼rde”.

Is this a general problem, a problem with my page or any workaround for that? Thanks!
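
The session above hard-codes 'utf-8' after reading the header by eye. A minimal sketch of taking the codec from the header instead, assuming a hypothetical helper name `decode_body` and a UTF-8 fallback when no charset is declared (neither comes from the original post):

```python
from email.message import Message


def decode_body(body: bytes, content_type: str, fallback: str = "utf-8") -> str:
    """Decode a response body with the charset named in its Content-Type header."""
    # Message.get_content_charset() parses the "charset=" parameter for us
    # and returns None when the header does not declare one.
    msg = Message()
    msg["Content-Type"] = content_type
    charset = msg.get_content_charset() or fallback
    return body.decode(charset)


raw = b"Ich w\xc3\xbcrde"
print(decode_body(raw, "text/plain; charset=utf-8"))       # right codec
print(decode_body(raw, "text/plain; charset=iso-8859-1"))  # wrong codec, garbled umlaut
```

With a live `urllib.request.urlopen` response `u`, `u.headers.get_content_charset()` should give the same value directly, since the headers object is built on the same `Message` class.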

You'll notice the characters appear exactly as you posted them. But when these bytes are misinterpreted, in particular through the common mistake of decoding them as ISO-8859-1 instead of UTF-8 (string shortened below for clarity), you get:

In [7]: snippet = r[220:255]; snippet
Out[7]: b'titles like \xe2\x80\x9cIch w\xc3\xbcrde\xe2\x80\x9d end up'

In [8]: snippet.decode('utf-8')
Out[8]: 'titles like “Ich würde” end up'

In [9]: snippet.decode('iso-8859-1')
Out[9]: 'titles like â\x80\x9cIch wÃ¼rdeâ\x80\x9d end up'

If I print this, my terminal locks up. Incredible.
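
The same round trip can be reproduced without network access, and reversed once the damage has already happened. This is only a sketch: `repair_mojibake` is a hypothetical helper, and it assumes the text was mis-decoded exactly once as ISO-8859-1 with no characters lost along the way.

```python
def repair_mojibake(text: str, wrong: str = "iso-8859-1", right: str = "utf-8") -> str:
    """Undo one round of decoding UTF-8 bytes with the wrong codec."""
    try:
        # Re-encoding with the wrong codec recovers the original bytes,
        # which can then be decoded with the right one.
        return text.encode(wrong).decode(right)
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text was not damaged in this particular way; leave it alone.
        return text


original = "Ich würde"
garbled = original.encode("utf-8").decode("iso-8859-1")
print(garbled)                    # the umlaut shows up as two characters
print(repair_mojibake(garbled))   # back to the original
print(repair_mojibake(original))  # undamaged text passes through unchanged
```

Repairing after the fact is a workaround; the cleaner fix is to decode with the right codec in the first place.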

To summarize: whatever you're using to pull the post data out of Discourse is treating it as iso-8859-1 instead of utf-8.

(speculating) Perhaps you're embedding the raw bytes pulled from a Discourse site into a page that is served with an iso-8859-1 codepage.

limetti · December 13, 2023, 7:53am

Thanks a lot for the hint. Indeed, the UTF-8 meta tag came after the title tag.

It works now!
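
For anyone who wants to check their own page for the same ordering problem, here is a rough sketch. It assumes a hypothetical helper `charset_declared_before_title` and uses simple regexes rather than a real HTML parser, so treat it as an approximation. The 1024-byte limit reflects the HTML standard's requirement that the encoding declaration sit within the first 1024 bytes of the document; the check here only looks at where the tag starts.

```python
import re

# Matches both <meta charset="..."> and the older http-equiv form.
META_CHARSET = re.compile(rb"<meta[^>]*charset\s*=", re.IGNORECASE)
TITLE_TAG = re.compile(rb"<title[\s>]", re.IGNORECASE)


def charset_declared_before_title(html: bytes) -> bool:
    """True if a meta charset tag starts within the first 1024 bytes and before <title>."""
    meta = META_CHARSET.search(html)
    title = TITLE_TAG.search(html)
    if meta is None or meta.start() >= 1024:
        return False
    return title is None or meta.start() < title.start()


good = b'<html><head><meta charset="utf-8"><title>Ich w\xc3\xbcrde</title></head></html>'
bad = b'<html><head><title>Ich w\xc3\xbcrde</title><meta charset="utf-8"></head></html>'
print(charset_declared_before_title(good))
print(charset_declared_before_title(bad))
```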

system · January 12, 2024, 7:53am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.
