Problem with umlauts when embedding Discourse on another site

limetti · December 10, 2023, 9:41pm

As described here (Embed Discourse comments on another website via Javascript - #453 by limetti), when embedding Discourse into my website, the title is parsed correctly. But because it contains umlauts, titles like “Ich würde” end up as “Ich wÃ¼rde”.

Is this a general problem, a problem with my page, or is there a workaround for it? Thanks!

supermathie · December 11, 2023, 5:08am

This is a classic “wrong encoding” problem.

As a test case, let's read the raw data of your post (in Python, for this example):

In [1]: import urllib.request

In [2]: u = urllib.request.urlopen('https://meta.discourse.org/posts/1418409/raw')

In [3]: r = u.read(); r
Out[3]: b'As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like \xe2\x80\x9cIch w\xc3\xbcrde\xe2\x80\x9d end up in \xe2\x80\x9cIch w\xc3\x83\xc2\xbcrde\xe2\x80\x9d.\n\nIs this a general problem, a problem with my page or any workaround for that? Thanks!'

That gives us bytes, but on their own we don't know how to decode them. One of the response headers, however, tells us to use UTF-8:

In [4]: u.headers['content-type']
Out[4]: 'text/plain; charset=utf-8'

In [5]: r.decode('utf-8')
Out[5]: 'As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like “Ich würde” end up in “Ich wÃ¼rde”.\n\nIs this a general problem, a problem with my page or any workaround for that? Thanks!'

In [6]: print(r.decode('utf-8'))
As described here (https://meta.discourse.org/t/embed-discourse-comments-on-another-website-via-javascript/31963/453), when embedding Discourse into my website, the title is correctly parsed. But as it contains umlauts, titles like “Ich würde” end up in “Ich wÃ¼rde”.

Is this a general problem, a problem with my page or any workaround for that? Thanks!
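Rather than hard-coding `'utf-8'` as In [5] does, the charset can be read from the `Content-Type` header itself. A minimal sketch, assuming the header value is already available as a string; the helper name `decode_with_declared_charset` and its `fallback` parameter are hypothetical, while `email.message.Message.get_content_charset()` is the standard-library method that returns the lower-cased `charset` parameter:

```python
from email.message import Message


def decode_with_declared_charset(body: bytes, content_type: str,
                                 fallback: str = "utf-8") -> str:
    """Decode a response body with the charset named in its Content-Type header.

    Hypothetical helper; uses `fallback` when the header has no charset parameter.
    """
    headers = Message()
    headers["Content-Type"] = content_type
    # Returns the charset parameter lower-cased, or `fallback` if it is absent.
    charset = headers.get_content_charset(fallback)
    return body.decode(charset)


raw = "“Ich würde”".encode("utf-8")
text = decode_with_declared_charset(raw, "text/plain; charset=utf-8")
print(text == "“Ich würde”")  # True
```

With the live response from the session above, `u.headers.get_content_charset()` should give the same answer directly, since `u.headers` is an `http.client.HTTPMessage`, which subclasses `email.message.Message`.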

You'll notice the characters look exactly as you posted them. But when those bytes are interpreted incorrectly, and especially when they are read as ISO-8859-1 instead of UTF-8 (a very common mistake; the string is shortened below for clarity), you get:

In [7]: snippet = r[220:255]; snippet
Out[7]: b'titles like \xe2\x80\x9cIch w\xc3\xbcrde\xe2\x80\x9d end up'

In [8]: snippet.decode('utf-8')
Out[8]: 'titles like “Ich würde” end up'

In [9]: snippet.decode('iso-8859-1')
Out[9]: 'titles like â\x80\x9cIch wÃ¼rdeâ\x80\x9d end up'

If I print that, my terminal hangs. Weird.
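One plausible explanation for the hang, offered as an assumption rather than something verified in this thread: decoding as ISO-8859-1 turns the bytes `\x80`, `\x9c` and `\x9d` into the code points U+0080, U+009C and U+009D, which are C1 control characters. In ECMA-48, U+009D introduces an Operating System Command and U+009C is the String Terminator, so a terminal that honours 8-bit controls may sit waiting for a terminator that never comes. A minimal sketch that escapes control characters before printing (the helper name `escape_controls` is hypothetical):

```python
import unicodedata


def escape_controls(text: str) -> str:
    """Replace control characters (category Cc) with \\xNN escapes.

    Hypothetical helper. Newlines and tabs are kept so normal layout survives.
    All Cc code points are below U+0100, so two hex digits always suffice.
    """
    out = []
    for ch in text:
        if unicodedata.category(ch) == "Cc" and ch not in "\n\t":
            out.append(f"\\x{ord(ch):02x}")  # same notation repr() uses
        else:
            out.append(ch)
    return "".join(out)


snippet = b"titles like \xe2\x80\x9cIch w\xc3\xbcrde\xe2\x80\x9d end up"
garbled = snippet.decode("iso-8859-1")
safe = escape_controls(garbled)
print(all(unicodedata.category(c) != "Cc" for c in safe))  # True
```

Printing `repr(garbled)` gives a similar safe view; the helper just keeps ordinary newlines readable.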

To summarize: whatever you are using to pull the post data from Discourse is treating it as ISO-8859-1 instead of UTF-8.
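The mistake described above can be reproduced with the standard library alone. As long as the garbled text has not been altered further, it can also be reversed, because ISO-8859-1 maps every byte value 0x00 to 0xFF to exactly one code point and back. A minimal sketch (both function names are hypothetical); the proper fix is still to decode as UTF-8 where the data is read:

```python
def misread_as_latin1(text: str) -> str:
    """Simulate the bug: UTF-8 bytes decoded as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def undo_latin1_misread(garbled: str) -> str:
    """Reverse it: recover the original bytes, then decode them as UTF-8.

    Raises UnicodeEncodeError or UnicodeDecodeError for most text that was
    not garbled this way (pure ASCII passes through unchanged).
    """
    return garbled.encode("iso-8859-1").decode("utf-8")


title = "Ich würde"
garbled = misread_as_latin1(title)
print(garbled == "Ich wÃ¼rde")               # True
print(undo_latin1_misread(garbled) == title)  # True
```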

(Speculation) Perhaps you are including the raw bytes pulled from the Discourse site in a page that is served with an ISO-8859-1 page encoding.

limetti · December 13, 2023, 7:53am

Thanks a lot for the hint. Indeed, the UTF-8 meta tag came after the title tag.

It works now!
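For anyone with the same symptom, a rough way to check for the cause limetti found is to test whether the page's charset declaration appears before `<title>`. The HTML standard's own guidance is to put `<meta charset="utf-8">` early, serialized within the first 1024 bytes of the document. A minimal sketch built on the standard library's `html.parser.HTMLParser`; the class `_HeadOrderScanner` and the function `charset_declared_before_title` are hypothetical helpers, not part of any Discourse tooling:

```python
from html.parser import HTMLParser


class _HeadOrderScanner(HTMLParser):
    """Note whether a charset declaration appears before the first <title>."""

    def __init__(self) -> None:
        super().__init__()
        self.charset_seen = False
        self.title_seen = False
        self.charset_before_title = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; valueless
        # attributes arrive with the value None.
        attr_map = {name: (value or "") for name, value in attrs}
        if tag == "title" and not self.title_seen:
            self.title_seen = True
            self.charset_before_title = self.charset_seen
        elif tag == "meta" and not self.charset_seen:
            # <meta charset="..."> or the older http-equiv form.
            if "charset" in attr_map or (
                attr_map.get("http-equiv", "").lower() == "content-type"
                and "charset=" in attr_map.get("content", "").lower()
            ):
                self.charset_seen = True
    # Self-closing tags such as <meta ... /> still reach handle_starttag
    # through HTMLParser's default handle_startendtag.


def charset_declared_before_title(html: str) -> bool:
    """Hypothetical helper: True if a charset <meta> precedes <title>."""
    scanner = _HeadOrderScanner()
    scanner.feed(html)
    scanner.close()
    return scanner.charset_before_title


bad = '<html><head><title>Ich würde</title><meta charset="utf-8"></head></html>'
good = '<html><head><meta charset="utf-8"><title>Ich würde</title></head></html>'
print(charset_declared_before_title(bad))   # False
print(charset_declared_before_title(good))  # True
```

This only inspects markup order; a `Content-Type: text/html; charset=utf-8` response header from the server is another way to declare the encoding.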

Related topics

Topic	Replies	Views	Activity
Apostrophies not working in embed WordPress	7	947	January 3, 2018
Comment Embed Isn't Working - How to troubleshoot? Support	2	1125	July 2, 2015
URLs with encoding are altered when using Discourse link function Support	7	87	April 15, 2026
Reply-by-mail UTF-8 characters mis-rendering Support	0	320	June 30, 2021
Weird encoding issue on categories page Support unsupported-install	15	399	February 5, 2025