jtsagata | 2019-06-01 00:42:38 UTC | #1

Onebox has a problem with some links from a WordPress site. The problem seems to be related to non-Latin characters in the URL. Two links that demonstrate the problem are [this](https://cerebrux.net/2019/05/28/brave-browser-%ce%bf-%ce%b4%ce%af%ce%ba%ce%b1%ce%b9%ce%bf%cf%82-browser-%cf%80%ce%bf%cf%85-%cf%80%cf%81%ce%bf%cf%83%cf%86%ce%ad%cf%81%ce%b5%ce%b9-%cf%84%ce%b1%cf%87%cf%8d%cf%84%ce%b7%cf%84%ce%b1/) and [this](https://cerebrux.net/2019/06/01/google-chrome-%ce%b4%ce%b5%ce%bd-%ce%b8%ce%b1-%ce%bc%cf%80%ce%bf%cf%81%ce%bf%cf%8d%ce%bc%ce%b5-%ce%bd%ce%b1-%ce%ad%cf%87%ce%bf%cf%85%ce%bc%ce%b5-adblocks/).

I can follow the links in the inline version, but not in the onebox version. The same happens if I use a URL shortening service.

The same links as oneboxes:

https://cerebrux.net/2019/05/28/brave-browser-%ce%bf-%ce%b4%ce%af%ce%ba%ce%b1%ce%b9%ce%bf%cf%82-browser-%cf%80%ce%bf%cf%85-%cf%80%cf%81%ce%bf%cf%83%cf%86%ce%ad%cf%81%ce%b5%ce%b9-%cf%84%ce%b1%cf%87%cf%8d%cf%84%ce%b7%cf%84%ce%b1/

https://cerebrux.net/2019/06/01/google-chrome-%ce%b4%ce%b5%ce%bd-%ce%b8%ce%b1-%ce%bc%cf%80%ce%bf%cf%81%ce%bf%cf%8d%ce%bc%ce%b5-%ce%bd%ce%b1-%ce%ad%cf%87%ce%bf%cf%85%ce%bc%ce%b5-adblocks/

-------------------------

Falco | 2019-06-01 00:42:24 UTC | #2

Those aren't valid URLs. They are https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier which don't get the same treatment.

-------------------------

jtsagata | 2019-06-01 07:08:22 UTC | #3

Valid or invalid, many local sites use this form, and that is outside our control. For example, all of the Greek Wikipedia:

https://el.wikipedia.org/wiki/%CE%91%CF%81%CF%87%CE%B1%CE%AF%CE%B1_%CE%A1%CF%8E%CE%BC%CE%B7

Since everything seems to work and the onebox is created, the problem is not detectable by the end user. Why not just pass the link as given, or the short URL as given? Is there any way to detect such cases and not create a onebox, or to disable onebox completely?

-------------------------

codinghorror | 2019-06-01 10:57:04 UTC | #4

[quote="jtsagata, post:3, topic:119254"]
Is there any way to detect such cases and not create a onebox, or to disable onebox completely?
[/quote]

Yes, trivially: just enter a space (or any other character or text) before the link. You can also blacklist the domain from oneboxing in your site settings.

-------------------------

lionel-rowe | 2019-06-01 13:45:44 UTC | #5

[quote="Falco, post:2, topic:119254"]
Those aren’t valid URLs. They are [Internationalized Resource Identifier - Wikipedia](https://en.wikipedia.org/wiki/Internationalized_Resource_Identifier) which don’t get the same treatment.
[/quote]

It doesn't look like it. [Raw](https://meta.discourse.org/raw/119254):

```
https://cerebrux.net/2019/05/28/brave-browser-%ce%bf-%ce%b4%ce%af%ce%ba%ce%b1%ce%b9%ce%bf%cf%82-browser-%cf%80%ce%bf%cf%85-%cf%80%cf%81%ce%bf%cf%83%cf%86%ce%ad%cf%81%ce%b5%ce%b9-%cf%84%ce%b1%cf%87%cf%8d%cf%84%ce%b7%cf%84%ce%b1/
```

In the onebox link, this gets double-encoded to this:

```
https://cerebrux.net/2019/05/28/brave-browser-%25ce%25bf-%25ce%25b4%25ce%25af%25ce%25ba%25ce%25b1%25ce%25b9%25ce%25bf%25cf%2582-browser-%25cf%2580%25ce%25bf%25cf%2585-%25cf%2580%25cf%2581%25ce%25bf%25cf%2583%25cf%2586%25ce%25ad%25cf%2581%25ce%25b5%25ce%25b9-%25cf%2584%25ce%25b1%25cf%2587%25cf%258d%25cf%2584%25ce%25b7%25cf%2584%25ce%25b1/
```

The confusion may stem from the fact that Chrome and Firefox both _display_ URIs as IRIs, e.g. in the address bar, for ease of reading.

-------------------------

Falco | 2019-06-01 14:38:32 UTC | #6

Oh I see now, so it's a report of a regression of https://meta.discourse.org/t/onebox-breaks-if-theres-chinese-text-in-url/67364/13?u=falco ?

I'm confused, because the oneboxes in OP are working fine for me in both Firefox and Chrome...

-------------------------

lionel-rowe | 2019-06-03 19:28:06 UTC | #7

[quote="Falco, post:6, topic:119254"]
Oh I see now, so it’s a report of a regression of [Onebox breaks if there's chinese text in URL](https://meta.discourse.org/t/onebox-breaks-if-theres-chinese-text-in-url/67364/13) ?
[/quote]

Looks like it. I guess the cooked versions are cached in that thread, but here's the same link:

https://ru.wikipedia.org/wiki/%D0%A1%D0%B2%D0%BE%D0%B1%D0%BE%D0%B4%D0%BD%D0%BE%D0%B5_%D0%BF%D1%80%D0%BE%D0%B3%D1%80%D0%B0%D0%BC%D0%BC%D0%BD%D0%BE%D0%B5_%D0%BE%D0%B1%D0%B5%D1%81%D0%BF%D0%B5%D1%87%D0%B5%D0%BD%D0%B8%D0%B5

The onebox looks fine, but upon clicking the link, I get an error page ("Недопустимое название", i.e. "Invalid title") in both Chromium 74 and Firefox 67 (Ubuntu 18.04.2).

-------------------------

sam | 2019-06-06 07:26:45 UTC | #8

@tgxworld adding this to your list to look at. It's not a high priority, but since you worked on the original, it makes sense for you to take this one.

-------------------------

jtsagata | 2019-09-04 18:04:51 UTC | #9

Thanks for solving the problem in the latest version.

-------------------------
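
The double-encoding described in post #5 can be reproduced in a few lines. The sketch below is in Python and is only an illustration of the mechanism, not Discourse's code or the fix that shipped; `encode_once` is a hypothetical helper name, and the sample path is taken from the Greek Wikipedia link in post #3.

```python
from urllib.parse import quote, unquote

# Already percent-encoded path segment (from the Greek Wikipedia link in post #3).
encoded = "%CE%91%CF%81%CF%87%CE%B1%CE%AF%CE%B1_%CE%A1%CF%8E%CE%BC%CE%B7"

# Naively encoding an already-encoded string escapes each "%" as "%25".
# This is the kind of double-encoding shown in post #5.
double_encoded = quote(encoded, safe="/")


def encode_once(path: str) -> str:
    """Hypothetical helper: decode first, then encode.

    Already-encoded input and raw Unicode input end up in the same
    percent-encoded form, so running it twice changes nothing.
    """
    return quote(unquote(path), safe="/")


if __name__ == "__main__":
    print(double_encoded)                   # contains "%25" sequences
    print(encode_once(encoded) == encoded)  # True
```

One caveat with this approach: decoding first is lossy for paths that intentionally contain an escaped `%25` or `%2F`, and Python's `quote` emits uppercase hex, so lowercase escapes such as `%ce%bf` come back as `%CE%BF` (equivalent, but not byte-identical).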