Onebox problem with i18n URLs

Onebox has a problem with some links from a WordPress site. The problem seems to be related to non-Latin characters in the URL.

Two links that demonstrate the problem are this and this. I can follow the links in the inline version, but not from the onebox version. The same happens if I use a URL-shortening service.

The onebox version of the same links:

Those aren’t valid URLs.

They are Internationalized Resource Identifiers (IRIs, see the Wikipedia article), which don’t get the same treatment.
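To illustrate the distinction, here is a minimal Python sketch of how an IRI maps to an ASCII-only URI by percent-encoding its UTF-8 bytes. The helper name `iri_to_uri` and the sample address are assumptions made for illustration, and this is not how Onebox itself handles links.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def iri_to_uri(iri: str) -> str:
    """Percent-encode the non-ASCII parts of an IRI (hypothetical helper)."""
    parts = urlsplit(iri)
    # Assumption: the host is already ASCII, so IDNA conversion is left out.
    path = quote(parts.path, safe="/")        # keep "/" as the segment separator
    query = quote(parts.query, safe="=&")     # keep key=value&key=value structure
    fragment = quote(parts.fragment, safe="")
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


# Hypothetical example address with Greek characters in the path.
iri = "https://el.wikipedia.org/wiki/Ελλάδα"
uri = iri_to_uri(iri)

print(uri)                  # ASCII-only form that is sent over the wire
print(unquote(uri) == iri)  # decoding once recovers the readable form
```

Note that this helper assumes its input has not been encoded yet: it also escapes any `%` it finds, so feeding it an already-encoded URI would encode it a second time.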


Valid or invalid, many local sites use this form, and that is outside our control. All of the Greek Wikipedia does, for example.

And since everything seems to work and the onebox is created, the problem is not detectable by the end user. Why not just pass the link as given, or the short URL as given?

Is there any way to detect such cases and not onebox them, or to disable oneboxing completely?

Yes, trivially, just enter a space (or any other character or text) before the link.

You can also blacklist the domain from oneboxing in your site settings.


It doesn’t look like it. Raw:

In the onebox link, this gets double-encoded to this:

The confusion may stem from the fact that Chrome and Firefox both display URIs as IRIs, e.g. in the address bar, for ease of reading.
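For readers unfamiliar with double-encoding, the following Python sketch reproduces the effect on a single path segment. The sample segment and the `encode_once` helper are assumptions for illustration only; this is not the Onebox code, nor necessarily the way it was later fixed.

```python
from urllib.parse import quote, unquote

segment = "Ελλάδα"      # hypothetical non-Latin path segment (the IRI form)

once = quote(segment)   # correct URI form: UTF-8 bytes become %XX escapes
twice = quote(once)     # double-encoded: every "%" is escaped again as "%25"

print(once)
print(twice)

# A server decodes only once, so it sees the literal escape text
# instead of the original characters.
print(unquote(twice) == once)     # True: one decode yields the escaped string
print(unquote(twice) == segment)  # False: the original is not recovered


def encode_once(text: str) -> str:
    """Encode non-ASCII characters but leave existing %XX escapes alone.

    Hypothetical guard: treating "%" as safe makes the call idempotent,
    at the cost of not escaping a literal "%" in unencoded input.
    """
    return quote(text, safe="/%")


print(encode_once(segment) == once)  # True
print(encode_once(once) == once)     # True: no second round of encoding
```

Browsers hide the difference because they display the decoded form, which is why the link looks right until the server receives the doubly escaped path.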


Oh, I see now. So it’s a report of a regression of “Onebox breaks if there's chinese text in URL”?

I’m confused, because the oneboxes in OP are working fine for me in both Firefox and Chrome…


Looks like it. I guess the cooked versions are cached in that thread, but here’s the same link:

The onebox looks fine, but upon clicking the link, I get an error page (“Недопустимое название”, i.e. “Invalid title”) in both Chromium 74 and Firefox 67 (Ubuntu 18.04.2).

@tgxworld, adding this to your list to look at. It’s not a high priority, but since you worked on the original, it makes sense for you to take this one.


Thanks for solving the problem in the latest version.