Links broken with (at least) two underscores in URL

Steven · 28 October, 2017 14:57

I don’t know if this is an issue with Discourse itself or if it’s coming from the Twitter API, but I wanted to share this little bug: when we share a tweet from an account that has underscores in its username, the link is broken on Discourse. Here is an example:

https://twitter.com/_miss_ives_/status/923667180201414658

Another example

https://twitter.com/_FARTIGAS_/status/914070638561767425

When it’s not coming from Twitter, a link with an underscore works fine:

codinghorror · 28 October, 2017 19:47

Put angle brackets on either side of the link. <like this>

jomaxro · 28 October, 2017 20:01

That prevents oneboxing though. The link works, but it won’t onebox.
<https://twitter.com/_miss_ives_/status/923667180201414658>
https://twitter.com/_miss_ives_/status/923667180201414658

codinghorror · 28 October, 2017 20:21

Then it’s something @sam will have to add to his list for later. In the meantime, replace the underscore with the URL-encoded version of the character.

I’ll leave that as an exercise for the reader.

Steven · 28 October, 2017 21:02

It’s pretty rare, so there’s no hurry at all. No worries.

I tried with %5F and it works perfectly; that’ll do nicely for now.

Thanks!
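
For reference, the %5F workaround can be reproduced with a plain string replacement. This is a minimal sketch, not something Discourse does for you; the helper name `encode_underscores` is made up for illustration.

```python
def encode_underscores(url: str) -> str:
    """Replace every literal underscore with its percent-encoded form, %5F.

    A plain replace is used on purpose: urllib.parse.quote() treats "_" as
    an unreserved character and never escapes it.
    """
    return url.replace("_", "%5F")


url = "https://twitter.com/_miss_ives_/status/923667180201414658"
print(encode_underscores(url))
# prints https://twitter.com/%5Fmiss%5Fives%5F/status/923667180201414658
```

With no literal underscores left in the text, the Markdown emphasis rules have nothing to match, so the link survives.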

sam · 28 October, 2017 22:06

@Vitaly, is this issue something you would like reported to markdown-it for linkify?

(https://twitter.com/_miss_ives_/status/923667180201414658 is not auto-linking)

I am not sure we can even fix this properly, because we would have to push linkify forward in the pipeline.

Especially since this is the default CommonMark behavior: http://spec.commonmark.org/dingus/?text=https%3A%2F%2Ftwitter.com%2F_miss_ives_%2Fstatus%2F923667180201414658 @codinghorror

Vitaly · 28 October, 2017 23:28

That’s a known issue:

https://github.com/markdown-it/markdown-it/issues/38

It’s possible to fix, but not easy. A workaround is available.

The correct solution is to make the linkifier part of the tokenizer process. That’s expensive (for example, an email lookahead check for every character). The tradeoff is to listen for the ":" character, then look behind for http(s) and look ahead for the rest. That’s not universal, but it will cover all real cases:

http/https links will be parsed along with the other tokens, at a higher priority than emphasis
everything else will be detected via text scan and regexps (as the linkifier works now); the probability of collision is very low.

I have no plans to do this, but if anyone wishes to implement it, see the explanation above. Or use < >
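
The tradeoff described above can be sketched roughly as follows. This is an illustration in Python, not markdown-it code: the function names and the URL-tail regex are assumptions, and the tail rule is much cruder than what a real linkifier accepts (it would swallow trailing punctuation, for instance). It only shows the ":" trigger, the look-behind for http/https, and the look-ahead for the rest.

```python
import re
from typing import List, Optional, Tuple

# Look-ahead rule (an assumption for this sketch): "//" followed by any run
# of characters that are not whitespace or angle brackets.
URL_TAIL = re.compile(r"//[^\s<>]+")


def scheme_start(text: str, colon: int) -> Optional[int]:
    """Look behind the ':' at index `colon` for 'http' or 'https'."""
    for scheme in ("https", "http"):
        start = colon - len(scheme)
        if start >= 0 and text[start:colon].lower() == scheme:
            # Reject schemes glued to a preceding letter or digit ("xhttp:").
            if start == 0 or not text[start - 1].isalnum():
                return start
    return None


def find_http_links(text: str) -> List[Tuple[int, int]]:
    """Return (start, end) spans of http(s) links, triggered only by ':'."""
    spans = []
    pos = 0
    while True:
        colon = text.find(":", pos)
        if colon == -1:
            break
        start = scheme_start(text, colon)
        tail = URL_TAIL.match(text, colon + 1) if start is not None else None
        if tail:
            spans.append((start, tail.end()))
            pos = tail.end()
        else:
            pos = colon + 1
    return spans


text = "see https://twitter.com/_miss_ives_/status/923667180201414658 now"
for start, end in find_http_links(text):
    print(text[start:end])
```

A tokenizer that claimed these spans first would never hand the underscores inside them to the emphasis rule, which is the "higher priority than emphasis" part of the proposal.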

supermathie · 8 June, 2022 14:19

Could we work around this by doing the work in our paste handler and, if we’re pasting a URL, percent-encoding the underscores in the query string?

sam · 8 June, 2022 23:49

Messing with the clipboard always ends badly.

@Vitaly, curious whether you have had time to look at this issue recently? In this case, it is the https://... linkifier.

I suppose getting the ordering right in the engine while minimizing the cost is a nightmare here.

codinghorror · 15 June, 2022 18:38

I disagree; https:// is SUCH a rare set of characters that I think messing with it is generally pretty safe.

(Except in code blocks, so there is that, but if the clipboard is ONLY A URL, then it’s pretty safe in my opinion. So if you anchored on “starts with https://”, I’d guarantee with 99.99% certainty that it will be safe.)
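
A rough sketch of the paste-handler idea discussed above, written in Python for illustration only. It is not Discourse's paste handler: the function name `rewrite_pasted_text` is hypothetical, and encoding the path and fragment in addition to the query string is an assumption made so that the Twitter example from this thread is covered. It also ignores the code-block caveat, since it has no notion of where the cursor is.

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_pasted_text(pasted: str) -> str:
    """Percent-encode underscores only when the paste is a single https:// URL."""
    candidate = pasted.strip()
    # Anchor: the whole clipboard must be one whitespace-free token that
    # starts with https://. Anything else is returned untouched.
    if not candidate.startswith("https://"):
        return pasted
    if any(ch.isspace() for ch in candidate):
        return pasted

    parts = urlsplit(candidate)
    # Leave the scheme and host alone; encode underscores everywhere else.
    return urlunsplit((
        parts.scheme,
        parts.netloc,
        parts.path.replace("_", "%5F"),
        parts.query.replace("_", "%5F"),
        parts.fragment.replace("_", "%5F"),
    ))


print(rewrite_pasted_text("https://twitter.com/_miss_ives_/status/923667180201414658"))
print(rewrite_pasted_text("some _emphasis_ and https://example.com/a_b"))
```

The second call returns its input unchanged, because the clipboard holds more than just a URL.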
