Links broken with (at least) two underscores in URL

I don’t know whether this is an issue with Discourse itself or whether it comes from the Twitter API, but I wanted to share this little bug: when we share a tweet from an account that has an underscore in its username, the link is broken on Discourse. Here is an example:

https://twitter.com/_miss_ives_/status/923667180201414658
https://twitter.com/miss_ives/status/923667180201414658


Another example:

https://twitter.com/_FARTIGAS_/status/914070638561767425
https://twitter.com/FARTIGAS/status/914070638561767425

When it’s not coming from Twitter, a link with an underscore works fine:

http://www.sqlite.org/lang_update.html

Put angle brackets on either side of the link. <like this>

That prevents oneboxing though. The link works, but it won’t onebox.
<https://twitter.com/_miss_ives_/status/923667180201414658>
https://twitter.com/_miss_ives_/status/923667180201414658

Then it’s something @sam will have to add to his list for later. In the meantime, replace the underscore with the URL-encoded version of the character.

I’ll leave that as an exercise for the reader…

It’s pretty rare, so there’s no hurry at all. No worries.

I tried with %5F and it works perfectly; that’ll do nicely for now.

Thanks!
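
For reference, a minimal Python sketch of the %5F workaround, assuming a blanket replacement of every underscore is acceptable for the URL in question. The helper name `encode_underscores` is hypothetical and not part of Discourse. The last line checks that percent-decoding gives back the original URL, which is why the encoded link should still point at the same address:

```python
from urllib.parse import unquote


def encode_underscores(url: str) -> str:
    """Replace each underscore with its percent-encoded form, %5F."""
    return url.replace("_", "%5F")


original = "https://twitter.com/_miss_ives_/status/923667180201414658"
encoded = encode_underscores(original)
print(encoded)

# Decoding the percent-escapes restores the URL exactly as it was typed.
print(unquote(encoded) == original)
```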

@Vitaly, is this issue something you would like reported to markdown.it for linkify?

( https://twitter.com/_miss_ives_/status/923667180201414658 not auto linking )

I am not sure we can even fix this properly, because we would have to move linkify earlier in the pipeline?

Especially since this is the default CommonMark behavior: http://spec.commonmark.org/dingus/?text=https%3A%2F%2Ftwitter.com%2F_miss_ives_%2Fstatus%2F923667180201414658 @codinghorror

That’s a known issue:

https://github.com/markdown-it/markdown-it/issues/38

It’s possible to fix, but not easy. Workaround available.

The correct solution is to make the linkifier part of the tokenizer process. That’s expensive (for example, an email lookahead check for every character). A tradeoff is to listen for ":", then look behind for http(s) and look ahead for the rest. That’s not universal, but it will cover all real cases:

  • http/https links will be parsed with other tokens, with higher priority than emphasis
  • everything else will be detected via text scan & regexps (as linkifier works now), probability of collision is very low.
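
The colon-triggered approach described above can be sketched in Python roughly as follows. This is only an illustration of the idea, not markdown-it or linkify-it code: the function name `find_http_links` and the deliberately loose `_REST` pattern are assumptions, and a real linkifier validates URLs far more strictly.

```python
import re

# Hypothetical pattern for "the rest" of a link after the colon.
# Assumption: anything up to whitespace or an angle bracket counts as URL text.
_REST = re.compile(r"//[^\s<>]+")


def find_http_links(text):
    """Return (start, end) spans of http(s) links, triggering only on ':'."""
    spans = []
    pos = 0
    while True:
        colon = text.find(":", pos)
        if colon == -1:
            break
        pos = colon + 1

        # Look behind the colon for the scheme.
        scheme = None
        for candidate in ("https", "http"):
            start = colon - len(candidate)
            if start < 0 or text[start:colon].lower() != candidate:
                continue
            # The scheme must not be the tail of a longer word.
            if start == 0 or not text[start - 1].isalnum():
                scheme = candidate
                break
        if scheme is None:
            continue

        # Look ahead from the colon for the rest of the link.
        match = _REST.match(text, colon + 1)
        if match is None:
            continue
        spans.append((colon - len(scheme), match.end()))
        pos = match.end()
    return spans


sample = "see https://twitter.com/_miss_ives_/status/923667180201414658 now"
for start, end in find_http_links(sample):
    print(sample[start:end])
```

Because the whole link is claimed as one span before emphasis is considered, the underscores inside it would never be treated as emphasis markers. Text without a colon costs only the `find` scan, which is the point of the tradeoff.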

I have no plans to do this, but if anyone wishes to implement it, see the explanation above. Or use < > :slight_smile:

Could we work around this by doing the work in our paste handler and, if we are pasting a URL, percent-encoding the underscores in the query string?

Messing with the clipboard always ends badly.

@Vitaly, curious whether you have had time to look at this issue recently? In this case, it is the https://... linkifier.

I guess getting the ordering right in the engine and minimizing the cost is a nightmare here.

I disagree; https:// is SUCH a rare set of characters that I think messing with it is usually quite safe.

(Except in code blocks, so there is that, but if the clipboard is JUST A URL, then it is quite safe in my opinion. So if you anchored on “starts with https://”, I guarantee you 99.99% that it will be safe.)
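
A rough Python sketch of that paste-handler guard, under stated assumptions: the function name `rewrite_pasted_text` is hypothetical, the real handler would live in Discourse's JavaScript, and the choice to encode underscores in both the path and the query string is a guess (the thread mentions the query string, but the example tweets have their underscores in the path). Anything that is not a single https:// URL is returned unchanged:

```python
from urllib.parse import urlsplit


def rewrite_pasted_text(clipboard: str) -> str:
    """Percent-encode underscores only when the clipboard is a lone https:// URL."""
    candidate = clipboard.strip()
    is_single_url = candidate.startswith("https://") and not any(
        ch.isspace() for ch in candidate
    )
    if not is_single_url:
        # Prose, code, or mixed content: leave the paste alone.
        return clipboard

    parts = urlsplit(candidate)
    # Assumption: encode path and query, but never the host name.
    encoded = parts._replace(
        path=parts.path.replace("_", "%5F"),
        query=parts.query.replace("_", "%5F"),
    )
    return encoded.geturl()


print(rewrite_pasted_text("https://twitter.com/_miss_ives_/status/923667180201414658"))
print(rewrite_pasted_text("some snake_case text, not a URL"))
```

The guard does not know whether the cursor is inside a code block, which is the one exception raised above; a real handler would need that extra check.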
