Links broken with (at least) two underscores in URL

I don’t know whether this is a Discourse issue or whether it comes from the Twitter API, but I wanted to share this little bug: when we share a tweet from an account that has underscores in its username, the link is broken on Discourse. Here is an example:

https://twitter.com/_miss_ives_/status/923667180201414658
https://twitter.com/miss_ives/status/923667180201414658


Another example:

https://twitter.com/_FARTIGAS_/status/914070638561767425
https://twitter.com/FARTIGAS/status/914070638561767425

When it’s not coming from Twitter, a link with an underscore works fine:

http://www.sqlite.org/lang_update.html

Put angle brackets on either side of the link. <like this>

That prevents oneboxing though. The link works, but it won’t onebox.
<https://twitter.com/_miss_ives_/status/923667180201414658>
https://twitter.com/_miss_ives_/status/923667180201414658

Then it’s something @sam will have to add to his list for later. In the meantime, replace the underscore with the URL-encoded version of the character.

I’ll leave that as an exercise for the reader…

It’s pretty rare, so there’s no hurry at all. No worries.

I tried with %5F and it works perfectly; that’ll do nicely for now.

Thanks!
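
For anyone who wants to script the %5F workaround, here is a minimal Python sketch. `encode_underscores` is a hypothetical helper name, not anything that exists in Discourse. Note that `urllib.parse.quote` leaves `_` untouched because it is an unreserved character, so the sketch uses a plain string replacement instead.

```python
def encode_underscores(url: str) -> str:
    """Replace every underscore in the URL with its percent-encoded form."""
    # "_" is 0x5F in ASCII, hence %5F.
    return url.replace("_", "%5F")


print(encode_underscores("https://twitter.com/_miss_ives_/status/923667180201414658"))
# https://twitter.com/%5Fmiss%5Fives%5F/status/923667180201414658
```

The encoded link no longer contains characters that the Markdown parser can mistake for emphasis markers, which is why it survives.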

@Vitaly, is this issue something you would like reported to markdown.it for linkify?

( https://twitter.com/_miss_ives_/status/923667180201414658 not auto linking )

I am not sure we can even fix this properly, because we would have to push linkify forward in the pipeline?

Especially since this is the default CommonMark behavior: http://spec.commonmark.org/dingus/?text=https%3A%2F%2Ftwitter.com%2F_miss_ives_%2Fstatus%2F923667180201414658 @codinghorror

That’s a known issue:

https://github.com/markdown-it/markdown-it/issues/38

It’s possible to fix, but not easy. A workaround is available.

The correct solution is to make the linkifier part of the tokenizer process. That’s expensive (for example, an email lookahead check for every character). The tradeoff is to listen for the “:” character, then look behind for http(s) and look ahead for the rest. That’s not universal, but it will cover all real cases:

  • http/https links will be parsed with other tokens, with higher priority than emphasis
  • everything else will be detected via text scan & regexps (as linkifier works now), probability of collision is very low.

I have no plans to do this, but if anyone wishes to implement it, see the explanation above. Or use < > :slight_smile:
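
The “listen for `:`, look behind for http(s), look ahead for the rest” idea above can be illustrated with a toy scanner. This is only a sketch in Python, not markdown-it code: the function name `find_http_links`, the word-boundary check, and the rule that a URL runs until whitespace or an angle bracket are all assumptions made for the example.

```python
import re

# Assumption: after the colon, a link is "//" followed by anything up to
# the next whitespace character or angle bracket.
_URL_TAIL = re.compile(r"//[^\s<>]+")


def find_http_links(text: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of http(s) links, triggered only on ':'."""
    spans = []
    pos = 0
    while True:
        colon = text.find(":", pos)
        if colon == -1:
            break
        start = None
        # Look behind: is the colon directly preceded by "https" or "http"?
        for scheme in ("https", "http"):
            s = colon - len(scheme)
            if s >= 0 and text[s:colon].lower() == scheme:
                # Require that the scheme is not glued to a preceding word.
                if s == 0 or not text[s - 1].isalnum():
                    start = s
                    break
        if start is not None:
            # Look ahead: consume the rest of the link.
            tail = _URL_TAIL.match(text, colon + 1)
            if tail:
                spans.append((start, tail.end()))
                pos = tail.end()
                continue
        pos = colon + 1
    return spans


text = "see https://twitter.com/_miss_ives_/status/923667180201414658 and _this_"
for start, end in find_http_links(text):
    print(text[start:end])
```

Because the expensive check only runs when a `:` is seen, plain text costs almost nothing extra, and a span claimed this way could be emitted as a link token before emphasis gets a chance to consume the underscores. A real implementation would also need to trim trailing punctuation, which this sketch does not attempt.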

Could we fix this by doing the work in our paste handler, and if we’re pasting a URL, percent-encode underscores in the query string?

Fiddling with the clipboard is always something that will end in tears.

@Vitaly curious if you have had time to look at this issue recently? In this case it is the https://... linker.

I guess getting the ordering right in the engine and minimizing cost is a nightmare here.

I disagree; https:// is SUCH a rare set of characters that I think fiddling is usually pretty safe.

(Except in code blocks, so there is that, but if the clipboard is JUST A URL, then it’s quite safe IMO. So if you anchored on “starts with https://”, I can 99.99% guarantee it’ll be safe.)
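
To make the “clipboard is just a URL” rule concrete, here is a rough sketch of the check, written in Python for illustration only. The real paste handler would live in browser code, and `transform_pasted_text` is a hypothetical name. It also assumes underscores are encoded everywhere in the URL, since in the Twitter examples they sit in the path rather than the query string.

```python
import re

# Assumption: "just a URL" means the trimmed clipboard text is a single
# whitespace-free token that starts with https://
_ONLY_URL = re.compile(r"https://\S+")


def transform_pasted_text(pasted: str) -> str:
    """Percent-encode underscores only when the paste is a lone https URL."""
    candidate = pasted.strip()
    if _ONLY_URL.fullmatch(candidate):
        return candidate.replace("_", "%5F")
    # Anything else (prose, code, mixed content) is passed through untouched.
    return pasted
```

The sketch does nothing about the code-block caveat: it only sees the clipboard text, so the caller would still have to skip the transform when the cursor is inside a code block.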
