Twitter links broken with (at least) two underscores in username


#1

I don’t know whether this is an issue with Discourse itself or whether it comes from the Twitter API, but I wanted to share this little bug: when we share a tweet from an account that has underscores in its username, the link is broken on Discourse. Here is an example:

https://twitter.com/_miss_ives_/status/923667180201414658
https://twitter.com/miss_ives/status/923667180201414658


Another example

https://twitter.com/_FARTIGAS_/status/914070638561767425
https://twitter.com/FARTIGAS/status/914070638561767425

When the link doesn’t come from Twitter, an underscore in it works fine:

http://www.sqlite.org/lang_update.html


(Jeff Atwood) #2

Put angle brackets on either side of the link. <like this>


(Joshua Rosenfeld) #3

That prevents oneboxing though. The link works, but it won’t onebox.
<https://twitter.com/_miss_ives_/status/923667180201414658>
https://twitter.com/_miss_ives_/status/923667180201414658


(Jeff Atwood) #4

Then it’s something @sam will have to add to his list for later. In the meantime, replace the underscore with the URL-encoded version of the character.

I’ll leave that as an exercise for the reader…


#5

It’s pretty rare, so there’s no hurry at all. No worries.

I tried it with %5F and it works perfectly; that’ll do nicely for now.

Thanks!
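
For anyone who wants to script the %5F workaround, here is a minimal Python sketch. The helper name `encode_underscores` is hypothetical, and it only rewrites the path part of the URL. Note that `urllib.parse.quote` leaves underscores alone (they are unreserved characters), so the replacement is done by hand.

```python
from urllib.parse import urlsplit, urlunsplit


def encode_underscores(url: str) -> str:
    """Replace every "_" in the URL path with its percent-encoded form, %5F.

    Hypothetical helper for illustration; host, query and fragment are left untouched.
    """
    parts = urlsplit(url)
    encoded_path = parts.path.replace("_", "%5F")
    return urlunsplit(parts._replace(path=encoded_path))


print(encode_underscores("https://twitter.com/_miss_ives_/status/923667180201414658"))
```

The result no longer contains a raw underscore, so the Markdown emphasis rule has nothing to match against.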


(Sam Saffron) #6

@Vitaly Is this issue something you would like reported to markdown.it for linkify?

( https://twitter.com/_miss_ives_/status/923667180201414658 not auto linking )

I am not sure we can even fix this properly, because we would have to push linkify forward in the pipeline?

Especially since this is default CommonMark http://spec.commonmark.org/dingus/?text=https%3A%2F%2Ftwitter.com%2F_miss_ives_%2Fstatus%2F923667180201414658 @codinghorror


#7

That’s a known issue:

It’s possible to fix, but not easy. Workaround available.

The correct solution is to make the linkifier part of the tokenizer process. That’s expensive (for example, an email lookahead check for every character). A tradeoff is to listen for the ":" character, then look behind for http(s) and look ahead for the rest. That’s not universal, but it will cover all real cases:

  • http/https links will be parsed with other tokens, with higher priority than emphasis
  • everything else will be detected via text scan & regexps (as the linkifier works now); the probability of collision is very low.

I have no plans to do this, but if anyone wishes to implement it, see the explanation above. Or use < > :slight_smile:
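
To make the ":" tradeoff described above concrete, here is a rough Python sketch of the scanning step only. It is not markdown-it or linkify code: the function name `find_http_links` and the tail pattern are assumptions made for illustration, and a real tokenizer rule would also need to handle trailing punctuation, brackets and so on.

```python
import re

# Assumed, simplified pattern for "the rest" of a link after the colon.
_TAIL = re.compile(r"//[^\s<>]+")


def find_http_links(text: str) -> list[tuple[int, int]]:
    """Return (start, end) spans of http(s) links found by scanning for ":".

    Only a ":" triggers any work: look behind for "http"/"https",
    then look ahead for "//" and the remainder of the link.
    """
    links = []
    pos = text.find(":")
    while pos != -1:
        next_search = pos + 1
        for scheme in ("https", "http"):
            start = pos - len(scheme)
            # Look behind: the scheme must sit directly before the colon.
            if start < 0 or text[start:pos].lower() != scheme:
                continue
            # Reject schemes glued to a preceding word, e.g. "xhttp:".
            if start > 0 and text[start - 1].isalnum():
                continue
            # Look ahead: consume "//" plus everything up to whitespace.
            match = _TAIL.match(text, pos + 1)
            if match:
                links.append((start, match.end()))
                next_search = match.end()
            break
        pos = text.find(":", next_search)
    return links


sample = "see https://twitter.com/_miss_ives_/status/923667180201414658 and _emphasis_"
for start, end in find_http_links(sample):
    print(sample[start:end])
```

Because the span is claimed before emphasis is processed, the underscores inside the link would never reach the emphasis rule, while `_emphasis_` outside it is left for normal handling.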