Onebox encodes single quotes in URL breaking the link

Onebox encodes the single quote (') as &#39; in URLs, which in some cases breaks the link. Example:

Musical inspirations behind Doom's music - The Doom Wiki at DoomWiki.org

The encoding is done as a preventive measure against XSS attacks. I am not sure whether we should let single quotes pass through in URLs, since this is a rare edge case. Thoughts?


Do you think this change would be safe @sam?

We should follow the spec here with our encoding. Technically I think we have to allow characters like ( and ) through, because Wikipedia can use them. Even & is allowed according to the spec, e.g. https://en.wikipedia.org/wiki/&

see: https://stackoverflow.com/a/4669755/17174

https://tools.ietf.org/html/rfc3986#section-3.3

I guess we need one rule for encoding the query params and another for encoding the path.
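To make the "two rules" idea concrete, here is a minimal sketch in Python (not Onebox's actual Ruby code). The `PATH_SAFE` set and both helper names are hypothetical, chosen for illustration; the set is my reading of the characters RFC 3986 section 3.3 allows literally in a path segment.

```python
from urllib.parse import quote, urlencode

# Assumption: characters RFC 3986 section 3.3 allows literally in a path
# segment beyond the unreserved set (the sub-delims plus ":" and "@").
PATH_SAFE = "!$&'()*+,;=:@"


def encode_path_segment(segment: str) -> str:
    """Percent-encode one path segment, leaving spec-allowed characters alone."""
    return quote(segment, safe=PATH_SAFE)


def encode_query(params: dict) -> str:
    """Encode query parameters strictly: '&', '=' and "'" in values are escaped."""
    return urlencode(params)


# Path rule: the quote and the parentheses pass through untouched.
print(encode_path_segment("Musical_inspirations_behind_Doom's_music"))
print(encode_path_segment("Foo_(bar)"))

# Query rule: the same characters are escaped so they cannot split parameters.
print(encode_query({"title": "Doom's music", "a": "b&c"}))
```

The path rule is deliberately permissive so wiki-style URLs survive, while the query rule stays strict because & and = carry structure there.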


Yes this part is rather important @techAPJ


FWIW, percent encoding:

https://doomwiki.org/wiki/Musical_inspirations_behind_Doom%27s_music

Works fine (even though the spec doesn’t require ' to be percent-encoded). I’m not aware of a situation where HTML entities should be used anywhere in a URL itself, even in query params.
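A quick way to see the difference between the two encodings, sketched in Python with the standard library (this is only an illustration, not how Onebox does it): percent-encoding round-trips through a URL decoder, while an HTML entity is just literal text as far as URL decoding is concerned.

```python
from urllib.parse import quote, unquote

title = "Musical_inspirations_behind_Doom's_music"

# Percent-encode everything outside the unreserved set, including the quote.
encoded = quote(title, safe="")
print(encoded)

# URL decoding restores the original title from %27 ...
print(unquote(encoded) == title)

# ... but it does nothing to an HTML entity, which stays in the path as-is.
print(unquote("Doom&#39;s_music"))
```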


Entities are fine in the HTML. But something’s going wrong because it’s double encoding the quote.

<a href="https://doomwiki.org/wiki/Musical_inspirations_behind_Doom&amp;#39;s_music" target="_blank" rel="nofollow noopener">Musical inspirations behind Doom's music</a>

From the JSON.
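The double encoding is easy to reproduce. Below is a small sketch using Python's `html` module, purely as an illustration of the failure mode and not the Onebox code path. One caveat: Python writes the quote as `&#x27;` rather than `&#39;`, but the two entities are equivalent.

```python
import html

url = "https://doomwiki.org/wiki/Musical_inspirations_behind_Doom's_music"

# Escaping once is correct for an href attribute: the browser decodes the
# entity back to a plain quote before following the link.
once = html.escape(url, quote=True)

# Escaping the already-escaped string turns "&" into "&amp;", so the browser
# decodes it to a literal "&#x27;" in the URL and the link breaks.
twice = html.escape(once, quote=True)

print(once)
print(twice)
print(html.unescape(once) == url)    # one decode recovers the real URL
print(html.unescape(twice) == once)  # one decode of the double-escaped form does not
```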

Done in:

https://github.com/discourse/onebox/commit/7a1885c48800693d9d4abfa9c58d6d25fd19ca65

