Markdown links do not interpret parentheses in URLs correctly

Continuing the discussion from URLs in parenthesis do not turn into links:

Section 2.3 of RFC 2396 states:

Unreserved characters can be escaped without changing the semantics of the URI, but this should not be done unless the URI is being used in a context that does not allow the unescaped character to appear.

Parentheses are such characters. When pasted into Discourse, URLs that use them (for example, in the query strings some sites generate for advanced search) are misinterpreted as part of the Markdown link syntax and rendered unusable.

For example, this URL would retrieve a list of legally deposited books published by my association:

https://www.depotlegal.be/Depot/form.aspx?SC=KBRVITRINE1#/Search/(query:(AdvancedQuery:(queryGroups:!((queryClauses:!((index:KBR264b_idx,logical:0,operator:0,otherValue:!n,value:‘petites%20singularités’)

It works when pasted on its own, as above, but it no longer works when used as a Markdown link:

[anchor](url) → [anchor](https://www.depotlegal.be/Depot/form.aspx?SC=KBRVITRINE1#/Search/(query:(AdvancedQuery:(queryGroups:!((queryClauses:!((index:KBR264b_idx,logical:0,operator:0,otherValue:!n,value:‘petites%20singularités’))

Moreover, when such a URL is received by email, the result is:

Recherche avancée - Depot)))),ForceSearch:!t,Grid:!n,Page:0,PageRange:3,QueryString:!n,ResultSize:-1,ScenarioCode:KBRVITRINE1,SearchContext:1))

where “Recherche avancée - Depot” is the correctly interpreted, clickable link title, and the rest is garbage that is absent from the clickable link. (Email is set to receive HTML.) Rebuilding the HTML does not fix the link.

This looks more like an issue with the URL and CommonMark. If you balance the opening and closing parentheses by adding )))))) to the end, the Markdown link is parsed properly.

link

[link](https://www.depotlegal.be/Depot/form.aspx?SC=KBRVITRINE1#/Search/(query:(AdvancedQuery:(queryGroups:!((queryClauses:!((index:KBR264b_idx,logical:0,operator:0,otherValue:!n,value:'petites%20singularit%C3%A9s'))))))))

This matches the behavior described in the CommonMark spec, which the markdown-it engine used by Discourse follows. The spec defines a link destination as:

a nonempty sequence of characters that does not start with <, does not include ASCII control characters or space character, and includes parentheses only if (a) they are backslash-escaped or (b) they are part of a balanced pair of unescaped parentheses. (Implementations may impose limits on parentheses nesting to avoid performance issues, but at least three levels of nesting should be supported.)

This can also be tested in the markdown-it demo.
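
To make the balanced-pair rule concrete, here is a minimal Python sketch that counts unmatched, unescaped parentheses in a link destination. The function name `unmatched_parens` is made up for illustration, and the scan only covers the parenthesis rule quoted above, not the rest of the destination grammar (spaces, control characters, nesting limits), so it is no substitute for markdown-it itself.

```python
def unmatched_parens(dest: str) -> tuple[int, int]:
    """Return (unclosed openers, stray closers), ignoring escaped characters."""
    unclosed = 0
    stray = 0
    i = 0
    while i < len(dest):
        ch = dest[i]
        if ch == "\\" and i + 1 < len(dest):
            # Skip the backslash and the character after it, so that
            # escaped parentheses never count toward the balance.
            i += 2
            continue
        if ch == "(":
            unclosed += 1
        elif ch == ")":
            if unclosed > 0:
                unclosed -= 1
            else:
                stray += 1
        i += 1
    return unclosed, stray


# The URL from this thread in its percent-encoded form, with the single
# closing parenthesis it was pasted with.
URL = (
    "https://www.depotlegal.be/Depot/form.aspx?SC=KBRVITRINE1#/Search/"
    "(query:(AdvancedQuery:(queryGroups:!((queryClauses:!((index:KBR264b_idx,"
    "logical:0,operator:0,otherValue:!n,value:'petites%20singularit%C3%A9s')"
)

print(unmatched_parens(URL))             # (6, 0): six openers are never closed
print(unmatched_parens(URL + "))))))"))  # (0, 0): balanced
```

The count of six matches the six closing parentheses that had to be appended before the link parsed.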

The spec appears to be quite clear. The user may also backslash-escape the unbalanced parentheses:

[link](https://www.depotlegal.be/Depot/form.aspx?SC=KBRVITRINE1#/Search/\(query:\(AdvancedQuery:\(queryGroups:!\(\(queryClauses:!\(\(index:KBR264b_idx,logical:0,operator:0,otherValue:!n,value:'petites%20singularit%C3%A9s'\))

becomes

link
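
For anyone generating such links in a script, both workarounds can be sketched in a few lines of Python. `escape_parens` applies the backslash escaping shown above; `percent_encode_parens` instead uses the percent-encoding that the RFC excerpt at the top of the thread allows. The function names and the `example.org` URL are made up for illustration, the input is assumed to be a raw URL with no existing backslash escapes, and whether a given site still understands `%28`/`%29` inside its fragment is an assumption that needs checking per site.

```python
def escape_parens(url: str) -> str:
    """Backslash-escape every parenthesis for use inside [text](url)."""
    return url.replace("(", "\\(").replace(")", "\\)")


def percent_encode_parens(url: str) -> str:
    """Replace parentheses with their percent-encoded forms %28 and %29."""
    return url.replace("(", "%28").replace(")", "%29")


def markdown_link(text: str, url: str) -> str:
    """Build a Markdown link whose destination has no bare parentheses.

    The link text is inserted as-is; escaping "[" or "]" in it is out of scope.
    """
    return f"[{text}]({escape_parens(url)})"


# Placeholder URL, not the one from the thread.
print(markdown_link("link", "https://example.org/#/Search/(query:(a:1)"))
# [link](https://example.org/#/Search/\(query:\(a:1\))
print(percent_encode_parens("https://example.org/#/Search/(query:(a:1)"))
# https://example.org/#/Search/%28query:%28a:1%29
```

Escaping every parenthesis, rather than only the unbalanced ones, keeps the helper simple and does not depend on how deep the nesting goes.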

Since this is working as described in the spec I’m moving it to Feature discussion.
