main ← fix-upload-label-escape-doubling
opened 05:00PM - 22 Apr 26 UTC
#39133 backslash-escaped markdown characters in upload filenames at
generation t…ime. That worked for freshly uploaded files but exposed a
latent bug in the rich-editor round-trip: markdown-it kept `\_` in the
parsed image token's content, ProseMirror stored it verbatim in the
`alt` attribute, and on save the serializer re-escaped each `\` to `\\`.
Every edit doubled the backslashes (1 → 2 → 4 → … → 2^N) until the post
exceeded `max_post_length` and became uneditable.
Escaping the raw is fundamentally fragile — nothing treats the stored
raw as canonical, so any parse/serialize cycle either drops the escape
or re-applies it. Fix it at parse time instead:
- Revert the generation-side escaping from #39133 in `UploadMarkdown`,
`uploads.js`, `inline_uploads.rb`, `to-markdown.js` and `sanitizeAlt`.
- Add a `literalize_upload_labels` core ruler in the markdown-it engine.
After inline parsing runs, it walks `image` / `link_open` tokens whose
URL starts with `upload://` and collapses their children into a single
literal text token, rebuilt from the children `content` plus the
`markup` of emphasis/strong/strikethrough delimiters. So `_foo_`,
`**foo**`, `~~foo~~`, `` `foo` ``, `\_foo`, linkified URLs, hashtags
and mentions inside upload labels all render literally. Reference-style
links (`[label][ref]` with `[ref]: upload://…`) get the same treatment
for free since they go through the same tokens.
The raw now stays canonical: the filename goes in verbatim, cooks the
same way on every pass, and the textarea and rich editor round-trip
identically.
Because escaping is gone, the structural characters `[`, `]` and `|`
(which would break the link/image syntax and can't be escaped without
reintroducing the doubling) are stripped from labels at every generation
point — `UploadMarkdown`, the HTML-anchor and hotlinked-image
conversions in `inline_uploads.rb`, `uploads.js` and `to-markdown.js`.
The multi-token scan-forward in `renderAttachment` (engine.js) and in
ProseMirror's `link.js` parser is kept: it still matters for non-upload
attachment links like `[**bold**|attachment](https://example.com/x.pdf)`,
where the label legitimately contains inline formatting the new ruler
doesn't touch.
`StripUploadLabelEscapes` (post-deploy migration) heals posts already
damaged by the regression, batching a scoped `regexp_replace` across the
`posts` table. The lookahead `(?=[^\]\[]*\]\(upload://)` bounds each
match to an upload label — forbidding `[`/`]` between the escape and the
closing `](upload://` keeps user-written `\_` escapes elsewhere in the
raw intact. Scoping on the lookahead alone, rather than anchoring on the
opening `[`, lets a single pass strip every escape in a label (e.g.
`foo\_bar\_baz`), not just the first.
https://meta.discourse.org/t/401231