main ← fix-upload-label-escape-doubling
opened 05:00PM - 22 Apr 26 UTC
in upload filenames at generation time. That worked for freshly uploaded files b…ut exposed a latent bug in the rich-editor round-trip: markdown-it kept `\_` in the parsed image token's content, ProseMirror stored it verbatim in the `alt` attribute, and on save the serializer re-escaped each `\` to `\\`. Every edit doubled the backslashes (1 → 2 → 4 → … → 2^N) until the post exceeded `max_post_length` and became uneditable.
**Approach change**
Escaping the raw is fundamentally fragile — nothing treats the stored raw as canonical, so any parse/serialize cycle either drops the escape or re-applies it. Fix it at parse time instead:
- Revert the generation-side escaping from #39133 in `UploadMarkdown`, `uploads.js`, `inline_uploads.rb`, `to-markdown.js`, and `sanitizeAlt`.
- Add a `literalize_upload_labels` core ruler in the markdown-it engine. After inline parsing runs, it walks `image` / `link_open` tokens whose URL starts with `upload://` and collapses their children into a single literal text token. The label text is reconstructed from children `content` plus the `markup` of emphasis/strong/strikethrough delimiters, so `_foo_`, `**foo**`, `~~foo~~`, `` `foo` ``, `\_foo`, linkified URLs, hashtags, mentions, and emoji shortcodes inside upload labels all render literally.
The raw now stays canonical — the filename goes in verbatim, cooks the same way on every pass, and both the textarea and rich editor round-trip identically. Reference-style links
(`[label][ref]` with `[ref]: upload://…`) get the same treatment for free since they go through the same `link_open`/`image` tokens.
**Parser resilience from #39133 is kept**
The multi-token scan-forward in `renderAttachment` (engine.js) and in ProseMirror's `link.js` parser still matters for non-upload attachment links like `[**bold**|attachment](https://example.com/file.pdf)`, where the label legitimately contains inline formatting that the new ruler doesn't touch.
**Data cleanup**
`StripUploadLabelEscapes` (post-deploy migration) heals posts already damaged by the regression with a single scoped `regexp_replace`. The lookahead `(?=[^\]]*\]\(upload://)` bounds the match to upload labels so user-written `\_` escapes elsewhere in the raw stay intact. Excluding `\` from the prefix char class keeps the greedy match from swallowing a backslash that belongs to the run being stripped.
https://meta.discourse.org/t/401231