Smart Punctuation without IP Symbols

Might it be possible to tweak the current configuration settings around Markdown typographer (also known as “smart punctuation”) to enable Unicode punctuation marks, like quotation marks and dashes, but disable (c), ™, and potentially other specialized glyphs of the kind?

I’m a lawyer. I advise on trademarks and copyrights all the time, including on Discourse forums. I have yet to advise anyone to revise a post to add a copyright or trademark symbol, and non-lawyers tend to massively overestimate how often they’re useful or necessary. On the flip side, I’ve edited more posts than I can count to try and trick the Markdown renderer into not rendering the © symbol.

This comes up especially often in enumerated lists. For example:

(a) apple
(b) banana
(c) copyright?
(d) date

This enumeration style is very common in laws, contracts, policies, and other formal writing influenced by legal style. It’s also a very common outline style. It was part of the “standard” outline style I was taught as a grade school student in the United States.

I see the current settings offer some flexibility around the punctuation, but not, as far as I can tell, any way to enable smart punctuation without enabling smart symbols.

9 Likes

I don’t have strong feelings around this one, I feel like “needing” to type the copyright symbol is so rare that this shortcut can be pulled … the balance of extremely rare usefulness versus the fairly common (but unsightly) list (a) (b) (c) format … I’m fine with removing this particular shortcut entirely @sam.

6 Likes

I’d also support just plain removing (c).

It’s not like it’s hard to search “copyright symbol”, copy, and paste. Nerds can type ©.

From a usability point of view, I’d have a look at all the expansions. I strongly suspect they’re more surprising than helpful.

5 Likes

Right, for this one, the risk (a fair bit!) appears to outweigh the benefit (teeeeeny tiiiny).

5 Likes

This is not looking that configurable, we would need to patch markdown.it to support this flexibility or just write our own prettify based off markdown.it.

https://github.com/discourse/discourse/blob/fce026d09be065f2116cbe67946a12f8b2f4f238/vendor/assets/javascripts/markdown-it.js#L4400-L4408

Only trivial fix is just to turn off the whole feature.

5 Likes

Huh odd that it isn’t configurable at all. That’s a shame. So

(c) (tm) (r) (p) → © ™ ® §

3 Likes

The upside is that the engine is pretty configurable, we could turn off the rules in markdown.it, copy the code and implement our own replacements just like @Roman_Rizzi implemented → .

It would take a few days of engineering to clean that up. It would also be a shame to ship the same code twice, but that’s kind of unavoidable if we are forking it.

3 Likes

As of commit 7b8969ce5cb2edc54f2c1aa39a85a3a08076337d on markdown-it master, the relevant source file is lib/rules_core/replacements.js and the relevant test fixture is test/fixtures/markdown-it/typographer.txt.

All of the replacements are hard-coded. They include (c) → ©, (tm) → ™, (r) → ®, and (p) → §, which are grouped as “scoped abbreviations”.
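To make the list problem concrete, here is a minimal sketch of how a hard-coded table of scoped abbreviations behaves. It is modeled loosely on the behavior described above, not copied from markdown-it's source; the names `SCOPED_ABBR`, `SCOPED_ABBR_RE`, and `replaceScopedAbbreviations` are my own for illustration.

```javascript
// Illustrative only: a hard-coded lookup table plus one case-insensitive regex.
// Names here are assumptions for the sketch, not markdown-it identifiers.
const SCOPED_ABBR = { c: "©", tm: "™", r: "®", p: "§" };
const SCOPED_ABBR_RE = /\((c|tm|r|p)\)/gi;

// Replace every "(c)", "(tm)", "(r)", "(p)" (any letter case) with its symbol.
function replaceScopedAbbreviations(text) {
  return text.replace(SCOPED_ABBR_RE, (match, name) => SCOPED_ABBR[name.toLowerCase()]);
}

// The legal-style enumeration from the first post.
const list = ["(a) apple", "(b) banana", "(c) copyright?", "(d) date"];
const rendered = list.map(replaceScopedAbbreviations);

console.log(rendered.join("\n"));
// (a) apple
// (b) banana
// © copyright?
// (d) date
```

Because the table and the regex are fixed, there is nothing to switch off short of disabling the whole rule, which is the bind this thread is about.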

For what it’s worth, I don’t get (p) for §. That should almost certainly be ℗, the phonorecord symbol, instead.

I am going to poke at this for a few and see if I can’t come up with a patch that makes the “typographer” feature more configurable.

7 Likes

This drives me nuts as well. I hit it at least weekly.

9 Likes

First, ancillary PR, fixing the interpretation of (p) and (P): https://github.com/markdown-it/markdown-it/pull/761

7 Likes

PR for switching off groups of replacements: https://github.com/markdown-it/markdown-it/pull/762

3 Likes

Maintainer’s highly responsive. He closes my PRs in a matter of minutes :stuck_out_tongue_winking_eye:

What I’m reading is that he’s very, very loath to introduce any breaking changes, even in new major versions, even for things like (P) rendering as §. He’s also allergic to allowing typographer: {A: true, B: false}-style options, even where typographer: true remains functional, on account of perceived complexity.
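For anyone wondering what that option shape would have meant in practice, here is a rough sketch of accepting either a boolean or an object of per-group flags. The group names (`quotes`, `dashes`, `ellipsis`, `symbols`) and the function `normalizeTypographer` are hypothetical, picked just for this example; as far as this thread goes, upstream only takes a plain boolean.

```javascript
// Hypothetical replacement groups; these names are assumptions for the sketch.
const GROUP_NAMES = ["quotes", "dashes", "ellipsis", "symbols"];

// Build an object with every group set to the same value.
function allGroups(enabled) {
  return Object.fromEntries(GROUP_NAMES.map((name) => [name, enabled]));
}

// Accept `true`, `false`/undefined, or `{ symbols: false, ... }`.
// In the object form, groups that are not mentioned stay enabled,
// so `typographer: true` and `typographer: {}` behave the same.
function normalizeTypographer(option) {
  if (!option) return allGroups(false);
  if (typeof option !== "object") return allGroups(true);
  const groups = allGroups(true);
  for (const name of GROUP_NAMES) {
    if (name in option) groups[name] = Boolean(option[name]);
  }
  return groups;
}

console.log(normalizeTypographer({ symbols: false }));
// { quotes: true, dashes: true, ellipsis: true, symbols: false }
```

The point of the design is that existing `typographer: true` configurations keep working unchanged, while the object form only ever switches groups off.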

I’m reading between lines, and he’s a Russian guy writing in English. But I get the feeling he sees markdown-it as baked.

With all gratitude, might be worth forking the impl, rewriting as ES Modules, and stripping out all the plugin-esque functionality that’s currently bundled, both linkification and replacements, including typographers’ punctuation replacements.

3 Likes

Maintainer’s not interested in points about duplicate code in bundles without numbers. And not interested in even non-breaking changes to API that aren’t “required by 100% of users” or blocking plugin impls. That leads to something of a bind, since they’re shipping two very plugin-y submodules, for linkification and smart punctuation, that should really be in their own small npm packages under that philosophy. It happens.

For context, there have been significant releases to markdown-it in the fairly recent past. Notably a perf improvement from Alex Kocharin in November of last year.

To fix (c), deal with (p), and do whatever else we want to do with arrows and the like, the best bet’s probably to float a patch removing the linkification and smart punctuation submodules from the core and loading them instead as plugins in Discourse. Use a GitHub Action or a cron job to keep an eye on markdown-it master and attempt automatic rebasing. If the maintainer remains this conservative about changes, the patch should apply cleanly for a good long time. Unless they make a big leap like rewriting as ES Modules, rather than CommonJS.

6 Likes

We discussed this internally and decided we are going to hard-fork typographer.

We will basically disable typographer in markdown.it and implement a copy in Discourse. markdown.it is incredibly extensible, so this is mostly “copy-and-paste”.

Once copy-and-paste is done we can add tests and customize and change some of the rules.
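Roughly, the shape of that approach might look like the sketch below. This is not the actual Discourse implementation: the plugin name `customTypographer`, the rule name `custom_replacements`, and the choice to keep only (tm) and (r) are assumptions for illustration. It relies on markdown-it's core ruler (`md.core.ruler.disable` and `md.core.ruler.before`) and on the built-in rule names `replacements` and `smartquotes`.

```javascript
// Assumption for the sketch: drop (c) and (p), keep (tm) and (r).
const SYMBOLS = { tm: "™", r: "®" };
const SYMBOLS_RE = /\((tm|r)\)/gi;

function replaceInText(text) {
  return text.replace(SYMBOLS_RE, (match, name) => SYMBOLS[name.toLowerCase()]);
}

// Core rule: walk inline tokens and rewrite plain text children only, so
// inline code and other token types are left alone. Simplified: unlike the
// built-in rule, this does not special-case text inside autolinks.
function customReplacementsRule(state) {
  for (const blockToken of state.tokens) {
    if (blockToken.type !== "inline" || !blockToken.children) continue;
    for (const child of blockToken.children) {
      if (child.type === "text") child.content = replaceInText(child.content);
    }
  }
}

// Plugin: switch off the built-in symbol replacements and register our own
// rule ahead of "smartquotes", which stays enabled for quotation marks.
function customTypographer(md) {
  md.core.ruler.disable("replacements");
  md.core.ruler.before("smartquotes", "custom_replacements", customReplacementsRule);
}
```

With a real markdown-it instance this would be installed with `md.use(customTypographer)`. Once the rule lives in our own code, adding tests and changing individual replacements is just an edit to `SYMBOLS` and its regex.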

9 Likes

Hey, sorry to bring up this discussion. Since I’m looking for a way to disable ... -> …, I wanted to know if this feature somehow progressed, and if I can expect to somehow enable or disable individual rules in the future.

Thanks!

2 Likes

It is a site setting; just disable typographer.

2 Likes

Well, I tried that already, but apparently it’s not working. No matter if I enable or disable the typographer, the substitutions are always done (and, by the way, (c) doesn’t seem to work)

2 Likes

You need to disable it, then rebuild the HTML on your desired posts.

5 Likes

This is now very much complete :confetti_ball:

2 Likes