Discourse has a new Markdown Parser!

(Robin Ward) #1

Earlier this week, we committed a new text parser to Discourse. Previously, we used the parser in Pagedown to perform our markup. Its support for Markdown was quite good, but over time our extensions to it were becoming unmanageable.

Pagedown

In terms of extensibility, Pagedown has hooks, where you can say “before formatting this block of text, apply this regular expression.” This works pretty well until you want some kind of advanced formatting that regular expressions don’t handle well.

In particular, if your formatting rules rely on the layout of the HTML produced by previous operations, you will end up parsing HTML using regular expressions. And as we all know, that’s a bad idea!

Additionally, we had certain blocks of text where we didn’t want formatting applied until later in the pipeline. So in that case, we actually hoisted them out into a map and put them back later in the process. It was super ugly.
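
To make the hoisting trick concrete, here is a minimal sketch of the general idea. It is not the code we actually had in Discourse: the `hoist` and `restore` helpers, the placeholder format, and the toy emphasis pass are all made up for illustration.

```javascript
// Hypothetical sketch of "hoist and restore"; not Discourse's actual code.

// Replace every match of `pattern` with a unique placeholder and
// remember the original text in a map.
function hoist(text, pattern) {
  var stash = {};
  var counter = 0;
  var hoisted = text.replace(pattern, function (match) {
    // NUL-delimited keys are unlikely to collide with real post content.
    var key = "\u0000HOIST" + counter++ + "\u0000";
    stash[key] = match;
    return key;
  });
  return { text: hoisted, stash: stash };
}

// Put the stashed fragments back once the other passes have run.
function restore(text, stash) {
  Object.keys(stash).forEach(function (key) {
    // split/join avoids "$" being treated specially, as it would be in replace().
    text = text.split(key).join(stash[key]);
  });
  return text;
}

// Usage: protect inline code from an illustrative emphasis pass.
var input = "Use `*not bold*` but *this* is emphasised";
var result = hoist(input, /`[^`]*`/g);
var formatted = result.text.replace(/\*([^*]+)\*/g, "<em>$1</em>");
var output = restore(formatted, result.stash);
```

Even in a toy version like this, every pass in between has to leave the placeholders alone, which is what made the approach so fragile.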

markdown-js

Our new parser is based on markdown-js. markdown-js is the most flexible Markdown parser in JavaScript I’ve found yet.

The coolest thing about it is that instead of emitting HTML strings during the parsing phase, it produces an intermediate representation in JsonML. A tree of arrays is much easier to work with than HTML if you want to do post processing.
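
As a rough illustration of why a tree of arrays is pleasant to post-process, here is a small sketch that walks a JsonML tree and marks every link as `nofollow`. The `addNoFollow` helper and the sample tree are assumptions for the example, not part of markdown-js or Discourse. It only relies on the JsonML convention that a node is `[tagName, optionalAttributes, ...children]`.

```javascript
// Hypothetical post-processing pass over a JsonML tree.

// In JsonML the attributes object, when present, is the second element.
function isAttributes(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively add rel="nofollow" to every <a> node, in place.
function addNoFollow(node) {
  if (!Array.isArray(node)) {
    return node; // plain strings are text nodes
  }

  if (node[0] === "a") {
    if (!isAttributes(node[1])) {
      node.splice(1, 0, {}); // give the link an attributes object
    }
    node[1].rel = "nofollow";
  }

  var firstChild = isAttributes(node[1]) ? 2 : 1;
  for (var i = firstChild; i < node.length; i++) {
    addNoFollow(node[i]);
  }
  return node;
}

var tree = ["p", "Visit ", ["a", { href: "http://eviltrout.com" }, "my blog"], " today"];
addNoFollow(tree);
```

No string parsing is involved: the pass just checks tag names and mutates attribute objects.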

We no longer have regular expressions that work on HTML in our pipeline. We no longer hoist out parts of the document for processing later. Finally, we shaved off roughly 1,000 lines of JavaScript code!

An inline example

Let’s say you want to replace all occurrences of “evil trout” with a link that says “EVIL TROUT IS AWESOME”:

Discourse.Dialect.on("register", function(event) {
  var dialect = event.dialect;

  dialect.inline["evil trout"] = function(text) {
    return [
             "evil trout".length, 
             [
               'a', 
               {href: "http://eviltrout.com"}, 
               "EVIL TROUT IS AWESOME"
             ] 
           ];
  };
});

If you’ve worked with our parser before, you’ll notice the new one is built entirely around registering handlers as plugins. We use RSVP.js to do that. The first line and its closure just tell Discourse that you want to register an extension to the dialect. The second line retrieves a reference to the dialect so we can extend it.

To match any inline occurrences of “evil trout” we just assign a function to dialect.inline["evil trout"]. The function will be called any time that text is found. (Note: you’re free to match on a shorter piece of text than you need. You’ll be passed the fragment of text where that match occurred, and you can then run a regular expression against it if you want.)

In this case, we want to just return a link to eviltrout.com in JsonML. The formatting for that looks like this:

[
 'a',
 {href: "http://eviltrout.com"}, 
 "EVIL TROUT IS AWESOME"
]
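
If JsonML is new to you, it may help to see how a fragment like that maps onto HTML. The `toHTML` function below is a hand-rolled sketch written for this post, not the renderer markdown-js uses, and it only handles the simple cases of a tag name, an optional attributes object, and children.

```javascript
// Hypothetical, minimal JsonML-to-HTML renderer for illustration only.

function escapeHTML(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function toHTML(node) {
  if (typeof node === "string") {
    return escapeHTML(node); // text node
  }

  var tag = node[0];
  var children = node.slice(1);
  var attrs = {};

  // An object (not an array) in second position holds the attributes.
  var first = children[0];
  if (typeof first === "object" && first !== null && !Array.isArray(first)) {
    attrs = children.shift();
  }

  var attrText = Object.keys(attrs).map(function (name) {
    return " " + name + '="' + escapeHTML(attrs[name]) + '"';
  }).join("");

  return "<" + tag + attrText + ">" + children.map(toHTML).join("") + "</" + tag + ">";
}

var html = toHTML(["a", { href: "http://eviltrout.com" }, "EVIL TROUT IS AWESOME"]);
// html is '<a href="http://eviltrout.com">EVIL TROUT IS AWESOME</a>'
```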

There’s just one last thing we have to do. An inline formatter will always be passed a string beginning with the match you wanted. You need to tell markdown-js how many characters to replace in the string with your JsonML fragment.

In this case it’s easy: it’s just the length of “evil trout”. So the final return statement is:

return [
         "evil trout".length, 
         [
           'a', 
           {href: "http://eviltrout.com"}, 
           "EVIL TROUT IS AWESOME"
         ] 
       ];
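
To show how that `[length, node]` pair might be consumed, here is a toy scanner. It is an assumption-laden sketch, not how markdown-js is actually implemented: `processInline` and the plain `handlers` object are invented for the example. The handler itself has the same shape as the one above.

```javascript
// Toy inline scanner; hypothetical, not markdown-js's real implementation.
function processInline(text, handlers) {
  var out = [];
  var buffer = "";
  var patterns = Object.keys(handlers);
  var i = 0;

  while (i < text.length) {
    var rest = text.slice(i);
    var matched = null;

    // Find a handler whose key starts at the current position.
    for (var p = 0; p < patterns.length; p++) {
      if (rest.indexOf(patterns[p]) === 0) {
        matched = patterns[p];
        break;
      }
    }

    if (matched) {
      // The handler receives text beginning with the match and returns
      // [number of characters consumed, JsonML node].
      var result = handlers[matched](rest);
      if (buffer) {
        out.push(buffer);
        buffer = "";
      }
      out.push(result[1]);
      i += Math.max(1, result[0]); // guard against a handler consuming nothing
    } else {
      buffer += text.charAt(i);
      i += 1;
    }
  }

  if (buffer) {
    out.push(buffer);
  }
  return out;
}

var handlers = {
  "evil trout": function (text) {
    return [
      "evil trout".length,
      ["a", { href: "http://eviltrout.com" }, "EVIL TROUT IS AWESOME"]
    ];
  }
};

var nodes = processInline("I read evil trout daily", handlers);
```

Here `nodes` ends up as the text before the match, the link node, and the text after it, which is why the handler has to report how many characters it consumed.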

This is a pretty simple example, and it does get more complicated as you add more functionality. You should check out the dialects folder in our source tree to see how other dialects are implemented if you want to try your hand at more complex examples.

Please let us know if you encounter any issues with the new parser or have suggestions for how to extend it to make Discourse more awesome!