Goldmark Markdown




So, despite saying I didn’t want to do this in my last post on this, I went and forked Hugo. Now rendering with KaTeX is way faster, and basically free when using hugo server.


This was driven more by curiosity than a desire to make things fast. At some point a couple of weeks ago I was reminded of QuickJS, and from there it was a short series of small steps to my downfall. QuickJS really enabled this. It was easy to replace the JavaScript from my previous post with an executable that ran the qjs interpreter on KaTeX bytecode, and easy to then link it to a Haskell program that drove Pandoc as a library rather than using the command line. Having done that, getting Goldmark, Hugo’s markdown processor, to render TeX using KaTeX was also a small amount of work.

Rather than this post just being, “hey, I’m doing this now instead,” I guess I’ll talk a bit more about it.

Goldmark is fairly extensible, so shoe-horning in TeX-awareness just means telling Goldmark’s parser to call our code when it hits a $. I delimit maths with $ and $$, so $x$ gets rendered as x. But taking responsibility for parsing any markdown yourself means you have to think about weird edge cases [1].
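To make "call our code when it hits a $" a bit more concrete, here is a minimal sketch of what such a callback has to decide. It uses only the Go standard library; scanMath and its return values are hypothetical names for illustration, not Goldmark's API and not the code in my fork. In Goldmark itself, logic like this would sit behind an inline parser that is triggered on $.

```go
package main

import (
	"fmt"
	"strings"
)

// scanMath is a hypothetical helper, not part of Goldmark. Given text that
// starts at a '$', it returns the TeX source, whether it is display maths
// ($$...$$), and how many bytes were consumed. ok is false when there is no
// closing delimiter (or the maths is empty), in which case the caller should
// treat the '$' as literal text.
func scanMath(s string) (tex string, display bool, n int, ok bool) {
	delim := "$"
	if strings.HasPrefix(s, "$$") {
		delim, display = "$$", true
	}
	if !strings.HasPrefix(s, delim) {
		return "", false, 0, false
	}
	end := strings.Index(s[len(delim):], delim)
	if end <= 0 { // no closing delimiter, or nothing between the delimiters
		return "", false, 0, false
	}
	tex = s[len(delim) : len(delim)+end]
	return tex, display, end + 2*len(delim), true
}

func main() {
	tex, display, n, ok := scanMath("$x$ is a variable")
	fmt.Println(tex, display, n, ok) // prints: x false 3 true
}
```

Note that this sketch knows nothing about links or escaped dollars, which is exactly where the edge cases come from.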

Some background: since Hugo v0.60.0, the default Markdown parser has been the CommonMark-compliant Goldmark. The previous default, Blackfriday, is not compliant with any spec and has many bugs (for which ox-hugo has to keep adding workarounds); see the Hugo v0.60.0 release notes to learn more. With Goldmark, Hugo also allows custom templates to render links and images from Markdown. This blog is currently on Hugo v0.62.2, so Goldmark is what renders its Markdown to HTML under the hood, and the maths is typeset with KaTeX, which is pretty fast compared to other math typesetting libraries.

Take links. Links in markdown have the following syntax: [foo](bar.tld). So, how should [$foo](bar.tld$) be parsed? In my opinion, there is a correct answer here: that’s a link. I’ve found people citing some old RFC that prohibits $ in URLs, but URLs containing $ are valid. Anyone who writes [$](en.wikipedia.org/wiki/$) wants that to be a link, and I can’t think of any exceptions.


By the way, that $ in the previous paragraph also needs to be parsed as a $, and not the opening dollar of maths.


So, those are links. What about [$[0, 1]$](url)? There’s something satisfying about being able to link maths: [0, 1]. So the earlier example, [$foo](bar.tld$), we don’t want to parse as TeX, and this one we do, and in order to support both we have to know if we are inside a link or not. Parsing!

Thankfully, we have a working TeX-aware markdown processor already: Pandoc. After fixing up some minor differences between how Goldmark and Pandoc render the HTML, Pandoc, with the old filter from last time, can generate a bunch of test cases. I don’t follow Pandoc’s example in one place: Pandoc’s rule for allowing $[]$ inside links seems to be that the brackets must be balanced. I opted for requiring [ to appear before ] (which would otherwise close the link).
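The difference between the two bracket rules is easiest to see side by side. The following is a rough sketch, not the code from my fork: balanced is my guess at how Pandoc's rule behaves, and openBeforeClose is one reading of "[ must appear before ]". Both function names are made up for this example, and neither handles TeX escapes like \[.

```go
package main

import "fmt"

// balanced reports whether every '[' in tex is matched by a later ']' and
// no ']' appears without an open '['. This is the assumed Pandoc-style rule.
func balanced(tex string) bool {
	depth := 0
	for _, r := range tex {
		switch r {
		case '[':
			depth++
		case ']':
			depth--
			if depth < 0 {
				return false
			}
		}
	}
	return depth == 0
}

// openBeforeClose is one reading of the looser rule: a ']' is fine as long
// as some '[' appeared earlier in the maths. A ']' with no '[' before it
// would close the surrounding link instead, so the span is rejected.
func openBeforeClose(tex string) bool {
	seenOpen := false
	for _, r := range tex {
		switch r {
		case '[':
			seenOpen = true
		case ']':
			if !seenOpen {
				return false
			}
		}
	}
	return true
}

func main() {
	for _, tex := range []string{"[0, 1]", "[0, 1)", "0, 1]"} {
		fmt.Printf("%q balanced=%v openBeforeClose=%v\n",
			tex, balanced(tex), openBeforeClose(tex))
	}
}
```

The half-open interval [0, 1) is where the two rules disagree: it is not balanced, but it contains no ] that could be mistaken for the end of the link text, so the looser rule lets it through.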

Next, some threading stuff. Hugo uses goroutines, and the QuickJS runtime can only be used single-threaded. We don’t want to make each goroutine queue to access QuickJS, and we also want to keep our changes to Hugo to a minimum. From a C perspective, there’s a really obvious solution: give each thread its own QuickJS runtime using thread-local storage. But goroutines aren’t threads, and the Go scheduler wants to schedule goroutines onto any thread it likes. This means that between two calls into QuickJS the goroutine can move to a different thread, and whatever way we have of communicating with the QuickJS runtime from Go needs to be okay with this happening.

What I decided to do was to just never allocate anything that would be passed back to Go, and have the Go code pass in memory instead. This makes the C code pure as far as Go is concerned, so we can use thread-local and not worry about the Go scheduler. It also means we don’t have to call free, which is always good.
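The calling convention is the interesting part, so here is a sketch of just that, written in plain Go so it runs without cgo. In the real thing the callee is C and keeps its QuickJS runtime in thread-local storage; here renderInto is a hypothetical stand-in that emits a placeholder span rather than real KaTeX output. The point it illustrates is that the callee only ever writes into memory the caller owns and reports how much space it needs.

```go
package main

import "fmt"

// renderInto stands in for the C entry point (hypothetical name). It writes
// its output into dst, which the caller owns, and keeps nothing for the
// caller to free. It returns the number of bytes the full output needs, so
// a result larger than len(dst) means "retry with a bigger buffer".
func renderInto(dst []byte, tex string) int {
	need := 0
	// Placeholder markup; the real callee would produce KaTeX HTML here.
	for _, part := range [...]string{`<span class="math">`, tex, `</span>`} {
		if need < len(dst) {
			copy(dst[need:], part) // copies only as much as fits
		}
		need += len(part)
	}
	return need
}

// render is the Go side: it owns the buffer and grows it when needed.
// Each call to renderInto is independent, so it does not matter which
// thread the goroutine is running on for either call.
func render(tex string) string {
	buf := make([]byte, 64)
	n := renderInto(buf, tex)
	if n > len(buf) {
		buf = make([]byte, n)
		n = renderInto(buf, tex)
	}
	return string(buf[:n])
}

func main() {
	fmt.Println(render("x^2"))
}
```

Because no state crosses the boundary between calls, a retry after the goroutine has moved threads just talks to a different runtime, and nothing on the Go side can tell the difference.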

Last time, I reported that a single page with TeX took half a second to render. Now, my entire site takes 240ms. Without KaTeX, it’s around 150ms. I still find this a bit slower than it really ought to be, but I suppose for how little work it was, it’s pretty good.

  1. The CommonMark spec is a great resource for all the markdown parsing gotchas you’ve never thought about before.
