Literate Haskell Markdown Headings

GHC has special support for literate programming. Haskell source code is usually written with documentation in comments, in files with a .hs extension. Literate Haskell source code, in files with a .lhs extension, gives documentation the leading role and prefixes each line of code with a greater-than sign and a space (`>`). The documentation can be in any format, and Markdown is a popular choice. Unfortunately, a bug causes problems when using common Markdown heading syntax.

Most Markdown software supports two kinds of heading syntax. The most popular kind is ATX-style headings, where one to six number signs (#) at the beginning of a line create level-one to level-six headings, respectively. Authors can decide on the semantics of each level, and the following is one possible usage.

# Part

## Chapter

### Section

#### Subsection

##### Subsubsection

###### Subsubsubsection
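The rule for recognizing an ATX-style heading can be sketched in Haskell as follows. This is an illustration using a hypothetical `atxLevel` function, assuming the CommonMark requirement that the number signs be followed by a space, a tab, or the end of the line; real Markdown software handles more cases.

```haskell
-- Hypothetical helper: the level of an ATX-style heading line, if it is one.
-- Simplification: the number signs must start in column one (CommonMark also
-- allows up to three spaces of indentation).
atxLevel :: String -> Maybe Int
atxLevel line
  | level >= 1 && level <= 6 && validEnd rest = Just level
  | otherwise = Nothing
  where
    (hashes, rest) = span (== '#') line
    level = length hashes
    -- The number signs must be followed by a space, a tab, or the end of the line.
    validEnd [] = True
    validEnd (c : _) = c == ' ' || c == '\t'
```

With this sketch, `atxLevel "### Section"` is `Just 3`, while `atxLevel "#hashtag"` is `Nothing`.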

Setext-style headings use underlines to create headings, where the underline character determines the heading level. The equal sign (=) is used for level-one headings, the dash (-) is used for level-two headings, and some Markdown software supports using the tilde (~) for level-three headings. There is no support for other levels. Authors can decide on the semantics of each level, and the following is one possible usage.

Part
====

Chapter
-------

Section
~~~~~~~

There are benefits and drawbacks to both styles, and selecting one is often a matter of personal preference. When more levels are needed than setext-style headings support, however, ATX-style headings must be used. Note that it is possible to use both styles in the same document. For example, somebody who generally prefers setext-style headings may use ATX-style headings for subsections.

Part
====

Chapter
-------

Section
~~~~~~~

#### Subsection

##### Subsubsection

###### Subsubsubsection

GHC uses an unlit program to transform literate Haskell source code to normal Haskell source code that can be parsed by the compiler. Due to a bug, however, number signs at the beginning of a line are treated as CPP (C preprocessor) syntax and passed through to the compiler even though they are not marked as code.
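The following is a deliberately simplified model of this behavior, not GHC's actual unlit program (which handles more cases, such as `\begin{code}` blocks). The hypothetical `unlitBird` function keeps bird-track code lines, turns documentation lines into blank lines so that line numbers are preserved, and passes lines that start with a number sign through unchanged, which is how a Markdown heading can reach the parser.

```haskell
-- Simplified model of bird-track unlitting (not GHC's real unlit program).
-- Each output line corresponds to one input line, so error positions such as
-- "Demo.lhs:1:1" still refer to the original file.
unlitBird :: String -> String
unlitBird = unlines . map unlitLine . lines
  where
    unlitLine :: String -> String
    -- Code: replace the bird track with a space to keep column numbers.
    unlitLine ('>' : code) = ' ' : code
    -- Passed through as if it were a CPP directive (the problematic case).
    unlitLine line@('#' : _) = line
    -- Documentation becomes a blank line.
    unlitLine _ = ""
```

In this model, a document starting with `# Literate Haskell Example` still has that heading on line 1 of the output, consistent with the error reported at `Demo.lhs:1:1`.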

Consider the following example literate Haskell program.

# Literate Haskell Example

Executables are implemented using a `Main` module that exposes a function
named `main`.

> module Main (main) where

The `main` function is run when the program is executed.

> main :: IO ()
> main = putStrLn "Hello!"

This simple example just prints "Hello!" to the screen.

Attempting to build this program results in the following error.

$ cabal build
...
Demo.lhs:1:1: error: parse error on input ‘#’
  |
1 | # Literate Haskell Example
  | ^
...

An easy workaround is to use setext-style headings instead, but this does not work when you need levels that setext-style syntax does not support. When you need ATX-style headings, one option is to use a custom syntax that is compatible with GHC and to pre-process the file, transforming the custom syntax to standard syntax, when treating it as Markdown. For example, you could prefix ATX-style headings with a backslash as follows. This works fine with GHC, since the line no longer starts with a number sign, and the backslash can be removed when processing the file as Markdown.

\# Literate Haskell Example
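Existing documents that already use ATX-style headings could be converted to this form with a small script. The following sketch uses a hypothetical `escapeHeadings` function that escapes only lines that look like ATX-style headings (one to six number signs followed by a space, a tab, or the end of the line), so CPP directives such as `#if` are left alone. It does not know about code blocks or any other Markdown context.

```haskell
-- Hypothetical helper: prefix ATX-style heading lines with a backslash so that
-- they no longer start with a number sign and are treated as documentation.
escapeHeadings :: String -> String
escapeHeadings = unlines . map escapeLine . lines
  where
    escapeLine :: String -> String
    escapeLine line
      | isAtxHeading line = '\\' : line
      | otherwise = line

    -- One to six number signs followed by a space, a tab, or the end of the line
    isAtxHeading :: String -> Bool
    isAtxHeading line = level >= 1 && level <= 6 && validEnd rest
      where
        (hashes, rest) = span (== '#') line
        level = length hashes
        validEnd [] = True
        validEnd (c : _) = c == ' ' || c == '\t'
```

For example, `escapeHeadings` turns `# Literate Haskell Example` into `\# Literate Haskell Example` but leaves `#if 0` and bird-track code lines unchanged.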

LiterateX transforms literate source code to Markdown. It can be used to convert literate Haskell code with Markdown-formatted documentation to Markdown files with the code in fenced code blocks. How might one add support for the above custom syntax? It is quite simple if all lines starting with a backslash and number signs are headings that should be transformed.

The literatex command-line utility is usually used as follows. The program determines the source format from the filename extension (.lhs) of the input file.

$ literatex -i Demo.lhs -o Demo.md

The sed utility can be used to transform the custom syntax as follows. Note that this transformation could be done on the input or output since LiterateX does not parse headings. It is done on the output in this example so that the source format can still be determined from the input filename.

$ literatex -i Demo.lhs | sed 's/^\\#/#/' > Demo.md

The headings as well as the code are transformed in the output.

# Literate Haskell Example

Executables are implemented using a `Main` module that exposes a function
named `main`.

``` {.haskell .numberSource startFrom="6"}
module Main (main) where
```

The `main` function is run when the program is executed.

``` {.haskell .numberSource startFrom="10"}
main :: IO ()
main = putStrLn "Hello!"
```

This simple example just prints "Hello!" to the screen.

When using the LiterateX library to render Markdown, perhaps as part of a static site generator, the custom syntax transformation can be implemented in Haskell. The LiterateX library exposes a low-level API that makes it easy to add pre-processors as well as post-processors. The following is a minimal example, which uses a post-processor so that it is analogous to the sed example above.

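{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Conduit ((.|))
import qualified Data.Conduit as C
import Data.Text (Text)
import qualified Data.Text as T
-- LiterateX module paths are assumed here; check the package documentation
import qualified LiterateX
import qualified LiterateX.Renderer as Renderer
import qualified LiterateX.Types.SourceFormat as SourceFormat

-- Remove the escaping backslash from lines that start with "\#"
unescapeMarkdownHeadings :: Monad m => C.ConduitT Text Text m ()
unescapeMarkdownHeadings = C.awaitForever $ \case
    line
      | "\\#" `T.isPrefixOf` line -> C.yield $ T.tail line
      | otherwise -> C.yield line

main :: IO ()
main =
    LiterateX.runResource
      SourceFormat.LiterateHaskell
      (Renderer.defaultOptionsFor "haskell")
      (LiterateX.sourceFile "Demo.lhs")
      (unescapeMarkdownHeadings .| LiterateX.sinkFile "Demo.md")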

The full source code is available on GitHub.

As noted above, this simple transformation only works if all lines starting with a backslash and number signs are headings that should be transformed. It would incorrectly transform such a line in a code block, for example.

The following code demonstrates an example heading.

```markdown
\# Example
```

Correct transformation requires parsing the Markdown file, for example with Pandoc. This is left as an exercise for the reader.
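One rough approximation short of a full parser is to leave lines inside fenced code blocks alone. The sketch below uses a hypothetical `unescapeOutsideFences` function that only recognizes fences starting with three backticks in column one; tilde fences, indented code blocks, and fences inside lists are not handled, so it is no substitute for parsing the Markdown.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical sketch: unescape "\#" headings only outside fenced code blocks.
-- Simplification: any line starting with three backticks toggles the fence state.
unescapeOutsideFences :: [String] -> [String]
unescapeOutsideFences = go False
  where
    go :: Bool -> [String] -> [String]
    go _ [] = []
    go inFence (line : rest)
      | "```" `isPrefixOf` line = line : go (not inFence) rest
      | inFence = line : go inFence rest
      | otherwise = unescape line : go inFence rest

    -- Same rule as the sed command, applied only outside fences
    unescape :: String -> String
    unescape ('\\' : heading@('#' : _)) = heading
    unescape line = line
```

Applied to the lines of the example above, the `\# Example` line inside the `markdown` fence is left unchanged, while an escaped heading outside the fence is unescaped.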

Author

Travis Cardwell

Published

March 21, 2023