Literate Haskell Markdown Headings
GHC has special support for literate programming. Haskell source code is
usually written with documentation in comments, in files with a .hs
extension. Literate Haskell source code gives documentation the leading
role and prefixes code with a greater-than sign and space (> ), in files
with a .lhs extension. The documentation can be in any format, and
Markdown is a popular choice. Unfortunately, there is a bug that causes
problems when using common Markdown heading syntax.
Most Markdown software supports two kinds of heading syntax. The most
popular is ATX-style headings, where one to six number signs (#) at the
beginning of a line create level-one to level-six headings,
respectively. Authors can decide on the semantics of each level, and the
following is one possible usage.
# Part
## Chapter
### Section
#### Subsection
##### Subsubsection
###### Subsubsubsection
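The ATX rule above can be sketched as a small function. This is a minimal illustration, not a Markdown parser: the name `atxLevel` is hypothetical, and it assumes that the number signs start in the first column and are followed by a space or the end of the line.

```haskell
-- | Return the ATX heading level (1 to 6) of a line, or Nothing if the
-- line is not recognized as a heading.
--
-- Assumption: the number signs must start at the very beginning of the
-- line and be followed by a space or the end of the line.
atxLevel :: String -> Maybe Int
atxLevel line
    | n >= 1 && n <= 6 && startsBody rest = Just n
    | otherwise = Nothing
  where
    -- Split the leading run of number signs from the rest of the line.
    (hashes, rest) = span (== '#') line
    n = length hashes
    -- An empty heading or a space after the number signs is accepted.
    startsBody [] = True
    startsBody (c : _) = c == ' '
```

With this sketch, `atxLevel "## Chapter"` evaluates to `Just 2`, while a line with seven number signs or with no number signs evaluates to `Nothing`.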
Setext-style headings use underlines to create headings, where the
underline character determines the heading level. The equal sign (=) is
used for level-one headings, the dash (-) is used for level-two
headings, and some Markdown software supports using the tilde (~) for
level-three headings. There is no support for other levels. Authors can
decide on the semantics of each level, and the following is one possible
usage.
Part
====
Chapter
-------
Section
~~~~~~~
There are benefits and drawbacks to both styles, and selecting one is often a matter of personal preference. One must use ATX-style headings when many levels are needed, however. Note that it is possible to use both styles in the same document. For example, somebody who generally prefers setext-style headings may use ATX-style headings for subsections.
Part
====
Chapter
-------
Section
~~~~~~~
#### Subsection
##### Subsubsection
###### Subsubsubsection
GHC uses an unlit
program to transform literate Haskell source code to normal Haskell
source code that can be parsed by the compiler. Due to a bug,
however, number signs at the beginning of the line are parsed as CPP
(C pre-processor) syntax even though they are not marked as code.
Consider the following example literate Haskell program.
# Literate Haskell Example

Executables are implemented using a `Main` module that exposes a function
named `main`.

> module Main (main) where

The `main` function is run when the program is executed.

> main :: IO ()
> main = putStrLn "Hello!"

This simple example just prints "Hello!" to the screen.
Attempting to build this program results in the following error.
$ cabal build
...
Demo.lhs:1:1: error: parse error on input ‘#’
  |
1 | # Literate Haskell Example
  | ^
...
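The behavior described above can be modelled in a few lines. This is a deliberately simplified sketch, not GHC's implementation: the names `unlitLine` and `unlitModel` are hypothetical, only the greater-than-sign style is handled, and the pass-through of lines that start with a number sign is modelled on the description above.

```haskell
-- | Simplified model of how a single literate Haskell line is treated.
-- This is an assumption for illustration; GHC's actual unlit program
-- handles more cases.
unlitLine :: String -> String
unlitLine ('>' : rest) = ' ' : rest  -- code line: the marker is blanked
unlitLine line@('#' : _) = line      -- passed through as if it were CPP
unlitLine _ = ""                     -- documentation line: dropped

-- | Apply the model to a whole file, keeping the line count unchanged.
unlitModel :: String -> String
unlitModel = unlines . map unlitLine . lines
```

In this model, the line `# Literate Haskell Example` survives unchanged and so reaches the Haskell parser, which matches the error shown above. A line that starts with any other character, such as a backslash, is treated as documentation and dropped.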
An easy workaround is to use setext-style headings instead, but this does not work when you need to use levels that are not supported with that syntax. When you need ATX-style headings, one option is to use a custom syntax that is compatible with GHC and pre-process the file to transform the custom syntax to standard syntax when treating it as Markdown. For example, you could prefix ATX-style headings with a backslash as follows. This works fine with GHC, and the backslash can be removed when processing the file as Markdown.
\# Literate Haskell Example
LiterateX transforms literate source code to Markdown. It can be used to process literate Haskell code with Markdown-formatted documentation to Markdown files with the code in fenced code blocks. How might one add support for the above custom syntax? It is quite simple if all lines starting with a backslash and number signs are headings that should be transformed.
The literatex command-line utility is usually used as follows. The
program determines the source format from the filename extension (.lhs)
of the input file.
$ literatex -i Demo.lhs -o Demo.md
The sed
utility can be used to transform the custom syntax as follows. Note that
this transformation could be done on the input or output since LiterateX
does not parse headings. It is done on the output in this example so
that the source format can still be determined from the input
filename.
$ literatex -i Demo.lhs | sed 's/^\\#/#/' > Demo.md
Both the headings and the code are transformed in the output.
# Literate Haskell Example

Executables are implemented using a `Main` module that exposes a function
named `main`.

``` {.haskell .numberSource startFrom="6"}
module Main (main) where
```

The `main` function is run when the program is executed.

``` {.haskell .numberSource startFrom="10"}
main :: IO ()
main = putStrLn "Hello!"
```

This simple example just prints "Hello!" to the screen.
When using the LiterateX
library to render Markdown, perhaps as part of a static site
generator, the custom syntax transformation can be implemented in
Haskell. The LiterateX library exposes a low-level
API that allows you to easily add pre-processors as well as
post-processors. The following is a minimal example. A post-processor is
used so that it is analogous to the above sed example.
unescapeMarkdownHeadings :: Monad m => C.ConduitT Text Text m ()
unescapeMarkdownHeadings = C.awaitForever $ \case
    line
      | "\\#" `T.isPrefixOf` line -> C.yield $ T.tail line
      | otherwise -> C.yield line

main :: IO ()
main
    = LiterateX.runResource
        SourceFormat.LiterateHaskell
        (Renderer.defaultOptionsFor "haskell")
        (LiterateX.sourceFile "Demo.lhs")
        (unescapeMarkdownHeadings .| LiterateX.sinkFile "Demo.md")
The full source code is available on GitHub.
As noted above, this simple transformation only works if all lines starting with a backslash and number signs are headings that should be transformed. It would incorrectly transform such a line in a code block, for example.
The following code demonstrates an example heading.
```markdown
\# Example
```
Correct transformation requires parsing the Markdown file. For example, Pandoc could be used. This is left as an exercise for the reader.
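Short of full parsing, a heuristic can avoid the specific failure shown above by tracking whether the current line is inside a fenced code block. The following is a minimal sketch using only the base library, with lines represented as strings; the name `unescapeOutsideFences` is hypothetical. It assumes that fences are lines starting with three backticks, and it does not handle indented code blocks, nested fences, or tilde fences (tildes are ambiguous with the level-three setext underline described earlier).

```haskell
import Data.List (isPrefixOf, mapAccumL)

-- | Remove the leading backslash from escaped ATX headings, leaving
-- lines inside backtick-fenced code blocks untouched.
--
-- Assumption: every line that starts with three backticks opens or
-- closes a fenced code block. This is a heuristic, not a Markdown parser.
unescapeOutsideFences :: [String] -> [String]
unescapeOutsideFences = snd . mapAccumL step False
  where
    -- The accumulator records whether the current line is inside a fence.
    step :: Bool -> String -> (Bool, String)
    step inFence line
      | isFence line = (not inFence, line)
      | not inFence && "\\#" `isPrefixOf` line = (inFence, drop 1 line)
      | otherwise = (inFence, line)

    isFence :: String -> Bool
    isFence = ("```" `isPrefixOf`)
```

Applied to the lines of the example above, the escaped heading inside the fenced block is left as written, while escaped headings outside of code blocks lose their backslash. The same state-tracking idea could be carried over to a conduit-based post-processor, but that is not shown here.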