Pandoc Filter Options

Pandoc converts documents from an input format to an output format via an abstract syntax tree (AST). It provides multiple ways to transform the AST using external programs, called “filters.” Lua filters are Lua programs that are interpreted within the Pandoc process, while JSON filters are arbitrary programs that are executed in a separate process with the AST communicated via standard I/O using JSON serialization. This blog entry describes a way to support filter options when implementing JSON filters in Haskell.

Pandoc provides ToJSONFilter, which takes care of JSON serialization and walking the AST, applying a given transformation function. The interface is quite versatile. Transformation functions can transform values of the following AST types purely (a -> a) or impurely (a -> IO a):

The current documentation states that the Meta and MetaValue types are also supported, but I think that this is a mistake in the documentation. I submitted a pull request with a documentation fix, so I will soon find out whether I am the one who is mistaken.

Alternatively, transformation functions can transform values to a list of values (a -> [a] or a -> IO [a]). For example, an Inline Str may be parsed into more than one Inline value.
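As a minimal sketch of the list-returning form, the following filter splits each Str at underscores and puts a Space between the pieces. The function name splitUnderscores and the underscore rule are made up for illustration, and the code assumes a pandoc-types version in which Str holds Text (1.20 or later).

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import Data.List (intersperse)
import qualified Data.Text as T
import Text.Pandoc.JSON (Inline (Space, Str), toJSONFilter)

-- | Hypothetical example: turn @Str "a_b"@ into @[Str "a", Space, Str "b"]@.
-- Any other inline is returned unchanged as a single-element list.
splitUnderscores :: Inline -> [Inline]
splitUnderscores (Str t) = intersperse Space . map Str $ T.splitOn "_" t
splitUnderscores x = [x]

-- toJSONFilter reads the JSON AST from stdin, applies the function to every
-- Inline in the document, and writes the transformed JSON AST to stdout.
main :: IO ()
main = toJSONFilter splitUnderscores
```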

JSON filters can be used in two ways. To run a filter explicitly, use the pandoc command to output the JSON AST and pipe it to the filter program. The filter program outputs the transformed JSON AST, which can be piped to a second pandoc command that reads the AST and outputs the document in the target format. Alternatively, a single call to pandoc can specify the filter program using the --filter (or -F) option. In this case, Pandoc handles the filter program execution and the JSON AST pipes for you.

When using the --filter option, Pandoc passes the target format name as an argument to the filter program. ToJSONFilter supports transformation functions that take a Maybe Format argument in order to use this information. With the --filter option, a value is always passed. When the filter program is run explicitly, the user might not pass an argument, in which case Nothing is passed to the transformation function.
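A small sketch of a format-aware filter follows. The function emphToSmallCaps and its behavior are invented for illustration: it rewrites emphasis as small caps only when the target format is LaTeX, and leaves the document alone otherwise, including when no format argument is given.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

import Text.Pandoc.JSON (Format (Format), Inline (Emph, SmallCaps), toJSONFilter)

-- | Hypothetical example: change emphasis to small caps for LaTeX output only.
-- The format is compared with (==) rather than pattern matched, so the
-- comparison uses the Eq instance that Format itself defines.
emphToSmallCaps :: Maybe Format -> Inline -> Inline
emphToSmallCaps fmt (Emph xs)
  | fmt == Just (Format "latex") = SmallCaps xs
emphToSmallCaps _ x = x

-- The Maybe Format argument is filled in by toJSONFilter from the first
-- command-line argument, or Nothing when there are no arguments.
main :: IO ()
main = toJSONFilter emphToSmallCaps
```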

When using Maybe Format, any additional arguments are ignored, but ToJSONFilter also supports transformation functions that have a [String] argument. All command-line arguments are passed to the transformation function in this case.
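For comparison, here is a sketch that takes the raw argument list. The function replaceWord and its two positional arguments are hypothetical. Note that under --filter the only argument is the format name, so this particular filter does nothing unless it is run explicitly with two arguments.

```haskell
module Main (main) where

import qualified Data.Text as T
import Text.Pandoc.JSON (Inline (Str), toJSONFilter)

-- | Hypothetical example: replace every Str equal to the first argument
-- with the second argument. With fewer than two arguments, or for any
-- non-matching inline, the value is returned unchanged.
replaceWord :: [String] -> Inline -> Inline
replaceWord (from : to : _) (Str t)
  | t == T.pack from = Str (T.pack to)
replaceWord _ x = x

-- All command-line arguments are passed to replaceWord as a [String].
main :: IO ()
main = toJSONFilter replaceWord
```

The lack of validation or a usage message in this sketch is the kind of interface problem that motivates parsing options separately.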

This functionality is nice to have when quickly writing a filter program, but it does not provide a good user interface. I prefer to parse options outside of the transformation function, and I also prefer to write programs that have --help documentation. Thankfully, this is easy to do: handle the arguments before calling toJSONFilter, and partially apply the transformation function to the parsed options.

For example, a transformation function may have the following type:

foo :: SomeOption -> AnotherOption -> Maybe Format -> Inline -> Inline

The Main module can implement a full command-line interface, using optparse-applicative for example. The toJSONFilter function is used after the arguments have been parsed.

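import qualified Data.Text as T
import Text.Pandoc.JSON (Format (Format), toJSONFilter)

-- Parsed command-line options
data Options
  = Options
    { someOption     :: !SomeOption
    , anotherOption  :: !AnotherOption
    , formatArgument :: !(Maybe String)  -- target format, if given
    }

getOptions :: IO Options
getOptions = ...

main :: IO ()
main = do
    opts <- getOptions
    -- Partially apply foo to the parsed options, leaving Inline -> Inline
    toJSONFilter $
      foo
        (someOption opts)
        (anotherOption opts)
        (Format . T.pack <$> formatArgument opts)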
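To make the pattern concrete, here is one possible self-contained sketch using optparse-applicative. Everything specific in it is an assumption made for illustration: the --prefix and --upper options, the field names, and the transform function are hypothetical stand-ins for SomeOption, AnotherOption, and foo above.

```haskell
module Main (main) where

import qualified Data.Text as T
import Options.Applicative
import Text.Pandoc.JSON (Format (Format), Inline (Str), toJSONFilter)

-- Hypothetical options, invented for this sketch
data Options = Options
  { optPrefix :: !String          -- text prepended to every Str
  , optUpper  :: !Bool            -- whether to upper-case every Str
  , optFormat :: !(Maybe String)  -- target format (passed by pandoc --filter)
  }

optionsParser :: Parser Options
optionsParser = Options
  <$> strOption
        (long "prefix" <> metavar "TEXT" <> value "" <> help "Text to prepend")
  <*> switch (long "upper" <> help "Convert text to upper case")
  <*> optional (strArgument (metavar "FORMAT" <> help "Target format"))

-- Parse the command line; the helper parser adds --help.
getOptions :: IO Options
getOptions = execParser $
  info (optionsParser <**> helper) (fullDesc <> progDesc "Example Pandoc filter")

-- The format is accepted but unused in this sketch.
transform :: String -> Bool -> Maybe Format -> Inline -> Inline
transform prefix upper _fmt (Str t) =
  Str (T.pack prefix <> (if upper then T.toUpper t else t))
transform _ _ _ x = x

main :: IO ()
main = do
  opts <- getOptions
  toJSONFilter $
    transform
      (optPrefix opts)
      (optUpper opts)
      (Format . T.pack <$> optFormat opts)
```

Because the format is an optional positional argument, this program still works when Pandoc invokes it through --filter with only the format name.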

If you would like to be able to use options even when using --filter, support can be added for setting options via environment variables. The above code is non-executable, but I hope to release a project soon that will serve as a concrete example of implementing a Pandoc filter with options that can be configured using command-line arguments as well as environment variables.
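Until then, a rough sketch of the environment variable idea might look like the following. The variable name EXAMPLE_FILTER_PREFIX and the addPrefix function are hypothetical, and this is only one of several ways to do it. lookupEnv comes from System.Environment in base.

```haskell
module Main (main) where

import qualified Data.Text as T
import System.Environment (lookupEnv)
import Text.Pandoc.JSON (Format, Inline (Str), toJSONFilter)

-- | Hypothetical example: prepend a prefix to every Str.
-- The format is accepted but unused in this sketch.
addPrefix :: T.Text -> Maybe Format -> Inline -> Inline
addPrefix prefix _fmt (Str t) = Str (prefix <> t)
addPrefix _ _ x = x

main :: IO ()
main = do
  -- Read the option from the environment, defaulting to the empty prefix.
  prefix <- maybe T.empty T.pack <$> lookupEnv "EXAMPLE_FILTER_PREFIX"
  -- The remaining Maybe Format argument is supplied by toJSONFilter from
  -- the command line, so this works with pandoc --filter.
  toJSONFilter (addPrefix prefix)
```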

Author

Travis Cardwell
