Website Processor Design

The website software that I am developing features a modular system for processing source content to create HTML, CSS, and JavaScript output. This blog entry discusses the motivation for implementing such a feature as well as the current design.

FeedPipe Design

When designing FeedPipe, I tried to make it as simple as possible. While the design was inspired by the website software, it did not include the modular processor design. The software only supported generating HTML output from CommonMark markdown. There was no support for generating CSS or JavaScript, which needed to be prepared separately.

The benefit of this design is simplicity. The FeedPipe implementation was very straightforward. Using the cmark library, conversion from markdown to HTML is pure, which leads to a very pleasant implementation.

The drawback of this design is that it significantly limits the features that can be supported. CommonMark supports basic markdown, but it does not support features such as syntax highlighting of code blocks, tables, mathematics, etc. Doing without such features may be an acceptable compromise for FeedPipe, but it is not for general website software.

Filesystem

The website software makes use of the source directory hierarchy to determine the scope of configuration, including the usage of templates and inclusion of CSS and JavaScript. Templates and source content for CSS and JavaScript are included in the same directory hierarchy as the source content for HTML by putting them in special directories (_templates, _css, and _js). The defaults for a site can be set by saving files at the top of the source directory, and special configuration for specific parts of the website can be set by saving files in appropriate subdirectories.

For example, consider the following (subset of) files in the source directory for a personal website:

src/
- _index.md
- _css/
  - site.sass
- projects/
  - _index.md
  - hardware.md
  - software.md
  - _css/
    - projects.sass

This source content can be built in different ways according to the settings. By default, it creates a site with the following pages:

https://www.example.com/
https://www.example.com/projects
https://www.example.com/projects/hardware
https://www.example.com/projects/software

The CSS rendered from src/_css/site.sass is linked to from every page on the site, while the CSS rendered from src/projects/_css/projects.sass is only linked to from the three projects pages. Inclusion of JavaScript works the same. (Note that templates behave differently, as templates in subdirectories override the templates of parent directories.)
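The two scoping rules (stylesheets accumulate down the hierarchy, templates are overridden by the nearest directory) can be sketched as pure functions over a page's directory. This is a minimal sketch, not the builder's code: `SiteDir`, `ancestors`, `stylesheetsFor`, and `templateFor` are hypothetical names, and the template path `_templates/page.html` is invented for illustration.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (listToMaybe, mapMaybe)
import System.FilePath (joinPath, splitDirectories)

-- | Hypothetical per-directory configuration: stylesheets from the
-- directory's _css subdirectory and an optional template from _templates.
data SiteDir = SiteDir
  { dirCss      :: [FilePath]
  , dirTemplate :: Maybe FilePath
  }

-- | Directories from the source root down to the given directory,
-- e.g. "projects" -> ["", "projects"].
ancestors :: FilePath -> [FilePath]
ancestors dir = map (joinPath . flip take parts) [0 .. length parts]
  where
    parts = filter (/= ".") (splitDirectories dir)

-- | Stylesheets are cumulative: every ancestor contributes its CSS,
-- outermost first, so site-wide styles precede section styles.
stylesheetsFor :: Map.Map FilePath SiteDir -> FilePath -> [FilePath]
stylesheetsFor site dir =
  concatMap (maybe [] dirCss . (`Map.lookup` site)) (ancestors dir)

-- | Templates override: the nearest directory defining one wins.
templateFor :: Map.Map FilePath SiteDir -> FilePath -> Maybe FilePath
templateFor site dir =
  listToMaybe (mapMaybe (\d -> Map.lookup d site >>= dirTemplate) (reverse (ancestors dir)))

-- | The example source directory above (template path is hypothetical).
exampleSite :: Map.Map FilePath SiteDir
exampleSite = Map.fromList
  [ ("",         SiteDir ["_css/site.sass"] (Just "_templates/page.html"))
  , ("projects", SiteDir ["projects/_css/projects.sass"] Nothing)
  ]
```

With this data, `stylesheetsFor exampleSite "projects"` yields both stylesheets for the three projects pages, while `stylesheetsFor exampleSite ""` yields only the site-wide one for the home page.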

Metadata can be used to specify options. For example, here are some of the supported features:

- CSS stylesheets and JavaScript scripts are linked from the HTML pages by default, but it is also possible to embed them.
- It is possible to just save the CSS or JavaScript without automatically linking or embedding it. One can then specify the link in the metadata of specific pages. This is useful when the usage does not align with the directory hierarchy.
- It is possible to configure post-processors, which can be used to minify or compress output content.
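One way to model the link/embed/save options is a small sum type that decides what, if anything, a stylesheet contributes to a page's `<head>`. This is a sketch under assumptions: `Inclusion`, `cssHeadElement`, `parseInclusion`, and the metadata values `"embed"` and `"save"` are hypothetical, and HTML escaping of the URL is omitted for brevity.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)

-- | Hypothetical inclusion modes for a rendered stylesheet.
data Inclusion
  = Link      -- ^ default: link to the saved stylesheet
  | Embed     -- ^ embed the CSS directly in the page
  | SaveOnly  -- ^ only save the file; pages opt in via their metadata
  deriving (Eq, Show)

-- | The <head> element (if any) for a stylesheet, given the URL it is
-- saved to and its rendered content.
cssHeadElement :: Inclusion -> Text -> Text -> Maybe Text
cssHeadElement Link url _ =
  Just ("<link rel=\"stylesheet\" href=\"" <> url <> "\">")
cssHeadElement Embed _ css = Just ("<style>" <> css <> "</style>")
cssHeadElement SaveOnly _ _ = Nothing

-- | Read the mode from an (assumed) metadata string, defaulting to Link.
parseInclusion :: Maybe Text -> Inclusion
parseInclusion (Just "embed") = Embed
parseInclusion (Just "save")  = SaveOnly
parseInclusion _              = Link
```

Making `Link` the fallback mirrors the default described above, so metadata only needs to mention the exceptional cases.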

Processor

Each file is processed by a processor selected according to the extensions in its filename. The Processor data type simply associates a unique name with a function that does the processing. Since processors are used for HTML, CSS, and JavaScript, a phantom type parameter is used to help avoid using a processor with a type of content that it is not intended for.

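import qualified Data.Aeson as A
import Data.Text (Text)

-- The phantom type t tags the kind of content (HTML, CSS, or JavaScript).
data Processor t
  = Processor
    { name :: !Name  -- unique name (Name is defined elsewhere in the builder)
    , run  :: FilePath -> A.Object -> Text -> IO (Either String Text)
    }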
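To illustrate how the phantom type parameter catches mistakes at compile time, here is a self-contained sketch. It deliberately simplifies the type above: metadata is a `Map Text Text` instead of an Aeson `A.Object` (to avoid the aeson dependency), and `Name` is assumed to be a synonym for `Text`. The empty tag types `HTML`, `CSS`, and `JS` and the functions `renderStylesheet` and `passThrough` are hypothetical.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)

-- Empty types used only as type-level tags; they are never constructed.
data HTML
data CSS
data JS

type Name = Text               -- assumption for this sketch
type Metadata = Map Text Text  -- stand-in for Aeson's A.Object

-- | Same shape as the Processor type above, with simplified metadata.
data Processor t = Processor
  { name :: !Name
  , run  :: FilePath -> Metadata -> Text -> IO (Either String Text)
  }

-- | Only accepts processors tagged for CSS.
renderStylesheet :: Processor CSS -> FilePath -> Text -> IO (Either String Text)
renderStylesheet p path = run p path Map.empty

-- | A pass-through processor; its tag is fixed by a type signature.
passThrough :: Name -> Processor t
passThrough n = Processor n (\_ _ content -> pure (Right content))

cssRaw :: Processor CSS
cssRaw = passThrough "CSS.Raw"

htmlRaw :: Processor HTML
htmlRaw = passThrough "HTML.Raw"

-- renderStylesheet htmlRaw "site.css" "..." is rejected by the type
-- checker because Processor HTML does not match Processor CSS.
```

The tag costs nothing at runtime; it only restricts which functions a given processor value can be passed to.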

The first argument of the run function is the path of the file being processed. It is only used for providing context in error messages.

The second argument is the metadata, represented as an Aeson object, and the third argument is the content of the file. Processors do not read from files directly because the metadata may be included in the content file itself, or it may be specified in a separate YAML file.
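Because metadata can come from two places, it has to be separated from the content before a processor runs. A minimal sketch follows, assuming that in-file metadata is YAML front matter delimited by `---` lines (the delimiter is not specified here, so treat it as an assumption); `splitInput` is a hypothetical name, and parsing the YAML into an object is left out.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T

-- | Split a source file into (raw YAML metadata, content).
-- If a separate YAML file was found, its text is passed as the first
-- argument and the source is used as-is; otherwise front matter is
-- extracted from the top of the source.
splitInput :: Maybe Text -> Text -> (Text, Text)
splitInput (Just yaml) content = (yaml, content)
splitInput Nothing source =
  case T.lines source of
    ("---" : rest) ->
      case break (== "---") rest of
        (meta, _ : body) -> (T.unlines meta, T.unlines body)
        _                -> ("", source)  -- unterminated: all content
    _ -> ("", source)
```

Since the processor only ever sees the metadata and the content, it does not need to know which of the two sources the metadata came from.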

The return value is either a String error or the Text result. The IO monad is used to support a wide variety of processors. (The pure translation done by CommonMark is nice, but even Pandoc requires IO.) While the website software generally separates IO from the business logic, processors run in IO directly for simplicity.
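Since every processor returns `IO (Either String Text)`, a processor and its configured post-processors can be run left to right, stopping at the first error. The sketch below assumes that sequencing; `runChain`, `collapseWhitespace`, and `requireTitle` are hypothetical, and it redefines a simplified `Processor` (metadata as `Map Text Text` instead of `A.Object`, `Name` as `Text`). Note how the file path is used only to give an error message context.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T

type Name = Text               -- assumption for this sketch
type Metadata = Map Text Text  -- stand-in for Aeson's A.Object

data Processor t = Processor
  { name :: !Name
  , run  :: FilePath -> Metadata -> Text -> IO (Either String Text)
  }

-- | Run processors in order, feeding each output to the next one and
-- stopping at the first failure (prefixed with the processor name).
runChain :: [Processor t] -> FilePath -> Metadata -> Text -> IO (Either String Text)
runChain [] _ _ content = pure (Right content)
runChain (p : ps) path meta content = do
  result <- run p path meta content
  case result of
    Left err  -> pure (Left (T.unpack (name p) ++ ": " ++ err))
    Right out -> runChain ps path meta out

-- | Toy post-processor that collapses whitespace (a crude "minifier";
-- not safe for HTML containing <pre> blocks).
collapseWhitespace :: Processor t
collapseWhitespace = Processor "collapse" $ \_ _ content ->
  pure (Right (T.unwords (T.words content)))

-- | Toy processor that fails when the metadata lacks a title; the path
-- only serves to make the error message useful.
requireTitle :: Processor t
requireTitle = Processor "require-title" $ \path meta content ->
  pure $ case Map.lookup "title" meta of
    Nothing -> Left (path ++ ": missing title")
    Just _  -> Right content
```

For example, running `[requireTitle, collapseWhitespace]` on a file without a title reports the failing processor and the path instead of producing output.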

Some processors are built in. The HTML.Raw, CSS.Raw, and JS.Raw processors pass the source HTML, CSS, and JavaScript through without modification. The HTML.CommonMark processor translates CommonMark markdown to HTML, and the HTML.Ginger processor renders a Ginger template to HTML using the metadata as the template context.
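Ginger is not reproduced here, but the general shape of a processor that renders a template using the metadata as its context can be sketched with a deliberately tiny stand-in syntax in which `{{key}}` is replaced by the metadata value for `key`. This is not Ginger's syntax or behavior; `templateProcessor`, `render`, and the name `HTML.ToyTemplate` are hypothetical, and `Processor` is redefined with metadata as `Map Text Text` instead of `A.Object`.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T

data HTML  -- phantom tag

type Name = Text               -- assumption for this sketch
type Metadata = Map Text Text  -- stand-in for Aeson's A.Object

data Processor t = Processor
  { name :: !Name
  , run  :: FilePath -> Metadata -> Text -> IO (Either String Text)
  }

-- | Toy template processor (NOT Ginger).
templateProcessor :: Processor HTML
templateProcessor = Processor "HTML.ToyTemplate" $ \path meta src ->
  pure (render path meta src)

-- | Replace each {{key}} with its metadata value, failing on an unknown
-- key or an unclosed tag.
render :: FilePath -> Metadata -> Text -> Either String Text
render path meta src
  | T.null rest  = Right before
  | T.null after = Left (path ++ ": unclosed {{")
  | otherwise =
      case Map.lookup key meta of
        Nothing -> Left (path ++ ": unknown key " ++ T.unpack key)
        Just v  -> fmap (\r -> before <> v <> r) (render path meta (T.drop 2 after))
  where
    (before, rest)  = T.breakOn "{{" src
    (rawKey, after) = T.breakOn "}}" (T.drop 2 rest)
    key             = T.strip rawKey
```

Given metadata `title: Home`, the source `<h1>{{ title }}</h1>` renders to `<h1>Home</h1>`.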

Some processors are provided by separate libraries. For example, a separate library provides processors that use Pandoc and LiterateX to render HTML. Users can implement their own libraries to add new functionality, but note that the website “builder” program must be recompiled to use such libraries directly.

Users can create custom processors without recompilation by configuring a command to run. Note that this entails running an additional process per file, so built-in processors tend to provide better performance. Most commands can read from standard input and write to standard output, avoiding files entirely, but some poorly designed CLIs require the use of files. Processor commands therefore support $INFILE and $OUTFILE variables, and when they are used, the translation is performed within a temporary directory using file IO.
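A custom command processor might work roughly as follows. This is a sketch with several assumptions: the command runs through the shell via `readCreateProcessWithExitCode` from the `process` package; content is piped over standard input unless the command mentions `$INFILE`, and read from standard output unless it mentions `$OUTFILE`; and the caller supplies an existing, private scratch directory. `commandProcessor` and `expandVars` are hypothetical names, paths are not shell-quoted, and `Processor` is redefined with simplified metadata (`Map Text Text`).

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Monad (when)
import Data.List (isInfixOf)
import Data.Map.Strict (Map)
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as TIO
import System.Exit (ExitCode (..))
import System.FilePath ((</>))
import System.Process (readCreateProcessWithExitCode, shell)

type Name = Text               -- assumption for this sketch
type Metadata = Map Text Text  -- stand-in for Aeson's A.Object

data Processor t = Processor
  { name :: !Name
  , run  :: FilePath -> Metadata -> Text -> IO (Either String Text)
  }

-- | Substitute the $INFILE and $OUTFILE variables in a command.
expandVars :: FilePath -> FilePath -> String -> String
expandVars inFile outFile =
  T.unpack . T.replace "$OUTFILE" (T.pack outFile)
           . T.replace "$INFILE" (T.pack inFile) . T.pack

-- | Run a shell command as a processor, using files only when the
-- command refers to them.
commandProcessor :: FilePath -> Name -> String -> Processor t
commandProcessor scratch n cmd = Processor n $ \path _meta content -> do
  let inFile    = scratch </> "input"
      outFile   = scratch </> "output"
      useIn     = "$INFILE" `isInfixOf` cmd
      useOut    = "$OUTFILE" `isInfixOf` cmd
      stdinText = if useIn then "" else T.unpack content
  when useIn (TIO.writeFile inFile content)
  (code, out, err) <-
    readCreateProcessWithExitCode (shell (expandVars inFile outFile cmd)) stdinText
  case code of
    ExitFailure c ->
      pure (Left (path ++ ": command exited with " ++ show c ++ ": " ++ err))
    ExitSuccess
      | useOut    -> Right <$> TIO.readFile outFile
      | otherwise -> pure (Right (T.pack out))
```

A real implementation would create a fresh directory per file, for example with `withSystemTempDirectory` from the `temporary` package, rather than reusing a caller-supplied one.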

Each builder executable includes an index of built-in processors. Users can configure which processor to use to process files with specific extensions by referencing a built-in processor by name or providing the command for a custom processor. Defaults are provided, so such configuration is not required.
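Resolving the configuration against the builder's index can be sketched as a left-biased merge of user settings over defaults, followed by a check that every referenced built-in exists. `ProcessorSpec`, `builtInIndex`, `defaults`, and `resolve` are hypothetical, the default extension table is an assumption for illustration, and processors are represented only by name to keep the example small.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map

-- | What a user may configure for an extension (hypothetical).
data ProcessorSpec
  = BuiltIn String  -- ^ name of a processor in the builder's index
  | Command String  -- ^ shell command for a custom processor
  deriving (Eq, Show)

-- | Names of the processors compiled into this builder executable.
builtInIndex :: [String]
builtInIndex = ["HTML.Raw", "HTML.CommonMark", "HTML.Ginger", "CSS.Raw", "JS.Raw"]

-- | Assumed default extension settings.
defaults :: Map String ProcessorSpec
defaults = Map.fromList
  [ (".html", BuiltIn "HTML.Raw")
  , (".md",   BuiltIn "HTML.CommonMark")
  , (".css",  BuiltIn "CSS.Raw")
  , (".js",   BuiltIn "JS.Raw")
  ]

-- | User settings take precedence (Map.union is left-biased); unknown
-- built-in names are reported rather than silently ignored.
resolve :: Map String ProcessorSpec -> Either String (Map String ProcessorSpec)
resolve user = traverse check (Map.union user defaults)
  where
    check spec@(BuiltIn n)
      | n `elem` builtInIndex = Right spec
      | otherwise = Left ("unknown built-in processor: " ++ n)
    check spec = Right spec
```

With an empty user configuration, `resolve` simply returns the defaults, which matches the point that configuration is optional.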

Note that a processor is selected based on all extensions of a filename, not just the last extension. This is beneficial in the (rare) case that a site needs to process the same format using different processors. For example, a file with extensions .pandoc.md could be processed using Pandoc while other .md files are processed using CommonMark.
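Selection by all extensions can be sketched with `takeExtensions` from the `filepath` package, which returns every extension of a filename (for example `.pandoc.md`). Falling back to shorter suffixes such as `.md` when the full extension is not configured is an assumption of this sketch, as are the names `candidateKeys` and `selectProcessor`; the processor type is left as a parameter `p`.

```haskell
import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Maybe (listToMaybe, mapMaybe)
import System.FilePath (takeExtensions, takeFileName)

-- | Extension suffixes of a filename, longest first:
-- "hardware.pandoc.md" -> [".pandoc.md", ".md"].
candidateKeys :: FilePath -> [String]
candidateKeys = go . takeExtensions . takeFileName
  where
    go ""  = []
    go ext = ext : go (dropWhile (/= '.') (drop 1 ext))

-- | Pick the processor configured for the longest matching suffix.
selectProcessor :: Map String p -> FilePath -> Maybe p
selectProcessor table path =
  listToMaybe (mapMaybe (`Map.lookup` table) (candidateKeys path))
```

With `.pandoc.md` mapped to a Pandoc processor and `.md` mapped to `HTML.CommonMark`, `hardware.pandoc.md` selects the former and `software.md` the latter.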

Author

Travis Cardwell

Published

October 14, 2021