Website Processor Design
The website software that I am developing features a modular system for processing source content to create HTML, CSS, and JavaScript output. This blog entry discusses the motivation for implementing such a feature as well as the current design.
FeedPipe Design
When designing FeedPipe, I tried to make it as simple as possible. While the design was inspired by the website software, it did not include the modular processor design. The software only supported generating HTML output from CommonMark markdown. There was no support for generating CSS or JavaScript, which needed to be prepared separately.
The benefit of this design is simplicity. The FeedPipe implementation was very straightforward. Using the cmark library, conversion from markdown to HTML is pure, which leads to a very pleasant implementation.
The drawback of this design is that it significantly limits the features that can be supported. CommonMark supports basic markdown, but it does not support features such as syntax highlighting of code blocks, tables, mathematics, etc. Doing without such features may be an acceptable compromise for FeedPipe, but it is not for general website software.
Filesystem
The website software makes use of the source directory hierarchy to determine the scope of configuration, including the usage of templates and inclusion of CSS and JavaScript. Templates and source content for CSS and JavaScript are included in the same directory hierarchy as the source content for HTML by putting them in special directories (_templates, _css, and _js). The defaults for a site can be set by saving files at the top of the source directory, and special configuration for specific parts of the website can be set by saving files in appropriate subdirectories.
For example, consider the following (subset of) files in the source directory for a personal website:
    src/
      _index.md
      _css/
        site.sass
      projects/
        _index.md
        hardware.md
        software.md
        _css/
          projects.sass
This source content can be built in different ways according to the settings. By default, it creates a site with the following pages:
https://www.example.com/
https://www.example.com/projects
https://www.example.com/projects/hardware
https://www.example.com/projects/software
The CSS rendered from src/_css/site.sass is linked to from every page on the site, while the CSS rendered from src/projects/_css/projects.sass is only linked to from the three projects pages. Inclusion of JavaScript works the same way. (Note that templates behave differently, as templates in subdirectories override the templates of parent directories.)
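The scoping rule for stylesheets can be sketched in a few lines of Haskell. This is only an illustration of the rule described above, not the actual implementation: the function names are hypothetical, paths are assumed to be relative to the source directory, and only the `_css` case is handled.

```haskell
import Data.List (isPrefixOf)
import System.FilePath (splitDirectories, takeDirectory)

-- | Directories leading to a file, relative to the source root.
-- "projects/hardware.md" gives ["projects"]; "_index.md" gives [].
parentDirs :: FilePath -> [FilePath]
parentDirs = filter (/= ".") . splitDirectories . takeDirectory

-- | The directory that a stylesheet source applies to, which is the parent
-- of its "_css" directory. Files outside a "_css" directory have no scope.
cssScope :: FilePath -> Maybe [FilePath]
cssScope path = case reverse (parentDirs path) of
  "_css" : rest -> Just (reverse rest)
  _ -> Nothing

-- | Stylesheet sources that apply to a page: those whose scope is the
-- page's own directory or one of its ancestors.
stylesheetsFor :: [FilePath] -> FilePath -> [FilePath]
stylesheetsFor sheets page = filter applies sheets
  where
    applies sheet = maybe False (`isPrefixOf` parentDirs page) (cssScope sheet)
```

With the example tree, `stylesheetsFor` selects only `_css/site.sass` for `_index.md`, and both stylesheets for any page under `projects/`.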
Metadata can be used to specify options. For example, here are some of the supported features:
- CSS stylesheets and JavaScript scripts are linked from the HTML pages by default, but it is also possible to embed them.
- It is possible to just save the CSS or JavaScript without automatically linking or embedding. One can then specify the link in the metadata of specific pages. This is useful when the usage does not align with the directory hierarchy.
- It is possible to configure post-processors, which can be used to minify or compress output content.
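To make the first two options concrete, here is a minimal sketch of how the three ways of including a stylesheet might be modeled. The `Inclusion` type and `stylesheetTag` function are hypothetical names chosen for illustration, and the sketch does no HTML escaping, which a real implementation would need.

```haskell
-- | How a rendered stylesheet is attached to a page (hypothetical names).
data Inclusion
  = Link      -- reference the saved file from the page
  | Embed     -- inline the rendered CSS in the page
  | SaveOnly  -- save the file; pages opt in through their own metadata
  deriving (Eq, Show)

-- | HTML to place in the page head for one stylesheet, if any.
-- The second argument is the stylesheet URL, the third the rendered CSS.
stylesheetTag :: Inclusion -> String -> String -> Maybe String
stylesheetTag Link url _ =
  Just ("<link rel=\"stylesheet\" href=\"" ++ url ++ "\">")
stylesheetTag Embed _ css =
  Just ("<style>" ++ css ++ "</style>")
stylesheetTag SaveOnly _ _ =
  Nothing
```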
Processor
Files are processed by a processor according to the extensions in the filename. The Processor data type simply associates a unique name with a function that does the processing. It is used to process HTML, CSS, and JavaScript, so a phantom type is used to help avoid using a processor for a different type of content than it is intended for.
    data Processor t
      = Processor
        { name :: !Name
        , run  :: FilePath -> A.Object -> Text -> IO (Either String Text)
        }
The first argument of the run function is the path of the file being processed. It is only used for providing context in error messages.
The second argument is the metadata, represented as an Aeson object, and the third argument is the content of the file. Processors do not read from files directly because the metadata may be included in the content file itself, or it may be specified in a separate YAML file.
The return value is either a String error or the Text result. The IO monad is used to support a wide variety of processors. (The pure translation done by CommonMark is nice, but even Pandoc requires IO.) While the website software generally separates IO from the business logic, processors run in IO directly for simplicity.
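The following sketch shows how the phantom type parameter keeps processors for different content types apart. It is a simplified stand-in rather than the real code: `Metadata` is a plain map used in place of the Aeson object, `Name` is reduced to a type synonym, and the `HTML` and `CSS` tags, `nonEmptyCss`, and `renderHtml` are hypothetical names.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Map.Strict as Map
import Data.Text (Text)
import qualified Data.Text as T

-- Phantom tags: they have no values and exist only at the type level.
data HTML
data CSS

type Name = Text
type Metadata = Map.Map Text Text -- stand-in for an Aeson object

data Processor t = Processor
  { name :: !Name
  , run :: FilePath -> Metadata -> Text -> IO (Either String Text)
  }

-- | Pass HTML through without modification.
rawHtml :: Processor HTML
rawHtml = Processor "HTML.Raw" (\_ _ content -> pure (Right content))

-- | A hypothetical CSS processor that rejects blank input, using the file
-- path only to give the error message some context.
nonEmptyCss :: Processor CSS
nonEmptyCss = Processor "CSS.NonEmpty" $ \path _ content ->
  pure $
    if T.null (T.strip content)
      then Left (path ++ ": stylesheet is empty")
      else Right content

-- | Accepts HTML processors only. Passing 'nonEmptyCss' here is rejected
-- by the type checker, even though both records have the same fields.
renderHtml :: Processor HTML -> FilePath -> Metadata -> Text -> IO (Either String Text)
renderHtml = run
```

Because the tag never appears in a field, it costs nothing at run time. `renderHtml rawHtml` type checks, while `renderHtml nonEmptyCss` fails to compile.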
Some processors are built in. The HTML.Raw, CSS.Raw, and JS.Raw processors pass the source HTML, CSS, and JavaScript through without modification. The HTML.CommonMark processor translates CommonMark markdown to HTML, and the HTML.Ginger processor renders a Ginger template to HTML using the metadata as the template context.
Some processors are provided by separate libraries. For example, a separate library provides processors that use Pandoc and LiterateX to render HTML. Users can implement their own libraries to add new functionality, but note that the website “builder” program must be recompiled to use such libraries directly.
Users can create custom processors without recompilation by configuring a command to run. Note that this entails running an additional process per file, so built-in processors tend to provide better performance. Most commands can use standard IO and avoid writing files, but some poorly-designed CLIs require writing files. Processor commands therefore support $INFILE and $OUTFILE variables, and the translation is performed within a temporary directory using file IO when necessary.
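As a rough sketch of how such a command might be handled, the code below detects which variables a command uses, substitutes paths for them, and runs a command over standard IO when no files are needed. All names are hypothetical, and the sketch makes two simplifying assumptions: commands are run through the system shell, and substituted paths contain no characters that need shell quoting. Creating the temporary directory and reading the output file are left out.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import qualified Data.Text as T
import System.Exit (ExitCode (..))
import System.Process (readCreateProcessWithExitCode, shell)

-- | How a command exchanges content with the builder.
data Transport = Transport
  { inputViaFile :: Bool  -- the command mentions $INFILE
  , outputViaFile :: Bool -- the command mentions $OUTFILE
  }
  deriving (Eq, Show)

-- | Decide whether file IO is needed by looking for the variables.
transportOf :: Text -> Transport
transportOf cmd =
  Transport ("$INFILE" `T.isInfixOf` cmd) ("$OUTFILE" `T.isInfixOf` cmd)

-- | Substitute concrete paths for the variables (no quoting is applied).
expand :: FilePath -> FilePath -> Text -> Text
expand inFile outFile =
  T.replace "$OUTFILE" (T.pack outFile) . T.replace "$INFILE" (T.pack inFile)

-- | Run a command that needs no files: content goes to standard input and
-- the result is read from standard output.
runViaStdio :: Text -> Text -> IO (Either String Text)
runViaStdio cmd content = do
  (code, out, err) <-
    readCreateProcessWithExitCode (shell (T.unpack cmd)) (T.unpack content)
  pure $ case code of
    ExitSuccess -> Right (T.pack out)
    ExitFailure n -> Left ("exit code " ++ show n ++ ": " ++ err)
```

For a made-up command such as `some-tool --in $INFILE --out $OUTFILE`, `transportOf` reports that both directions need files, so the builder would write the content into the temporary directory, run the expanded command, and read the result back.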
Each builder executable includes an index of built-in processors. Users can configure which processor to use to process files with specific extensions by referencing a built-in processor by name or providing the command for a custom processor. Defaults are provided, so such configuration is not required.
Note that a processor is selected based on all extensions of a filename, not just the last extension. This is beneficial in the (rare) case that a site needs to process the same format using different processors. For example, a file with extensions .pandoc.md could be processed using Pandoc while other .md files are processed using CommonMark.
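One way to implement this kind of selection is to try the full extension string first and then progressively shorter suffixes. The sketch below takes that approach; the fallback order is an assumption, as are the function names and the `HTML.Pandoc` processor name used in the usage note.

```haskell
import qualified Data.Map.Strict as Map
import Data.Maybe (listToMaybe, mapMaybe)
import System.FilePath (takeExtensions, takeFileName)

-- | Extension strings to try, most specific first.
-- "post.pandoc.md" gives ["pandoc.md", "md"]; "README" gives [].
candidates :: FilePath -> [String]
candidates path = go (drop 1 (takeExtensions (takeFileName path)))
  where
    go "" = []
    go exts = exts : go (drop 1 (dropWhile (/= '.') exts))

-- | Look up the processor name configured for the most specific
-- extension string that has an entry in the table.
selectProcessor :: Map.Map String String -> FilePath -> Maybe String
selectProcessor table = listToMaybe . mapMaybe (`Map.lookup` table) . candidates
```

Given a table that maps `md` to `HTML.CommonMark` and `pandoc.md` to `HTML.Pandoc`, `selectProcessor` picks the Pandoc entry for `projects/software.pandoc.md` and the CommonMark entry for `projects/hardware.md`.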