Data types à la carte
I am reviewing (making sure I understand) some classic Haskell papers. The first up is Wouter Swierstra’s Data types à la carte (CiteSeer), first published in 2008. Inspired by Sandy Maguire, I am writing this blog entry as part of my review.
This blog entry is written in Literate Haskell, using LiterateX to render the HTML. The source code is available on GitHub. Note that the code is in the order used in the paper.
Setup
This implementation uses the following GHC extensions.
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE InstanceSigs #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}
It is implemented as a library so that it is easy to experiment with
using GHCi. As a convention, functions with names starting with
main
have type IO ()
, providing an easy way to
run various tests from within GHCi.
module DTALC where
The following imports are used.
-- https://hackage.haskell.org/package/base
import Control.Applicative ((<|>))
import qualified Prelude
import Prelude hiding (getChar, putChar, readFile, writeFile)
Fixing the expression problem
This section shows how to represent expressions without using a sum type that specifies all valid constructors. Being able to separate/modularize the parts of an expression resolves the famous “expression problem.”
The following defines a recursive data type to represent expressions.
Type parameter f
specifies the type constructor(s) that can
be used in an expression. It takes a type parameter that specifies the
type of the expression (Expr f
), used in the recursive
parts.
newtype Expr f = In (f (Expr f))
Perhaps it is worthwhile to compare this definition to the
Expr
definition in the paper introduction. In the
introduction, Expr
is a sum type that specifies all valid
(value) constructors. The Add
constructor has recursive
arguments, allowing the addition of expressions (not just values). In
the above definition, type f
can be used to specify all
valid types, using separate types instead of a single sum type.
On the right-hand-side, f (Expr f)
specifies that a value
of Expr f
is a value of one of the types specified by
“signature” f
, with recursive parts of type
Expr f
. A newtype
is required for the type
system, using In
to construct values of
Expr
.
The Val
and Add
types are given as initial
examples. The IntExpr
and AddExpr
shows how
these can be used with the above Expr
type, but note that
they are not otherwise used.
newtype Val e = Val Int
type IntExpr = Expr Val
data Add e = Add e e
type AddExpr = Expr Add
The following defines a type-level operator that represents the coproduct of two signatures. It requires the TypeOperators extension. It is used like a list constructed from cons pairs, with left cells indicating type constructors and right cells pointing to the next pair, except that the chain ends with the final right cell pointing to a type constructor instead of null. The operator is marked as right-associative so that these chains are constructed correctly. There are better ways to implement this, but they are not discussed in this paper.
data (f :+: g) e = Inl (f e) | Inr (g e)
infixr 8 :+:
This type operator is infix, but it has an additional type parameter
e
. Note that (f :+: g) e
could alternatively
be written (:+:) f g e
.
The following addExample
definition illustrates how bad
it would be to create expressions using just the above definitions.
addExample :: Expr (Val :+: Add)
= In (Inr (Add (In (Inl (Val 118))) (In (Inl (Val 1219))))) addExample
Evaluation
This section shows how to evaluate expressions.
The Functor
definitions are straightforward. Note that I use the InstanceSigs
extension to explicitly write the specialized signatures for all
instance methods in this implementation. Though it is likely unhelpful
in this case, such explicit signatures can help understand more complex
definitions. I also avoid deriving any instances so that the definitions
are explicit.
instance Functor Val where
fmap :: (a -> b) -> Val a -> Val b
fmap _f (Val x) = Val x
instance Functor Add where
fmap :: (a -> b) -> Add a -> Add b
fmap f (Add x y) = Add (f x) (f y)
instance (Functor f, Functor g) => Functor (f :+: g) where
fmap :: (a -> b) -> (f :+: g) a -> (f :+: g) b
fmap f = \case
Inl e -> Inl (fmap f e)
Inr e -> Inr (fmap f e)
Note that there are a few (unimportant) stylistic changes in the
above code. I use _f
in the Val
instance to
indicate that the function is not used in the definition. I also use the
LambdaCase
extension.
The foldExpr
function folds an expression using the
provided “algebra.” It uses the Functor
instances to
traverse the recursive parts. Perhaps due to my Scheme
background, I prefer to use a helper function for the recursion, with
the first argument in scope. Since the type signature for the helper
function references type variables in the type signature of
foldExpr
, the ScopedTypeVariables
extension is required.
foldExpr :: forall f a. Functor f => (f a -> a) -> Expr f -> a
= go
foldExpr f where
go :: Expr f -> a
In t) = f (fmap go t) go (
An algebra is defined for evaluating expressions. The use of a type class allows evaluation of each part of an expression to be defined separately.
class Functor f => Eval f where
evalAlgebra :: f Int -> Int
The instances are straighforward. The type signatures clearly show
that the type parameters are Int
when performing evaluation
(not Expr f
). This illustrates the versatility of
the abstract definitions.
instance Eval Val where
evalAlgebra :: Val Int -> Int
Val x) = x
evalAlgebra (
instance Eval Add where
evalAlgebra :: Add Int -> Int
Add x y) = x + y
evalAlgebra (
instance (Eval f, Eval g) => Eval (f :+: g) where
evalAlgebra :: (f :+: g) Int -> Int
= \case
evalAlgebra Inl x -> evalAlgebra x
Inr y -> evalAlgebra y
The eval
function implements evaluation using
foldExpr
and evalAlgebra
.
eval :: Eval f => Expr f -> Int
= foldExpr evalAlgebra eval
The example evaluates addExample
.
mainEvalAddExample :: IO ()
= print $ eval addExample mainEvalAddExample
Automating injections
This section shows how to avoid having to manually write out all of
the constructors like was done in addExample
. A type class
is used to define a constraint that asserts that a specific type is a
member of a signature. The inj
method represents injection, while the
prj
method discussed at the end of the next section
represents the partial inverse.
class (Functor sub, Functor sup) => sub :<: sup where
inj :: sub a -> sup a
prj :: sup a -> Maybe (sub a)
Since this class has two parameters, the MultiParamTypeClasses extension is required.
The instances are very straightforward. Note that the FlexibleInstances
extension is required because f
is not a distinct type
variable. The paper notes the overlapping instances, and the final
instance needs to be annotated with OVERLAPPABLE
to
compile.
instance Functor f => f :<: f where
inj :: f a -> f a
= id
inj
prj :: f a -> Maybe (f a)
= Just
prj
instance (Functor f, Functor g) => f :<: (f :+: g) where
inj :: f a -> (f :+: g) a
= Inl
inj
prj :: (f :+: g) a -> Maybe (f a)
= \case
prj Inl e -> Just e
Inr _e -> Nothing
instance {-# OVERLAPPABLE #-}
Functor f, Functor g, Functor h, f :<: g) => f :<: (h :+: g) where
( inj :: f a -> (h :+: g) a
= Inr . inj
inj
prj :: (h :+: g) a -> Maybe (f a)
= \case
prj Inl _e -> Nothing
Inr e -> prj e
The smart constructors are implemented using an inject
helper function. The FlexibleContexts
extension is required because non-type-variable arguments are used in
the constraint. Note that I implement the smart constructor for
Add
as %+%
instead of using Unicode.
inject :: (g :<: f) => g (Expr f) -> Expr f
= In . inj
inject
val :: (Val :<: f) => Int -> Expr f
= inject (Val x)
val x
(%+%) :: (Add :<: f) => Expr f -> Expr f -> Expr f
%+% y = inject (Add x y)
x infixl 6 %+%
These smart constructors are much more convenient than the
constructors used in addExample
!
addExample2 :: Expr (Val :+: Add)
= val 30000 %+% val 1330 %+% val 7
addExample2
mainEvalAddExample2 :: IO ()
= print $ eval addExample2 mainEvalAddExample2
Examples
This section shows how new expression terms and functions can be added without having to change the implementation of existing terms and functions.
Multiplication is implemented as Mul
. Note that I
implemented the smart constructor for Mul
as
%*%
instead of using Unicode.
data Mul e = Mul e e
instance Functor Mul where
fmap :: (a -> b) -> Mul a -> Mul b
fmap f (Mul x y) = Mul (f x) (f y)
instance Eval Mul where
evalAlgebra :: Mul Int -> Int
Mul x y) = x * y
evalAlgebra (
(%*%) :: (Mul :<: f) => Expr f -> Expr f -> Expr f
%*% y = inject (Mul x y)
x infixl 7 %*%
The examples demonstrate how this new term can be used with the existing terms.
mulExample :: Expr (Val :+: Add :+: Mul)
= val 80 %*% val 5 %+% val 4
mulExample
mainEvalMulExample :: IO ()
= print $ eval mulExample
mainEvalMulExample
mulExample2 :: Expr (Val :+: Mul)
= val 6 %*% val 7
mulExample2
mainEvalMulExample2 :: IO ()
= print $ eval mulExample2 mainEvalMulExample2
Pretty printing is implemented using the Render
type
class.
class Render f where
render :: Render g => f (Expr g) -> String
pretty :: Render f => Expr f -> String
In t) = render t pretty (
Like evaluation, instances are defined for each term as well as
(:+:)
.
instance Render Val where
render :: Render g => Val (Expr g) -> String
Val i) = show i
render (
instance Render Add where
render :: Render g => Add (Expr g) -> String
Add x y) = "(" ++ pretty x ++ " + " ++ pretty y ++ ")"
render (
instance Render Mul where
render :: Render g => Mul (Expr g) -> String
Mul x y) = "(" ++ pretty x ++ " * " ++ pretty y ++ ")"
render (
instance (Render f, Render g) => Render (f :+: g) where
render :: Render h => (f :+: g) (Expr h) -> String
= \case
render Inl x -> render x
Inr y -> render y
I implement a Show
instance for convenience.
instance Render f => Show (Expr f) where
show :: Expr f -> String
show = pretty
The example pretty-prints mulExample
.
mainPrettyMulExample :: IO ()
= print mulExample mainPrettyMulExample
I think that the discussion about prj
is very important.
When using a sum type, one can pattern-match against all of the
constructors of that type, but this is not possible using
Expr
. This discussion shows how to use prj
to
match constructors, specifying the appropriate type constraints. The
match
function simply unwraps the Expr
constructor and calls prj
on the wrapped value.
match :: (g :<: f) => Expr f -> Maybe (g (Expr f))
In t) = prj t match (
The distr
function applies the distributive
law on the outermost constructors of an expression. It is used for
term
rewriting. As shown, however, it only works when the right
multiplicand is an addition. It does not work when the left multiplicand
is an addition. The following version supports both.
distr :: forall f. (Add :<: f, Mul :<: f) => Expr f -> Maybe (Expr f)
= distrL <|> distrR
distr t where
distrL :: Maybe (Expr f)
= do
distrL Mul a b <- match t
Add c d <- match b
return (a %*% c %+% a %*% d)
distrR :: Maybe (Expr f)
= do
distrR Mul a b <- match t
Add c d <- match a
return (c %*% b %+% d %*% b)
The paper states that an algebra can be defined to fold over an
expression to uniformly apply the distributive law using
distr
, but no code is provided. I tried implementing this
myself, but I do not yet know how to write an algebra (type
f a -> a
) to do this. A type class should not be
required, as no new functionality needs to be implemented per term.
Also, it cannot be done in a single pass because one rewrite may enable
further rewrites in sub-expressions.
I tried implementing this using the following function,
which uses mutually-recursive helper functions. The go
function traverses (sub-)expressions, possibly rewriting them from the
buttom up using the f'
function. The f'
function attempts to rewrite an expression using the passed rewrite
function. If the expression is rewritten, go
is called on
the result so that any newly enabled rewrites in sub-expressions can be
performed.
rewriteExpr :: forall f. Functor f
=> (Expr f -> Maybe (Expr f)) -> Expr f -> Expr f
= go
rewriteExpr f where
go :: Expr f -> Expr f
In t) = f' (In (fmap go t))
go (
f' :: Expr f -> Expr f
= maybe e go (f e) f' e
This function works fine when using GHC 8.10.7, but compilation fails
when using GHC 9.0.2 with a “simplifier ticks exhausted.” The error
persists when I replace f'
with id
, so the
problem is with the recursion in the go
helper function. It
is very similar to the foldExpr
definition, except that it
returns expressions instead of folded values. Simplifying, even
definition
rewriteExpr f (In t) = In (fmap (rewriteExpr f) t)
causes
the error.
Using GHC 8.10.7, the rewriteDistr
function is a helper
function that uses rewriteExpr
to rewrite an expression
using just the distr
function.
rewriteDistr :: (Add :<: f, Mul :<: f) => Expr f -> Expr f
= rewriteExpr distr rewriteDistr
The following are examples of incresting complexity.
distrExample1 :: Expr (Val :+: Add :+: Mul)
= val 2 %*% (val 3 %+% val 4)
distrExample1
mainDistrExample1 :: IO ()
= do
mainDistrExample1 print distrExample1
print $ rewriteDistr distrExample1
distrExample2 :: Expr (Val :+: Add :+: Mul)
= (val 2 %+% val 3) %*% val 4
distrExample2
mainDistrExample2 :: IO ()
= do
mainDistrExample2 print distrExample2
print $ rewriteDistr distrExample2
distrExample3 :: Expr (Val :+: Add :+: Mul)
= val 2 %*% ((val 3 %+% val 4) %*% val 5)
distrExample3
mainDistrExample3 :: IO ()
= do
mainDistrExample3 print distrExample3
print $ rewriteDistr distrExample3
distrExample4 :: Expr (Val :+: Add :+: Mul)
= (val 2 %+% val 3) %*% (val 4 %*% (val 5 %+% val 6))
distrExample4
mainDistrExample4 :: IO ()
= do
mainDistrExample4 print distrExample4
print $ rewriteDistr distrExample4
distrExample5 :: Expr (Val :+: Add :+: Mul)
=
distrExample5 2 %*% (val 3 %+% val 4)) %*%
(val 5 %+% val 6) %*% (val 7 %*% (val 8 %+% val 9)))
((val
mainDistrExample5 :: IO ()
= do
mainDistrExample5 print distrExample5
print $ rewriteDistr distrExample5
All of the rewritten expressions are indeed in disjunctive
normal form, but I am very interested in finding a better solution.
If you know of a different solution or how to fix
rewriteExpr
to work with GHC 9, please let me know!
Monads for free
This section shows how a free monad is implemented in Haskell. It is used to implement a calculator memory cell.
A free monad is implemented as type Term
. Note that an
Applicative
instance is now required for defining a Monad
instance.
data Term f a
= Pure a
| Impure (f (Term f a))
instance Functor f => Functor (Term f) where
fmap :: (a -> b) -> Term f a -> Term f b
fmap f = \case
Pure x -> Pure (f x)
Impure t -> Impure (fmap (fmap f) t)
instance Functor f => Applicative (Term f) where
pure :: a -> Term f a
pure x = Pure x
(<*>) :: Term f (a -> b) -> Term f a -> Term f b
Pure f <*> t = fmap f t
Impure f <*> t = Impure (fmap (<*> t) f)
instance Functor f => Monad (Term f) where
return :: a -> Term f a
return x = Pure x
(>>=) :: Term f a -> (a -> Term f b) -> Term f b
Pure x >>= f = f x
Impure t >>= f = Impure (fmap (>>= f) t)
The following examples are not used in the implementation.
data Zero a
data One a = One
newtype Const e a = Const e
The Incr
and Recall
types represent the
“increment” and “recall” operations. Note that “increment” is not used
to mean the addition of one; it represents the addition of an arbitrary
value. These types have straightforward Functor
instances.
data Incr t = Incr Int t
instance Functor Incr where
fmap :: (a -> b) -> Incr a -> Incr b
fmap f (Incr i g) = Incr i (f g)
newtype Recall t = Recall (Int -> t)
instance Functor Recall where
fmap :: (a -> b) -> Recall a -> Recall b
fmap f (Recall g) = Recall (f . g)
The smart constructors are implemented using an inject
helper function, which I name injectTerm
because there is
already an inject
function in this module.
injectTerm :: (g :<: f) => g (Term f a) -> Term f a
= Impure . inj
injectTerm
incr :: (Incr :<: f) => Int -> Term f ()
= injectTerm (Incr i (Pure ()))
incr i
recall :: (Recall :<: f) => Term f Int
= injectTerm (Recall Pure) recall
The tick
function really does increment a value (by
one). A note in the paper states that it can be given a more general
type, which I use here.
tick :: (Recall :<: f, Incr :<: f) => Term f Int
= do
tick <- recall
y 1
incr return y
While foldExpr
folds an expression using an algebra, the
foldTerm
function folds a Term
using separate
functions for handling Pure
and Impure
. Note
that I name the first argument pure'
because pure
is a Prelude
function.
foldTerm :: forall f a b. Functor f
=> (a -> b) -> (f b -> b) -> Term f a -> b
= go
foldTerm pure' impure where
go :: Term f a -> b
= \case
go Pure x -> pure' x
Impure t -> impure (fmap go t)
The Mem
type represents the memory cell of a
calculator.
newtype Mem = Mem Int
deriving Show
Like Eval
, a Run
type class defines an
algebra for running the operations, with implementations of each
operation in instance definitions.
class Functor f => Run f where
runAlgebra :: f (Mem -> (a, Mem)) -> (Mem -> (a, Mem))
instance Run Incr where
runAlgebra :: Incr (Mem -> (a, Mem)) -> (Mem -> (a, Mem))
Incr k r) (Mem i) = r (Mem (i + k))
runAlgebra (
instance Run Recall where
runAlgebra :: Recall (Mem -> (a, Mem)) -> (Mem -> (a, Mem))
Recall r) (Mem i) = r i (Mem i)
runAlgebra (
instance (Run f, Run g) => Run (f :+: g) where
= \case
runAlgebra Inl r -> runAlgebra r
Inr r -> runAlgebra r
I wrote the type signatures like in the paper, which makes it clear
that the member function is an algebra (type f a -> a
),
but note that the parenthesis around (Mem -> (a, Mem))
at the end of the signature are not necessary. In the instances for
Incr
and Recall
, runAlgebra
has
two parameters. The first is
f (Mem -> (a, Mem))
and the second is Mem
,
giving a return value of (a, Mem)
.
The run
function is perhaps the most difficult to
understand in the paper.
run :: Run f => Term f a -> Mem -> (a, Mem)
= foldTerm (,) runAlgebra run
The pure'
parameter of foldTerm
has type
a -> b
and is passed (,)
which has type
a -> b -> (a, b)
. In the usage of
foldTerm
, type parameter a
remains
a
while type parameter b
is interpreted as
b -> (a, b)
, resulting in
(f (b -> (a, b)) -> b -> (a, b)) -> Term f a -> b -> (a, b)
.
The runAlgebra
function matches the first three
arguments of this, and b
is interpreted as
Mem
, resulting in the type signature of
run
.
It is worthwhile to (mentally) step through the complete evaluation
of run
with the following basic values of Term
(Pure
, recall
, and incr
).
mainRunPure :: IO ()
= print $ run @Recall @Int (Pure 42) (Mem 0)
mainRunPure
mainRunRecall :: IO ()
= print $ run @Recall @Int recall (Mem 42)
mainRunRecall
mainRunIncr :: IO ()
= print $ run @Incr @() (incr 42) (Mem 0) mainRunIncr
Note that the TypeApplications
extension is used to specify the type of f
and
a
.
An example using tick
is given. The paper does not
require use of TypeApplications
because it uses a specific type, while the following uses TypeApplications
because the above definition of tick
uses the more general
type.
mainRunTick :: IO ()
= print $ run @(Recall :+: Incr) tick (Mem 4) mainRunTick
Applications
This section gives a brief demonstration of using free monads to model effects.
Four effectful functions are defined, categorized into two separate data types.
data Teletype a
= GetChar (Char -> a)
| PutChar Char a
instance Functor Teletype where
fmap :: (a -> b) -> Teletype a -> Teletype b
fmap f = \case
GetChar g -> GetChar (f . g)
PutChar c g -> PutChar c (f g)
data FileSystem a
= ReadFile FilePath (String -> a)
| WriteFile FilePath String a
instance Functor FileSystem where
fmap :: (a -> b) -> FileSystem a -> FileSystem b
fmap f = \case
ReadFile path g -> ReadFile path (f . g)
WriteFile path s g -> WriteFile path s (f g)
An exec
function can execute values of these data types
using the Term
free monad.
exec :: Exec f => Term f a -> IO a
= foldTerm return execAlgebra exec
Each value is executed using a function from Prelude,
using an Exec
type class.
class Functor f => Exec f where
execAlgebra :: f (IO a) -> IO a
instance Exec Teletype where
= \case
execAlgebra GetChar f -> Prelude.getChar >>= f
PutChar c io -> Prelude.putChar c >> io
instance Exec FileSystem where
= \case
execAlgebra ReadFile path f -> Prelude.readFile path >>= f
WriteFile path s f -> Prelude.writeFile path s >> f
instance (Exec f, Exec g) => Exec (f :+: g) where
= \case
execAlgebra Inl e -> execAlgebra e
Inr e -> execAlgebra e
The smart constructors can be defined as follows.
getChar :: (Teletype :<: f) => Term f Char
getChar = injectTerm (GetChar Pure)
putChar :: (Teletype :<: f) => Char -> Term f ()
putChar c = injectTerm (PutChar c (Pure ()))
readFile :: (FileSystem :<: f) => FilePath -> Term f String
readFile path = injectTerm (ReadFile path Pure)
writeFile :: (FileSystem :<: f) => FilePath -> String -> Term f ()
writeFile path s = injectTerm (WriteFile path s (Pure ()))
The cat
function serves as an example of composition. In
the following, I use a more general type than that used in the paper. I
also refactored the implementation, using mapM_
instead of mapM
to avoid discarding the resulting list of unit.
cat :: (FileSystem :<: f, Teletype :<: f) => FilePath -> Term f ()
= mapM_ putChar =<< readFile path cat path
The following example uses the cat
function to print the
content of the README.md
file in this directory.
mainCat :: IO ()
= exec @(FileSystem :+: Teletype) $ cat "README.md" mainCat