# Data types à la carte

I am reviewing (making sure I understand) some classic Haskell papers. The first up is Wouter Swierstra’s Data types à la carte (CiteSeer), first published in 2008. Inspired by Sandy Maguire, I am writing this blog entry as part of my review.

This blog entry is written in Literate Haskell, using LiterateX to render the HTML. The source code is available on GitHub. Note that the code is in the order used in the paper.

## Setup

This implementation uses the following GHC extensions.

```
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE InstanceSigs #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}
```

It is implemented as a library so that it is easy to experiment with using GHCi. As a convention, functions with names *starting with* `main`

have type `IO ()`

, providing an easy way to run various tests from within GHCi.

`module DTALC where`

The following imports are used.

```
-- https://hackage.haskell.org/package/base
import Control.Applicative ((<|>))
import qualified Prelude
import Prelude hiding (getChar, putChar, readFile, writeFile)
```

## Fixing the expression problem

This section shows how to represent expressions without using a sum type that specifies all valid constructors. Being able to separate/modularize the parts of an expression resolves the famous “expression problem.”

The following defines a recursive data type to represent expressions. Type parameter `f`

specifies the type constructor(s) that can be used in an expression. It takes a type parameter that specifies the type of the expression (`Expr f`

), used in the recursive parts.

`newtype Expr f = In (f (Expr f))`

Perhaps it is worthwhile to compare this definition to the `Expr`

definition in the paper introduction. In the introduction, `Expr`

is a sum type that specifies all valid (value) constructors. The `Add`

constructor has recursive arguments, allowing the addition of expressions (not just values). In the above definition, type `f`

can be used to specify all valid types, using *separate* types instead of a single sum type. On the right-hand-side, `f (Expr f)`

specifies that a value of `Expr f`

is a value of one of the types specified by “signature” `f`

, with recursive parts of type `Expr f`

. A `newtype`

is required for the type system, using `In`

to construct values of `Expr`

.

The `Val`

and `Add`

types are given as initial examples. The `IntExpr`

and `AddExpr`

shows how these can be used with the above `Expr`

type, but note that they are not otherwise used.

```
newtype Val e = Val Int
type IntExpr = Expr Val
data Add e = Add e e
type AddExpr = Expr Add
```

The following defines a type-level operator that represents the coproduct of two signatures. It requires the TypeOperators extension. It is used like a list constructed from cons pairs, with left cells indicating type constructors and right cells pointing to the next pair, except that the chain ends with the final right cell pointing to a type constructor instead of null. The operator is marked as right-associative so that these chains are constructed correctly. There are better ways to implement this, but they are not discussed in this paper.

```
data (f :+: g) e = Inl (f e) | Inr (g e)
infixr 8 :+:
```

This type operator is infix, but it has an additional type parameter `e`

. Note that `(f :+: g) e`

could alternatively be written `(:+:) f g e`

.

The following `addExample`

definition illustrates how bad it would be to create expressions using just the above definitions.

```
addExample :: Expr (Val :+: Add)
= In (Inr (Add (In (Inl (Val 118))) (In (Inl (Val 1219))))) addExample
```

## Evaluation

This section shows how to evaluate expressions.

The `Functor`

definitions are straightforward. Note that I use the InstanceSigs extension to explicitly write the specialized signatures for all instance methods in this implementation. Though it is likely unhelpful in this case, such explicit signatures can help understand more complex definitions. I also avoid deriving any instances so that the definitions are explicit.

```
instance Functor Val where
fmap :: (a -> b) -> Val a -> Val b
fmap _f (Val x) = Val x
instance Functor Add where
fmap :: (a -> b) -> Add a -> Add b
fmap f (Add x y) = Add (f x) (f y)
instance (Functor f, Functor g) => Functor (f :+: g) where
fmap :: (a -> b) -> (f :+: g) a -> (f :+: g) b
fmap f = \case
Inl e -> Inl (fmap f e)
Inr e -> Inr (fmap f e)
```

Note that there are a few (unimportant) stylistic changes in the above code. I use `_f`

in the `Val`

instance to indicate that the function is not used in the definition. I also use the LambdaCase extension.

The `foldExpr`

function folds an expression using the provided “algebra.” It uses the `Functor`

instances to traverse the recursive parts. Perhaps due to my Scheme background, I prefer to use a helper function for the recursion, with the first argument in scope. Since the type signature for the helper function references type variables in the type signature of `foldExpr`

, the ScopedTypeVariables extension is required.

```
foldExpr :: forall f a. Functor f => (f a -> a) -> Expr f -> a
= go
foldExpr f where
go :: Expr f -> a
In t) = f (fmap go t) go (
```

An algebra is defined for evaluating expressions. The use of a type class allows evaluation of each part of an expression to be defined separately.

```
class Functor f => Eval f where
evalAlgebra :: f Int -> Int
```

The instances are straighforward. The type signatures clearly show that the type parameters are `Int`

when performing evaluation (*not* `Expr f`

). This illustrates the versatility of the abstract definitions.

```
instance Eval Val where
evalAlgebra :: Val Int -> Int
Val x) = x
evalAlgebra (
instance Eval Add where
evalAlgebra :: Add Int -> Int
Add x y) = x + y
evalAlgebra (
instance (Eval f, Eval g) => Eval (f :+: g) where
evalAlgebra :: (f :+: g) Int -> Int
= \case
evalAlgebra Inl x -> evalAlgebra x
Inr y -> evalAlgebra y
```

The `eval`

function implements evaluation using `foldExpr`

and `evalAlgebra`

.

```
eval :: Eval f => Expr f -> Int
= foldExpr evalAlgebra eval
```

The example evaluates `addExample`

.

```
mainEvalAddExample :: IO ()
= print $ eval addExample mainEvalAddExample
```

## Automating injections

This section shows how to avoid having to manually write out all of the constructors like was done in `addExample`

. A type class is used to define a constraint that asserts that a specific type is a member of a signature. The `inj`

method represents injection, while the `prj`

method discussed at the end of the next section represents the partial inverse.

```
class (Functor sub, Functor sup) => sub :<: sup where
inj :: sub a -> sup a
prj :: sup a -> Maybe (sub a)
```

Since this class has two parameters, the MultiParamTypeClasses extension is required.

The instances are very straightforward. Note that the FlexibleInstances extension is required because `f`

is not a distinct type variable. The paper notes the overlapping instances, and the final instance needs to be annotated with `OVERLAPPABLE`

to compile.

```
instance Functor f => f :<: f where
inj :: f a -> f a
= id
inj
prj :: f a -> Maybe (f a)
= Just
prj
instance (Functor f, Functor g) => f :<: (f :+: g) where
inj :: f a -> (f :+: g) a
= Inl
inj
prj :: (f :+: g) a -> Maybe (f a)
= \case
prj Inl e -> Just e
Inr _e -> Nothing
instance {-# OVERLAPPABLE #-}
Functor f, Functor g, Functor h, f :<: g) => f :<: (h :+: g) where
( inj :: f a -> (h :+: g) a
= Inr . inj
inj
prj :: (h :+: g) a -> Maybe (f a)
= \case
prj Inl _e -> Nothing
Inr e -> prj e
```

The smart constructors are implemented using an `inject`

helper function. The FlexibleContexts extension is required because non-type-variable arguments are used in the constraint. Note that I implement the smart constructor for `Add`

as `%+%`

instead of using Unicode.

```
inject :: (g :<: f) => g (Expr f) -> Expr f
= In . inj
inject
val :: (Val :<: f) => Int -> Expr f
= inject (Val x)
val x
(%+%) :: (Add :<: f) => Expr f -> Expr f -> Expr f
%+% y = inject (Add x y)
x infixl 6 %+%
```

These smart constructors are much more convenient than the constructors used in `addExample`

!

```
addExample2 :: Expr (Val :+: Add)
= val 30000 %+% val 1330 %+% val 7
addExample2
mainEvalAddExample2 :: IO ()
= print $ eval addExample2 mainEvalAddExample2
```

## Examples

This section shows how new expression terms and functions can be added without having to change the implementation of existing terms and functions.

Multiplication is implemented as `Mul`

. Note that I implemented the smart constructor for `Mul`

as `%*%`

instead of using Unicode.

```
data Mul e = Mul e e
instance Functor Mul where
fmap :: (a -> b) -> Mul a -> Mul b
fmap f (Mul x y) = Mul (f x) (f y)
instance Eval Mul where
evalAlgebra :: Mul Int -> Int
Mul x y) = x * y
evalAlgebra (
(%*%) :: (Mul :<: f) => Expr f -> Expr f -> Expr f
%*% y = inject (Mul x y)
x infixl 7 %*%
```

The examples demonstrate how this new term can be used with the existing terms.

```
mulExample :: Expr (Val :+: Add :+: Mul)
= val 80 %*% val 5 %+% val 4
mulExample
mainEvalMulExample :: IO ()
= print $ eval mulExample
mainEvalMulExample
mulExample2 :: Expr (Val :+: Mul)
= val 6 %*% val 7
mulExample2
mainEvalMulExample2 :: IO ()
= print $ eval mulExample2 mainEvalMulExample2
```

Pretty printing is implemented using the `Render`

type class.

```
class Render f where
render :: Render g => f (Expr g) -> String
pretty :: Render f => Expr f -> String
In t) = render t pretty (
```

Like evaluation, instances are defined for each term as well as `(:+:)`

.

```
instance Render Val where
render :: Render g => Val (Expr g) -> String
Val i) = show i
render (
instance Render Add where
render :: Render g => Add (Expr g) -> String
Add x y) = "(" ++ pretty x ++ " + " ++ pretty y ++ ")"
render (
instance Render Mul where
render :: Render g => Mul (Expr g) -> String
Mul x y) = "(" ++ pretty x ++ " * " ++ pretty y ++ ")"
render (
instance (Render f, Render g) => Render (f :+: g) where
render :: Render h => (f :+: g) (Expr h) -> String
= \case
render Inl x -> render x
Inr y -> render y
```

I implement a `Show`

instance for convenience.

```
instance Render f => Show (Expr f) where
show :: Expr f -> String
show = pretty
```

The example pretty-prints `mulExample`

.

```
mainPrettyMulExample :: IO ()
= print mulExample mainPrettyMulExample
```

I think that the discussion about `prj`

is very important. When using a sum type, one can pattern-match against all of the constructors of that type, but this is not possible using `Expr`

. This discussion shows how to use `prj`

to match constructors, specifying the appropriate type constraints. The `match`

function simply unwraps the `Expr`

constructor and calls `prj`

on the wrapped value.

```
match :: (g :<: f) => Expr f -> Maybe (g (Expr f))
In t) = prj t match (
```

The `distr`

function applies the distributive law on the outermost constructors of an expression. It is used for term rewriting. As shown, however, it only works when the right multiplicand is an addition. It does not work when the left multiplicand is an addition. The following version supports both.

```
distr :: forall f. (Add :<: f, Mul :<: f) => Expr f -> Maybe (Expr f)
= distrL <|> distrR
distr t where
distrL :: Maybe (Expr f)
= do
distrL Mul a b <- match t
Add c d <- match b
return (a %*% c %+% a %*% d)
distrR :: Maybe (Expr f)
= do
distrR Mul a b <- match t
Add c d <- match a
return (c %*% b %+% d %*% b)
```

The paper states that an algebra can be defined to fold over an expression to uniformly apply the distributive law using `distr`

, but no code is provided. I tried implementing this myself, but I do not yet know how to write an algebra (type `f a -> a`

) to do this. A type class should not be required, as no new functionality needs to be implemented per term. Also, it cannot be done in a single pass because one rewrite may enable further rewrites in sub-expressions.

I *tried* implementing this using the following function, which uses mutually-recursive helper functions. The `go`

function traverses (sub-)expressions, possibly rewriting them from the buttom up using the `f'`

function. The `f'`

function attempts to rewrite an expression using the passed rewrite function. If the expression is rewritten, `go`

is called on the result so that any newly enabled rewrites in sub-expressions can be performed.

```
rewriteExpr :: forall f. Functor f
=> (Expr f -> Maybe (Expr f)) -> Expr f -> Expr f
= go
rewriteExpr f where
go :: Expr f -> Expr f
In t) = f' (In (fmap go t))
go (
f' :: Expr f -> Expr f
= maybe e go (f e) f' e
```

This function works fine when using GHC 8.10.7, but compilation fails when using GHC 9.0.2 with a “simplifier ticks exhausted.” The error persists when I replace `f'`

with `id`

, so the problem is with the recursion in the `go`

helper function. It is very similar to the `foldExpr`

definition, except that it returns expressions instead of folded values. Simplifying, even definition `rewriteExpr f (In t) = In (fmap (rewriteExpr f) t)`

causes the error.

Using GHC 8.10.7, the `rewriteDistr`

function is a helper function that uses `rewriteExpr`

to rewrite an expression using just the `distr`

function.

```
rewriteDistr :: (Add :<: f, Mul :<: f) => Expr f -> Expr f
= rewriteExpr distr rewriteDistr
```

The following are examples of incresting complexity.

```
distrExample1 :: Expr (Val :+: Add :+: Mul)
= val 2 %*% (val 3 %+% val 4)
distrExample1
mainDistrExample1 :: IO ()
= do
mainDistrExample1 print distrExample1
print $ rewriteDistr distrExample1
```

```
distrExample2 :: Expr (Val :+: Add :+: Mul)
= (val 2 %+% val 3) %*% val 4
distrExample2
mainDistrExample2 :: IO ()
= do
mainDistrExample2 print distrExample2
print $ rewriteDistr distrExample2
```

```
distrExample3 :: Expr (Val :+: Add :+: Mul)
= val 2 %*% ((val 3 %+% val 4) %*% val 5)
distrExample3
mainDistrExample3 :: IO ()
= do
mainDistrExample3 print distrExample3
print $ rewriteDistr distrExample3
```

```
distrExample4 :: Expr (Val :+: Add :+: Mul)
= (val 2 %+% val 3) %*% (val 4 %*% (val 5 %+% val 6))
distrExample4
mainDistrExample4 :: IO ()
= do
mainDistrExample4 print distrExample4
print $ rewriteDistr distrExample4
```

```
distrExample5 :: Expr (Val :+: Add :+: Mul)
=
distrExample5 2 %*% (val 3 %+% val 4)) %*%
(val 5 %+% val 6) %*% (val 7 %*% (val 8 %+% val 9)))
((val
mainDistrExample5 :: IO ()
= do
mainDistrExample5 print distrExample5
print $ rewriteDistr distrExample5
```

All of the rewritten expressions are indeed in disjunctive normal form, but I am very interested in finding a better solution. If you know of a different solution or how to fix `rewriteExpr`

to work with GHC 9, please let me know!

## Monads for free

This section shows how a free monad is implemented in Haskell. It is used to implement a calculator memory cell.

A free monad is implemented as type `Term`

. Note that an `Applicative`

instance is now required for defining a `Monad`

instance.

```
data Term f a
= Pure a
| Impure (f (Term f a))
instance Functor f => Functor (Term f) where
fmap :: (a -> b) -> Term f a -> Term f b
fmap f = \case
Pure x -> Pure (f x)
Impure t -> Impure (fmap (fmap f) t)
instance Functor f => Applicative (Term f) where
pure :: a -> Term f a
pure x = Pure x
(<*>) :: Term f (a -> b) -> Term f a -> Term f b
Pure f <*> t = fmap f t
Impure f <*> t = Impure (fmap (<*> t) f)
instance Functor f => Monad (Term f) where
return :: a -> Term f a
return x = Pure x
(>>=) :: Term f a -> (a -> Term f b) -> Term f b
Pure x >>= f = f x
Impure t >>= f = Impure (fmap (>>= f) t)
```

The following examples are not used in the implementation.

```
data Zero a
data One a = One
newtype Const e a = Const e
```

The `Incr`

and `Recall`

types represent the “increment” and “recall” operations. Note that “increment” is not used to mean the addition of one; it represents the addition of an arbitrary value. These types have straightforward `Functor`

instances.

```
data Incr t = Incr Int t
instance Functor Incr where
fmap :: (a -> b) -> Incr a -> Incr b
fmap f (Incr i g) = Incr i (f g)
newtype Recall t = Recall (Int -> t)
instance Functor Recall where
fmap :: (a -> b) -> Recall a -> Recall b
fmap f (Recall g) = Recall (f . g)
```

The smart constructors are implemented using an `inject`

helper function, which I name `injectTerm`

because there is already an `inject`

function in this module.

```
injectTerm :: (g :<: f) => g (Term f a) -> Term f a
= Impure . inj
injectTerm
incr :: (Incr :<: f) => Int -> Term f ()
= injectTerm (Incr i (Pure ()))
incr i
recall :: (Recall :<: f) => Term f Int
= injectTerm (Recall Pure) recall
```

The `tick`

function really does increment a value (by one). A note in the paper states that it can be given a more general type, which I use here.

```
tick :: (Recall :<: f, Incr :<: f) => Term f Int
= do
tick <- recall
y 1
incr return y
```

While `foldExpr`

folds an expression using an algebra, the `foldTerm`

function folds a `Term`

using separate functions for handling `Pure`

and `Impure`

. Note that I name the first argument `pure'`

because `pure`

is a Prelude function.

```
foldTerm :: forall f a b. Functor f
=> (a -> b) -> (f b -> b) -> Term f a -> b
= go
foldTerm pure' impure where
go :: Term f a -> b
= \case
go Pure x -> pure' x
Impure t -> impure (fmap go t)
```

The `Mem`

type represents the memory cell of a calculator.

```
newtype Mem = Mem Int
deriving Show
```

Like `Eval`

, a `Run`

type class defines an algebra for running the operations, with implementations of each operation in instance definitions.

```
class Functor f => Run f where
runAlgebra :: f (Mem -> (a, Mem)) -> (Mem -> (a, Mem))
instance Run Incr where
runAlgebra :: Incr (Mem -> (a, Mem)) -> (Mem -> (a, Mem))
Incr k r) (Mem i) = r (Mem (i + k))
runAlgebra (
instance Run Recall where
runAlgebra :: Recall (Mem -> (a, Mem)) -> (Mem -> (a, Mem))
Recall r) (Mem i) = r i (Mem i)
runAlgebra (
instance (Run f, Run g) => Run (f :+: g) where
= \case
runAlgebra Inl r -> runAlgebra r
Inr r -> runAlgebra r
```

I wrote the type signatures like in the paper, which makes it clear that the member function is an algebra (type `f a -> a`

), but note that the parenthesis around `(Mem -> (a, Mem))`

at the end of the signature are not necessary. In the instances for `Incr`

and `Recall`

, `runAlgebra`

has *two* parameters. The first is `f (Mem -> (a, Mem))`

and the second is `Mem`

, giving a return value of `(a, Mem)`

.

The `run`

function is perhaps the most difficult to understand in the paper.

```
run :: Run f => Term f a -> Mem -> (a, Mem)
= foldTerm (,) runAlgebra run
```

The `pure'`

parameter of `foldTerm`

has type `a -> b`

and is passed `(,)`

which has type `a -> b -> (a, b)`

. In the usage of `foldTerm`

, type parameter `a`

remains `a`

while type parameter `b`

is interpreted as `b -> (a, b)`

, resulting in `(f (b -> (a, b)) -> b -> (a, b)) -> Term f a -> b -> (a, b)`

. The `runAlgebra`

function matches the first *three* arguments of this, and `b`

is interpreted as `Mem`

, resulting in the type signature of `run`

.

It is worthwhile to (mentally) step through the complete evaluation of `run`

with the following basic values of `Term`

(`Pure`

, `recall`

, and `incr`

).

```
mainRunPure :: IO ()
= print $ run @Recall @Int (Pure 42) (Mem 0)
mainRunPure
mainRunRecall :: IO ()
= print $ run @Recall @Int recall (Mem 42)
mainRunRecall
mainRunIncr :: IO ()
= print $ run @Incr @() (incr 42) (Mem 0) mainRunIncr
```

Note that the TypeApplications extension is used to specify the type of `f`

and `a`

.

An example using `tick`

is given. The paper does not require use of TypeApplications because it uses a specific type, while the following uses TypeApplications because the above definition of `tick`

uses the more general type.

```
mainRunTick :: IO ()
= print $ run @(Recall :+: Incr) tick (Mem 4) mainRunTick
```

## Applications

This section gives a brief demonstration of using free monads to model effects.

Four effectful functions are defined, categorized into two separate data types.

```
data Teletype a
= GetChar (Char -> a)
| PutChar Char a
instance Functor Teletype where
fmap :: (a -> b) -> Teletype a -> Teletype b
fmap f = \case
GetChar g -> GetChar (f . g)
PutChar c g -> PutChar c (f g)
data FileSystem a
= ReadFile FilePath (String -> a)
| WriteFile FilePath String a
instance Functor FileSystem where
fmap :: (a -> b) -> FileSystem a -> FileSystem b
fmap f = \case
ReadFile path g -> ReadFile path (f . g)
WriteFile path s g -> WriteFile path s (f g)
```

An `exec`

function can execute values of these data types using the `Term`

free monad.

```
exec :: Exec f => Term f a -> IO a
= foldTerm return execAlgebra exec
```

Each value is executed using a function from Prelude, using an `Exec`

type class.

```
class Functor f => Exec f where
execAlgebra :: f (IO a) -> IO a
instance Exec Teletype where
= \case
execAlgebra GetChar f -> Prelude.getChar >>= f
PutChar c io -> Prelude.putChar c >> io
instance Exec FileSystem where
= \case
execAlgebra ReadFile path f -> Prelude.readFile path >>= f
WriteFile path s f -> Prelude.writeFile path s >> f
instance (Exec f, Exec g) => Exec (f :+: g) where
= \case
execAlgebra Inl e -> execAlgebra e
Inr e -> execAlgebra e
```

The smart constructors can be defined as follows.

```
getChar :: (Teletype :<: f) => Term f Char
getChar = injectTerm (GetChar Pure)
putChar :: (Teletype :<: f) => Char -> Term f ()
putChar c = injectTerm (PutChar c (Pure ()))
readFile :: (FileSystem :<: f) => FilePath -> Term f String
readFile path = injectTerm (ReadFile path Pure)
writeFile :: (FileSystem :<: f) => FilePath -> String -> Term f ()
writeFile path s = injectTerm (WriteFile path s (Pure ()))
```

The `cat`

function serves as an example of composition. In the following, I use a more general type than that used in the paper. I also refactored the implementation, using `mapM_`

instead of `mapM`

to avoid discarding the resulting list of unit.

```
cat :: (FileSystem :<: f, Teletype :<: f) => FilePath -> Term f ()
= mapM_ putChar =<< readFile path cat path
```

The following example uses the `cat`

function to print the content of the `README.md`

file in this directory.

```
mainCat :: IO ()
= exec @(FileSystem :+: Teletype) $ cat "README.md" mainCat
```