Textual Type Class

Series: TTC (Textual Type Classes)

A common complaint about Haskell is that it can be tedious to work with its many textual data types. Most languages have two core data types used to represent text:

“String” data types represent a sequence of characters. In contemporary languages, Unicode is used to support a wide variety of natural languages.
“Byte string” data types represent a sequence of bytes. They can be used to hold arbitrary data (colloquially referred to as “binary data”), including text that is encoded using a particular encoding.

At this time, there are three core data types used to represent text in Haskell:

The String data type, defined in Data.String of the base package, is a type alias for a list of characters. This simple type can be processed using recursion as well as functions that operate on Traversable instances, Foldable instances, or lists, making it a great type to use when learning the language. Representing text as a list of characters results in poor performance, however, and many developers argue that this type should be strictly avoided.
The Text data type, defined in the text package, is a performant implementation of a (Unicode) string data type. This is the de facto standard for representing text in Haskell.
The ByteString data type, defined in the bytestring package, is a performant implementation of a byte string data type. This is the de facto standard for representing arbitrary data in Haskell.

Note that, of these types, only the String data type is defined in the Haskell 2010 Language Report. The text and bytestring packages are GHC boot packages, however.

The Text and ByteString types come in lazy as well as strict variants, and other associated types provide superior performance in specific use cases. “Builder” types are used to gradually “build up” a value. The ShortByteString type is used when keeping short byte strings in memory, avoiding heap fragmentation issues that arise when using the ByteString type.
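The sketch below shows the builder and short types in use. It relies only on the text and bytestring packages; the function names are illustrative and are not part of any library.

```haskell
-- Sketch: lazy, builder, and short variants of Text and ByteString.
-- Function names are hypothetical, chosen for illustration.
import qualified Data.ByteString.Builder as BSB
import qualified Data.ByteString.Lazy as BSL
import qualified Data.ByteString.Short as SBS
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as TLB

-- Build up a lazy Text from pieces; (<>) appends builders cheaply.
greetingText :: T.Text -> TL.Text
greetingText name =
  TLB.toLazyText (TLB.fromString "Hello, " <> TLB.fromText name <> TLB.singleton '!')

-- Build up a lazy ByteString; stringUtf8 encodes a String as UTF-8.
greetingBytes :: String -> BSL.ByteString
greetingBytes name =
  BSB.toLazyByteString (BSB.stringUtf8 "Hello, " <> BSB.stringUtf8 name <> BSB.char7 '!')

-- Copy a (strict) byte string into a ShortByteString, which lives in
-- unpinned memory.
shortKey :: BSL.ByteString -> SBS.ShortByteString
shortKey = SBS.toShort . BSL.toStrict

main :: IO ()
main = do
  print (greetingText (T.pack "Haskell"))
  print (greetingBytes "Haskell")
  print (SBS.length (shortKey (greetingBytes "TTC")))
```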

All of these types have good use cases, and they are not difficult to use when you get accustomed to them. Most complaints are made by those who are new to the language and have not yet fully recognized the utility of each type. Other complaints tend to stem from the following issues.

Since String is defined in the base package, and functions in base use that type, it is quite widely used. For example, String is the type that is most commonly used to represent error messages, even in libraries that use Text. It is difficult to completely avoid using the String type.
It is common to need to convert the same text to multiple textual data types. For example, a String value may be required to be used within an error message, a Text value may be required by a JSON library, and a ByteString value may be required by a database library. Keeping track of the correct/optimal way to convert various data types to various textual data types can be tedious.

The TTC (Textual Type Classes) library is designed to help with these issues. This article focuses on the Textual type class.

API

The Textual type class is used to convert between the following textual data types.

| `Textual` Type                          | Abbreviation | Notes         |
|-----------------------------------------|--------------|---------------|
| `String`                                | `S`          |               |
| `Data.Text.Text`                        | `T`          |               |
| `Data.Text.Lazy.Text`                   | `TL`         |               |
| `Data.Text.Lazy.Builder.Builder`        | `TLB`        |               |
| `Data.ByteString.ByteString`            | `BS`         | UTF-8 encoded |
| `Data.ByteString.Lazy.ByteString`       | `BSL`        | UTF-8 encoded |
| `Data.ByteString.Builder.Builder`       | `BSB`        | UTF-8 encoded |
| `Data.ByteString.Short.ShortByteString` | `SBS`        | UTF-8 encoded |

The Builder type of the binary package is a re-export of the ByteString Builder type, so TTC works with binary as well.

Note that byte string values are assumed to be UTF-8 encoded. Invalid bytes are replaced with the Unicode replacement character U+FFFD. In cases where different behavior is required, process the values before using this class.
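The replacement behavior corresponds to the text package's lenient decoding, which is what the ByteString instance shown in the Implementation section uses. A minimal sketch of that decoding on its own, independent of TTC:

```haskell
-- Sketch: lenient UTF-8 decoding with the text package.
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

-- Decode UTF-8, replacing invalid bytes with U+FFFD instead of throwing.
lenientToText :: BS.ByteString -> T.Text
lenientToText = TE.decodeUtf8With TEE.lenientDecode

main :: IO ()
main = do
  -- 0x68 0x69 is "hi"; the byte 0xFF never occurs in valid UTF-8.
  let bytes = BS.pack [0x68, 0x69, 0xFF]
  print (lenientToText bytes)
  print (lenientToText bytes == T.pack "hi\xFFFD")
```

If invalid input should be rejected rather than repaired, the decoding can be done with a strict decoder before the value is handed to TTC, as the paragraph above suggests.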

The library, intended to be imported qualified as TTC, provides a number of functions for converting between these types, referred to as “Textual types” below. The most general is the convert function, which converts from any Textual type to any Textual type.

convert :: (Textual t, Textual t') => t' -> t

This function can be used when both the argument and return types are known. For example, if you have a value from a web API of type Text and you want to insert it into a key-value store using a function that takes a ByteString, you can use convert to perform the conversion.

key :: ByteString
value :: Text
KVS.insert :: MonadKVS m => ByteString -> ByteString -> m ()

KVS.insert key $ TTC.convert value
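The KVS module above is hypothetical. The following self-contained sketch assumes an in-memory stand-in (a `Map` in an `IORef`, from the containers and base packages) so the example can run. Because `TTC.convert` at type `Text -> ByteString` resolves to `TE.encodeUtf8` (see the Text instance in the Examples section), the sketch writes that conversion out directly instead of depending on TTC.

```haskell
-- Hypothetical in-memory stand-in for the KVS module in the example above.
import Data.IORef (IORef, modifyIORef', newIORef, readIORef)
import qualified Data.ByteString as BS
import qualified Data.Map.Strict as Map
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Keys and values are both ByteString, as with KVS.insert.
type Store = IORef (Map.Map BS.ByteString BS.ByteString)

insert :: Store -> BS.ByteString -> BS.ByteString -> IO ()
insert store k v = modifyIORef' store (Map.insert k v)

main :: IO ()
main = do
  store <- newIORef Map.empty
  let key   = TE.encodeUtf8 (T.pack "greeting")
      value = T.pack "caf\233"  -- a Text value, e.g. from a web API
  -- For Text -> ByteString, TTC.convert resolves to TE.encodeUtf8,
  -- so the conversion is written out directly here.
  insert store key (TE.encodeUtf8 value)
  m <- readIORef store
  print (Map.lookup key m)
```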

When the argument type or the return type cannot be inferred from context, it must be specified. One can use the TypeApplications extension for this, but the library also provides “to” and “from” functions for this case, which use the abbreviations listed above to specify the desired type.

toS   :: TTC.Textual t => t -> String
toT   :: TTC.Textual t => t -> T.Text
toTL  :: TTC.Textual t => t -> TL.Text
toTLB :: TTC.Textual t => t -> TLB.Builder
toBS  :: TTC.Textual t => t -> BS.ByteString
toBSL :: TTC.Textual t => t -> BSL.ByteString
toBSB :: TTC.Textual t => t -> BSB.Builder
toSBS :: TTC.Textual t => t -> SBS.ShortByteString

fromS   :: TTC.Textual t => String              -> t
fromT   :: TTC.Textual t => T.Text              -> t
fromTL  :: TTC.Textual t => TL.Text             -> t
fromTLB :: TTC.Textual t => TLB.Builder         -> t
fromBS  :: TTC.Textual t => BS.ByteString       -> t
fromBSL :: TTC.Textual t => BSL.ByteString      -> t
fromBSB :: TTC.Textual t => BSB.Builder         -> t
fromSBS :: TTC.Textual t => SBS.ShortByteString -> t

Note that these functions can be easier to understand than convert, so they may be preferred in some cases, even when the argument and return types are known.
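To see why the return type sometimes has to be pinned down, consider the following sketch. It uses a cut-down, hypothetical stand-in for the Textual class (String and Text only), not the TTC library itself, so that it runs without extra dependencies.

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Sketch: a two-type stand-in for the Textual class (hypothetical, not
-- the TTC source), used to show how "to" and "from" fix types.
import qualified Data.Text as T

class Textual t where
  toS     :: t -> String
  toT     :: t -> T.Text
  convert :: Textual t' => t' -> t

instance Textual [Char] where
  toS     = id
  toT     = T.pack
  convert = toS

instance Textual T.Text where
  toS     = T.unpack
  toT     = id
  convert = toT

-- A "from" function fixes the argument type; the result stays polymorphic.
fromT :: Textual t => T.Text -> t
fromT = convert

fromS :: Textual t => String -> t
fromS = convert

-- toT fixes the intermediate type as Text, so input and output can be
-- any Textual type. Writing `fromT . T.toUpper . convert` would also
-- work here, but `toS (convert s)` alone would be ambiguous: nothing
-- determines the type in the middle.
shout :: (Textual a, Textual b) => a -> b
shout = fromT . T.toUpper . toT

main :: IO ()
main = do
  -- T.length determines the return type of fromS.
  print (T.length (fromS "hello"))
  -- putStrLn determines the return type of shout.
  putStrLn (shout "hello")
```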

When defining functions that take a Textual type as an argument, “as” functions provide a convenient way to convert the value to the type used within the function.

asS   :: TTC.Textual t => (String              -> a) -> t -> a
asT   :: TTC.Textual t => (T.Text              -> a) -> t -> a
asTL  :: TTC.Textual t => (TL.Text             -> a) -> t -> a
asTLB :: TTC.Textual t => (TLB.Builder         -> a) -> t -> a
asBS  :: TTC.Textual t => (BS.ByteString       -> a) -> t -> a
asBSL :: TTC.Textual t => (BSL.ByteString      -> a) -> t -> a
asBSB :: TTC.Textual t => (BSB.Builder         -> a) -> t -> a
asSBS :: TTC.Textual t => (SBS.ShortByteString -> a) -> t -> a

For example, the following function reads an Int from a Textual type, using the TTC.asS function to convert the argument to a String.

import qualified Data.TTC as TTC
import Text.Read (readMaybe)

readInt :: TTC.Textual t => t -> Either String Int
readInt = TTC.asS $ \s ->
    maybe (Left $ "not a valid Int: " ++ s) Right (readMaybe s)

Key Features

The Textual type class has two key features:

Type conversion is not done through a fixed type (such as String or Text).
It has a single type variable, making it easy to write functions that accept arguments and/or return values that may be any of the supported textual data types.

Some packages define a type class that has a single type variable with similar goals, but the implementation performs conversions via a fixed type. This results in unnecessary conversion and hurts performance.

Some packages define a type class that has two type variables for conversion between types in general. Such a type class can be used to implement polymorphic functions, but the implementations perform conversions via a fixed type, resulting in the same problems.
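For contrast, here is a minimal sketch of the first design described above: a single-parameter class that routes every conversion through Text. The class and function names are hypothetical and do not come from any particular package. Besides doing extra work, the round trip through Text is not byte-preserving when a ByteString is not valid UTF-8, because lenient decoding replaces invalid bytes.

```haskell
-- Sketch of a design TTC avoids: every conversion goes through Text.
-- ViaText and convertViaText are hypothetical names.
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

class ViaText t where
  toText   :: t -> T.Text
  fromText :: T.Text -> t

instance ViaText T.Text where
  toText   = id
  fromText = id

instance ViaText BS.ByteString where
  toText   = TE.decodeUtf8With TEE.lenientDecode
  fromText = TE.encodeUtf8

-- Even ByteString -> ByteString decodes and re-encodes.
convertViaText :: (ViaText a, ViaText b) => a -> b
convertViaText = fromText . toText

main :: IO ()
main = do
  let bytes = BS.pack [0x68, 0x69, 0xFF]  -- "hi" plus an invalid byte
  -- The invalid byte comes back as EF BF BD, the UTF-8 encoding of U+FFFD.
  print (BS.unpack (convertViaText bytes :: BS.ByteString))
```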

Implementation

In order to implement the Textual type class with a single type variable, without converting through a fixed type, the supported types are fixed and conversion between each supported type is defined. The Textual type class is defined as follows.

class Textual t where
  toS     :: t  -> String
  toT     :: t  -> T.Text
  toTL    :: t  -> TL.Text
  toTLB   :: t  -> TLB.Builder
  toBS    :: t  -> BS.ByteString
  toBSL   :: t  -> BSL.ByteString
  toBSB   :: t  -> BSB.Builder
  toSBS   :: t  -> SBS.ShortByteString
  convert :: Textual t' => t' -> t

With instances for each Textual type, the “to” functions define all the necessary conversions. For example, the ByteString instance is as follows.

instance Textual BS.ByteString where
  toS     = T.unpack . TE.decodeUtf8With TEE.lenientDecode
  toT     = TE.decodeUtf8With TEE.lenientDecode
  toTL    = TLE.decodeUtf8With TEE.lenientDecode . BSL.fromStrict
  toTLB   = TLB.fromText . TE.decodeUtf8With TEE.lenientDecode
  toBS    = id
  toBSL   = BSL.fromStrict
  toBSB   = BSB.byteString
  toSBS   = SBS.toShort
  convert = toBS

The “to” functions define how to convert from a ByteString to any Textual type. The convert function defines how to convert from any Textual type to a ByteString: simply use the toBS function of the instance of that Textual type! The “from” functions, defined separately, are simply calls to convert.
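The same pattern can be reproduced at a smaller scale. The sketch below is a simplified illustration with three types (String, Text, and ByteString), not the TTC source. The ByteString instance mirrors the one shown above, and each “from” function is `convert` with the argument type fixed.

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Sketch: the Textual pattern cut down to three types.
-- A simplified illustration, not the TTC source code.
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

class Textual t where
  toS     :: t -> String
  toT     :: t -> T.Text
  toBS    :: t -> BS.ByteString
  convert :: Textual t' => t' -> t

instance Textual [Char] where
  toS     = id
  toT     = T.pack
  toBS    = TE.encodeUtf8 . T.pack
  convert = toS    -- use the argument's toS

instance Textual T.Text where
  toS     = T.unpack
  toT     = id
  toBS    = TE.encodeUtf8
  convert = toT    -- use the argument's toT

instance Textual BS.ByteString where
  toS     = T.unpack . TE.decodeUtf8With TEE.lenientDecode
  toT     = TE.decodeUtf8With TEE.lenientDecode
  toBS    = id
  convert = toBS   -- use the argument's toBS

-- "from" functions are calls to convert.
fromS :: Textual t => String -> t
fromS = convert

fromT :: Textual t => T.Text -> t
fromT = convert

fromBS :: Textual t => BS.ByteString -> t
fromBS = convert

main :: IO ()
main = do
  let bytes = fromS "caf\233" :: BS.ByteString
  print (BS.unpack bytes)
  print (fromBS bytes :: String)
  print (fromT (T.pack "abc") :: BS.ByteString)
```

Each conversion goes directly from the source type to the target type: no pair of types is connected by way of a third.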

Notice that ByteString conversion of Unicode text uses the text package for proper decoding. Incorrect encoding/decoding is a common source of bugs, and using the Textual type class helps avoid making a mistake.

Since the supported types are fixed, one cannot add support for a different type by writing a new instance. Note, however, that conversion functions for other types can be written by passing through a fixed Textual type.
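As a sketch of that approach, a hypothetical `Username` type can be converted to and from any Textual type by passing through Text. The Textual class here is again a cut-down stand-in (String and Text only) so the example runs on its own.

```haskell
{-# LANGUAGE FlexibleInstances #-}
-- Sketch: supporting another type by passing through a fixed Textual
-- type. The class is a cut-down stand-in; Username is hypothetical.
import qualified Data.Text as T

class Textual t where
  toS     :: t -> String
  toT     :: t -> T.Text
  convert :: Textual t' => t' -> t

instance Textual [Char] where
  toS     = id
  toT     = T.pack
  convert = toS

instance Textual T.Text where
  toS     = T.unpack
  toT     = id
  convert = toT

-- Not a Textual instance: all conversions go through Text.
newtype Username = Username T.Text
  deriving (Eq, Show)

toUsername :: Textual t => t -> Username
toUsername = Username . toT

fromUsername :: Textual t => Username -> t
fromUsername (Username name) = convert name

main :: IO ()
main = do
  let user = toUsername "alice"
  print user
  putStrLn (fromUsername user)
```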

Examples

These examples illustrate how the Textual type class works.

No Conversion

Consider the case where convert is called but is not necessary. For example, the implementation of a function may use a ByteString internally but accept any Textual type. When a ByteString is passed in, no conversion is required.

TTC.convert :: BS.ByteString -> BS.ByteString

The return type determines which instance is used. The ByteString instance is used in this case.

instance Textual BS.ByteString where
  ...
  convert = toBS

The call to convert is translated to a call to the toBS method of the argument type's instance. Note that this definition is inlined.

TTC.toBS :: BS.ByteString -> BS.ByteString

The argument is also of type ByteString, so the ByteString instance is used.

instance Textual BS.ByteString where
  ...
  toBS = id
  ...

The call is translated to id, so no conversion is performed. Note that this definition is also inlined, so the call to convert is optimized away.

Conversion

Consider a case where convert is used to perform conversion. For example, a Text value may be passed to a function that uses ByteString internally.

TTC.convert :: T.Text -> BS.ByteString

The return type determines which instance is used. The ByteString instance is used in this case.

instance Textual BS.ByteString where
  ...
  convert = toBS

The call to convert is translated to a call to the toBS method of the argument type's instance. Note that this definition is inlined.

TTC.toBS :: T.Text -> BS.ByteString

The argument is of type Text, so that instance is used.

instance Textual T.Text where
  ...
  toBS = TE.encodeUtf8
  ...

The call is translated to TE.encodeUtf8, which converts the Text value to a ByteString. Note that this definition is also inlined, so the call to convert performs the conversion with minimal overhead.
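Both walkthroughs can be reproduced with a small, self-contained sketch. It uses a hypothetical cut-down class (Text and ByteString only) with INLINE pragmas on the instance methods; it is not the TTC source. Compiling with `-O -ddump-simpl` prints the optimized Core, which can be used to check what each call reduces to.

```haskell
-- Sketch of the two examples above with a cut-down class
-- (hypothetical, not the TTC source). INLINE pragmas ask GHC
-- to inline each method definition.
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

class Textual t where
  toT     :: t -> T.Text
  toBS    :: t -> BS.ByteString
  convert :: Textual t' => t' -> t

instance Textual T.Text where
  toT = id
  {-# INLINE toT #-}
  toBS = TE.encodeUtf8
  {-# INLINE toBS #-}
  convert = toT
  {-# INLINE convert #-}

instance Textual BS.ByteString where
  toT = TE.decodeUtf8With TEE.lenientDecode
  {-# INLINE toT #-}
  toBS = id
  {-# INLINE toBS #-}
  convert = toBS
  {-# INLINE convert #-}

-- First example: ByteString to ByteString, which resolves to id.
noConversion :: BS.ByteString -> BS.ByteString
noConversion = convert

-- Second example: Text to ByteString, which resolves to TE.encodeUtf8.
textToBS :: T.Text -> BS.ByteString
textToBS = convert

main :: IO ()
main = do
  let bytes = BS.pack [0x68, 0x69, 0xFF]
  -- The invalid byte survives because no decoding takes place.
  print (noConversion bytes == bytes)
  print (textToBS (T.pack "hi") == TE.encodeUtf8 (T.pack "hi"))
```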

Changelog

June 10, 2021

The article is updated to accompany the release of TTC 1.1.0.0 with the following changes:

Textual instances for builder and short types are added.
The auxiliary functions for builder and short types are removed.
The Key Features section is rewritten. The article previously included a demonstration of how usage of two type variables leads to conversion via a fixed type, but that demonstration is removed because it confused readers. (It was a demonstration of problems with other designs, not the design of TTC.)
The Examples section is added to better explain/emphasize that conversion is not done via a fixed type.

Author

Travis Cardwell

Published

June 2, 2021

Revised

July 10, 2021

Series

TTC (Textual Type Classes)