Textual Type Class
A common complaint about Haskell is that it can be tedious to work with its many textual data types. Most languages have two core data types used to represent text:
- “String” data types represent a sequence of characters. In contemporary languages, Unicode is used to support a wide variety of natural languages.
- “Byte string” data types represent a sequence of bytes. They can be used to hold arbitrary data (colloquially referred to as “binary data”), including text that is encoded using a particular encoding.
At this time, there are three core data types used to represent text in Haskell:
- The String data type, defined in Data.String of the base package, is a type alias for a list of characters. This simple type can be processed using recursion as well as functions that operate on Traversable instances, Foldable instances, or lists, making it a great type to use when learning the language. Representing text as a list of characters results in poor performance, however, and many developers argue that this type should be strictly avoided.
- The Text data type, defined in the text package, is a performant implementation of a (Unicode) string data type. This is the de facto standard for representing text in Haskell.
- The ByteString data type, defined in the bytestring package, is a performant implementation of a byte string data type. This is the de facto standard for representing arbitrary data in Haskell.
Note that, of these types, only the String data type is defined in the Haskell 2010 Language Report. The text and bytestring packages are GHC boot packages, however.
The Text and ByteString types come in lazy as well as strict variants, and other associated types provide superior performance in specific use cases. “Builder” types are used to gradually “build up” a value. The ShortByteString type is used when keeping short byte strings in memory, avoiding heap fragmentation issues that arise when using the ByteString type.
All of these types have good use cases, and they are not difficult to use when you get accustomed to them. Most complaints are made by those who are new to the language and have not yet fully recognized the utility of each type. Other complaints tend to stem from the following issues.
- Since String is defined in the base package, and functions in base use that type, it is quite widely used. For example, String is the type that is most commonly used to represent error messages, even in libraries that use Text. It is difficult to completely avoid using the String type.
- It is common to need to convert the same text to multiple textual data types. For example, a String value may be needed for an error message, a Text value may be required by a JSON library, and a ByteString value may be required by a database library. Keeping track of the correct/optimal way to convert various data types to various textual data types can be tedious.
The TTC (Textual Type Classes) library is designed to help with these issues. This article focuses on the Textual type class.
API
The Textual type class is used to convert between the following textual data types.
Textual Type | Abbreviation | Notes |
---|---|---|
String | S | |
Data.Text.Text | T | |
Data.Text.Lazy.Text | TL | |
Data.Text.Lazy.Builder.Builder | TLB | |
Data.ByteString.ByteString | BS | UTF-8 encoded |
Data.ByteString.Lazy.ByteString | BSL | UTF-8 encoded |
Data.ByteString.Builder.Builder | BSB | UTF-8 encoded |
Data.ByteString.Short.ShortByteString | SBS | UTF-8 encoded |
The Builder type of the binary package is a re-export of the ByteString Builder type, so TTC works with binary as well.
Note that byte string values are assumed to be UTF-8 encoded. Invalid bytes are replaced with the Unicode replacement character U+FFFD. In cases where different behavior is required (for example, when invalid input should be rejected), process the values before using this class.
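The replacement behavior can be observed with the text package alone. The following sketch assumes that decoding is done with TE.decodeUtf8With TEE.lenientDecode, as in the ByteString instance shown in the Implementation section. The strictDecode function is a hypothetical helper illustrating one way to process values beforehand when invalid input should be rejected rather than repaired.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

-- "hi" followed by the byte 0xFF, which never occurs in valid UTF-8
invalidBytes :: BS.ByteString
invalidBytes = BS.pack [0x68, 0x69, 0xFF]

-- Lenient decoding: the invalid byte is replaced with U+FFFD
lenient :: BS.ByteString -> T.Text
lenient = TE.decodeUtf8With TEE.lenientDecode

-- Hypothetical alternative: reject invalid input up front, before
-- the value is passed to any Textual conversion
strictDecode :: BS.ByteString -> Either String T.Text
strictDecode bs = case TE.decodeUtf8' bs of
  Left err -> Left (show err)
  Right t  -> Right t
```

Here lenient invalidBytes evaluates to "hi" followed by U+FFFD, while strictDecode invalidBytes returns a Left describing the decoding error.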
The library, intended to be imported qualified as TTC, provides a number of functions for converting between these types, referred to as “Textual types” below. The most general is the convert function, which converts from any Textual type to any Textual type.

```haskell
convert :: (Textual t, Textual t') => t' -> t
```
This function can be used when both the argument and return types are known. For example, if you have a value of type Text from a web API and you want to insert it into a key-value store using a function that takes a ByteString, you can use convert to perform the conversion.
```haskell
key :: ByteString
value :: Text
KVS.insert :: MonadKVS m => ByteString -> ByteString -> m ()

-- Convert the Text value to the ByteString that KVS.insert expects
KVS.insert key $ TTC.convert value
```
In some cases, either the argument type or the return type cannot be inferred and must be specified. One can use TypeApplications for this, but the library also provides “to” and “from” functions for this case, using abbreviations to specify the desired type.
```haskell
toS   :: TTC.Textual t => t -> String
toT   :: TTC.Textual t => t -> T.Text
toTL  :: TTC.Textual t => t -> TL.Text
toTLB :: TTC.Textual t => t -> TLB.Builder
toBS  :: TTC.Textual t => t -> BS.ByteString
toBSL :: TTC.Textual t => t -> BSL.ByteString
toBSB :: TTC.Textual t => t -> BSB.Builder
toSBS :: TTC.Textual t => t -> SBS.ShortByteString

fromS   :: TTC.Textual t => String -> t
fromT   :: TTC.Textual t => T.Text -> t
fromTL  :: TTC.Textual t => TL.Text -> t
fromTLB :: TTC.Textual t => TLB.Builder -> t
fromBS  :: TTC.Textual t => BS.ByteString -> t
fromBSL :: TTC.Textual t => BSL.ByteString -> t
fromBSB :: TTC.Textual t => BSB.Builder -> t
fromSBS :: TTC.Textual t => SBS.ShortByteString -> t
```
Note that these functions can be easier to understand than convert, so they may be preferred in some cases, even when the argument and return types are known.
When defining functions that take a Textual type as an argument, “as” functions provide a convenient way to convert the value to the type used within the function.
```haskell
asS   :: TTC.Textual t => (String -> a) -> t -> a
asT   :: TTC.Textual t => (T.Text -> a) -> t -> a
asTL  :: TTC.Textual t => (TL.Text -> a) -> t -> a
asTLB :: TTC.Textual t => (TLB.Builder -> a) -> t -> a
asBS  :: TTC.Textual t => (BS.ByteString -> a) -> t -> a
asBSL :: TTC.Textual t => (BSL.ByteString -> a) -> t -> a
asBSB :: TTC.Textual t => (BSB.Builder -> a) -> t -> a
asSBS :: TTC.Textual t => (SBS.ShortByteString -> a) -> t -> a
```
For example, the following function reads an Int from a Textual type, using the TTC.asS function to convert the argument to a String.
```haskell
import Text.Read (readMaybe)

readInt :: TTC.Textual t => t -> Either String Int
readInt = TTC.asS $ \s ->
  maybe (Left $ "not a valid Int: " ++ s) Right (readMaybe s)
```
Key Features
The Textual type class has two key features:
- Type conversion is not done through a fixed type (such as String or Text).
- It has a single type variable, making it easy to write functions that accept arguments and/or return values that may be any of the supported textual data types.
Some packages define a type class that has a single type variable with similar goals, but the implementation performs conversions via a fixed type. This results in unnecessary conversion and hurts performance.
Some packages define a type class that has two type variables for conversion between types in general. Such a type class can be used to implement polymorphic functions, but the implementations perform conversions via a fixed type, resulting in the same problems.
Implementation
In order to implement the Textual type class with a single type variable, without converting through a fixed type, the supported types are fixed and conversion between each pair of supported types is defined. The Textual type class is defined as follows.
```haskell
class Textual t where
  toS     :: t -> String
  toT     :: t -> T.Text
  toTL    :: t -> TL.Text
  toTLB   :: t -> TLB.Builder
  toBS    :: t -> BS.ByteString
  toBSL   :: t -> BSL.ByteString
  toBSB   :: t -> BSB.Builder
  toSBS   :: t -> SBS.ShortByteString
  convert :: Textual t' => t' -> t
```
With instances for each Textual type, the “to” functions define all the necessary conversions. For example, the ByteString instance is as follows.
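```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Builder as BSB
import qualified Data.ByteString.Lazy as BSL
import qualified Data.ByteString.Short as SBS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE
import qualified Data.Text.Lazy.Builder as TLB
import qualified Data.Text.Lazy.Encoding as TLE

instance Textual BS.ByteString where
  toS     = T.unpack . TE.decodeUtf8With TEE.lenientDecode
  toT     = TE.decodeUtf8With TEE.lenientDecode
  toTL    = TLE.decodeUtf8With TEE.lenientDecode . BSL.fromStrict
  toTLB   = TLB.fromText . TE.decodeUtf8With TEE.lenientDecode
  toBS    = id
  toBSL   = BSL.fromStrict
  toBSB   = BSB.byteString
  toSBS   = SBS.toShort
  convert = toBS
```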
The “to” functions define how to convert from a ByteString to any Textual type. The convert function defines how to convert from any Textual type to a ByteString: simply use the toBS function of the instance of that Textual type! The “from” functions, defined separately, are simply calls to convert.
Notice that ByteString conversion of Unicode text uses the text package for proper decoding. Incorrect encoding/decoding is a common source of bugs, and using the Textual type class helps avoid making a mistake.
Since the supported types are fixed, one cannot add support for a different type by writing a new instance. Note, however, that conversion functions for other types can be written by passing through a fixed Textual type.
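The approach can be sketched in miniature. MiniTextual below is a hypothetical, cut-down class written for illustration (it is not TTC's source) that supports only String, strict Text, and strict ByteString. As in the Textual class, each instance defines every “to” function, and convert delegates to the target type's “to” function, so a conversion never passes through a third type. The toWord8s and fromWord8s functions are hypothetical examples of supporting another type, [Word8], by passing through the fixed ByteString type.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import Data.Word (Word8)
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

-- Hypothetical cut-down version of the Textual class:
-- three fixed types, with every pairwise conversion defined
class MiniTextual t where
  toS     :: t -> String
  toT     :: t -> T.Text
  toBS    :: t -> BS.ByteString
  -- Convert from any supported type to t
  convert :: MiniTextual t' => t' -> t

instance MiniTextual String where
  toS     = id
  toT     = T.pack
  toBS    = TE.encodeUtf8 . T.pack
  convert = toS   -- target is String: use the source type's toS

instance MiniTextual T.Text where
  toS     = T.unpack
  toT     = id
  toBS    = TE.encodeUtf8
  convert = toT   -- target is Text: use the source type's toT

instance MiniTextual BS.ByteString where
  toS     = T.unpack . TE.decodeUtf8With TEE.lenientDecode
  toT     = TE.decodeUtf8With TEE.lenientDecode
  toBS    = id
  convert = toBS  -- target is ByteString: use the source type's toBS

-- A "from" function is just convert with a fixed argument type
fromT :: MiniTextual t => T.Text -> t
fromT = convert

-- An "as" function converts the argument before applying f
asS :: MiniTextual t => (String -> a) -> t -> a
asS f = f . toS

-- Hypothetical support for an unsupported type ([Word8]),
-- passing through the fixed ByteString type
toWord8s :: MiniTextual t => t -> [Word8]
toWord8s = BS.unpack . toBS

fromWord8s :: MiniTextual t => [Word8] -> t
fromWord8s = convert . BS.pack
```

With this sketch, (convert (T.pack "caf\233") :: BS.ByteString) selects the ByteString instance, whose convert is toBS, so it runs the Text instance's TE.encodeUtf8 directly rather than going through String.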
Examples
These examples illustrate how the Textual type class works.
No Conversion
Consider the case where convert is called but is not necessary. For example, the implementation of a function may use a ByteString internally but accept any Textual type. When a ByteString is passed in, no conversion is required.

```haskell
TTC.convert :: BS.ByteString -> BS.ByteString
```

The return type determines which instance is used. The ByteString instance is used in this case.
```haskell
instance Textual BS.ByteString where
  ...
  convert = toBS
```

The call to convert is translated to a call to the toBS function of the argument's instance. Note that this definition is inlined.

```haskell
TTC.toBS :: BS.ByteString -> BS.ByteString
```

The argument is also of type ByteString, so that instance is used.
```haskell
instance Textual BS.ByteString where
  ...
  toBS = id
  ...
```

The call is translated to id, so no conversion is performed. Note that this definition is also inlined, so the call to convert is optimized away.
Conversion
Consider a case where convert is used to perform conversion. For example, a Text value may be passed to a function that uses ByteString internally.

```haskell
TTC.convert :: T.Text -> BS.ByteString
```

The return type determines which instance is used. The ByteString instance is used in this case.
```haskell
instance Textual BS.ByteString where
  ...
  convert = toBS
```

The call to convert is translated to a call to the toBS function of the argument's instance. Note that this definition is inlined.

```haskell
TTC.toBS :: T.Text -> BS.ByteString
```

The argument is of type Text, so that instance is used.
```haskell
instance Textual T.Text where
  ...
  toBS = TE.encodeUtf8
  ...
```

The call is translated to TE.encodeUtf8, which converts the Text value to a ByteString. Note that this definition is also inlined, so the call to convert performs the conversion with minimal overhead.
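Both traces can be reproduced with an even smaller hypothetical class, Textual2, that supports only strict Text and strict ByteString. The INLINE pragmas are an assumption standing in for the article's note that these definitions are inlined. The comments show the rewriting described above; with optimization enabled, GHC's Core output (for example, via -ddump-simpl) can be inspected to check whether noConversion and conversion reduce as described.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

-- Hypothetical two-type cut-down class for tracing instance resolution
class Textual2 t where
  toT     :: t -> T.Text
  toBS    :: t -> BS.ByteString
  convert :: Textual2 t' => t' -> t

instance Textual2 T.Text where
  toT = id
  {-# INLINE toT #-}
  toBS = TE.encodeUtf8
  {-# INLINE toBS #-}
  convert = toT
  {-# INLINE convert #-}

instance Textual2 BS.ByteString where
  toT = TE.decodeUtf8With TEE.lenientDecode
  {-# INLINE toT #-}
  toBS = id
  {-# INLINE toBS #-}
  convert = toBS
  {-# INLINE convert #-}

-- No Conversion:
--   convert :: BS.ByteString -> BS.ByteString
--   = toBS  (ByteString instance, selected by the return type)
--   = id    (ByteString instance, selected by the argument type)
noConversion :: BS.ByteString -> BS.ByteString
noConversion = convert

-- Conversion:
--   convert :: T.Text -> BS.ByteString
--   = toBS           (ByteString instance, selected by the return type)
--   = TE.encodeUtf8  (Text instance, selected by the argument type)
conversion :: T.Text -> BS.ByteString
conversion = convert
```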
Changelog
June 10, 2021
The article is updated to accompany the release of TTC 1.1.0.0 with the following changes:
- Textual instances for builder and short types are added.
- The auxiliary functions for builder and short types are removed.
- The Key Features section is rewritten. The article previously included a demonstration of how usage of two type variables leads to conversion via a fixed type, but that demonstration is removed because it confuses readers. (It was a demonstration of problems with other designs, not the design of TTC.)
- The Examples section is added to better explain/emphasize that conversion is not done via a fixed type.