Textual Type Class
A common complaint about Haskell is that it can be tedious to work with its many textual data types. Most languages have two core data types used to represent text:
- “String” data types represent a sequence of characters. In contemporary languages, Unicode is used to support a wide variety of natural languages.
- “Byte string” data types represent a sequence of bytes. They can be used to hold arbitrary data (colloquially referred to as “binary data”), including text that is encoded using a particular encoding.
At this time, there are three core data types used to represent text in Haskell:
- The String data type, defined in Data.String of the base package, is a type alias for a list of characters. This simple type can be processed using recursion as well as functions that operate on Traversable instances, Foldable instances, or lists, making it a great type to use when learning the language. Representing text as a list of characters results in poor performance, however, and many developers argue that this type should be strictly avoided.
- The Text data type, defined in the text package, is a performant implementation of a (Unicode) string data type. This is the de facto standard for representing text in Haskell.
- The ByteString data type, defined in the bytestring package, is a performant implementation of a byte string data type. This is the de facto standard for representing arbitrary data in Haskell.
Note that, of these types, only the String data type is
defined in the Haskell 2010
Language Report. The text and bytestring
packages are GHC boot
packages, however.
The Text and ByteString types come in lazy
as well as strict variants, and other associated types provide superior
performance in specific use cases. “Builder” types are used to gradually
“build up” a value. The ShortByteString type is used when
keeping short byte strings in memory, avoiding heap fragmentation issues
that arise when using the ByteString type.
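As a minimal sketch of the builder and short variants, the following uses only the text and bytestring packages. The function names (greeting, compactKey, expandKey) are illustrative and do not come from any library.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Short as SBS
import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as TLB

-- Build up a lazy Text value piece by piece, then materialize it once.
greeting :: T.Text -> TL.Text
greeting name =
  TLB.toLazyText $
    TLB.fromString "Hello, " <> TLB.fromText name <> TLB.singleton '!'

-- Keep a short byte string in its compact representation while it is stored.
compactKey :: BS.ByteString -> SBS.ShortByteString
compactKey = SBS.toShort

-- Convert back when an API requires a regular ByteString.
expandKey :: SBS.ShortByteString -> BS.ByteString
expandKey = SBS.fromShort
```

Each of these types needs its own conversion functions, which is part of what makes working with many textual types feel tedious.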
All of these types have good use cases, and they are not difficult to use when you get accustomed to them. Most complaints are made by those who are new to the language and have not yet fully recognized the utility of each type. Other complaints tend to stem from the following issues.
- Since String is defined in the base package, and functions in base use that type, it is quite widely used. For example, String is the type that is most commonly used to represent error messages, even in libraries that use Text. It is difficult to completely avoid using the String type.
- It is common to need to convert the same text to multiple textual data types. For example, a String value may be required to be used within an error message, a Text value may be required by a JSON library, and a ByteString value may be required by a database library. Keeping track of the correct/optimal way to convert various data types to various textual data types can be tedious.
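The second issue can be sketched without any helper library, using only the text and bytestring packages. The scenario (a user name that arrives as Text and is needed in an error message and as a database key) and the function names are hypothetical.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

-- Error messages are commonly String values, so the Text must be unpacked.
notFoundMessage :: T.Text -> String
notFoundMessage name = "user not found: " ++ T.unpack name

-- A database library might take keys as UTF-8 encoded ByteString values.
databaseKey :: T.Text -> BS.ByteString
databaseKey = TE.encodeUtf8

-- Values read back from the database need decoding before further use.
-- Invalid bytes are replaced with U+FFFD here instead of raising an error.
fromDatabase :: BS.ByteString -> T.Text
fromDatabase = TE.decodeUtf8With TEE.lenientDecode
```

Every pair of types uses a different function from a different module, and the developer has to remember which one is correct in each direction.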
The TTC
(Textual Type Classes) library is designed to help with these
issues. This article focuses on the Textual type class.
API
The Textual type class is used to convert between the
following textual data types.
| Textual Type | Abbreviation | Notes |
|---|---|---|
| String | S | |
| Data.Text.Text | T | |
| Data.Text.Lazy.Text | TL | |
| Data.Text.Lazy.Builder.Builder | TLB | |
| Data.ByteString.ByteString | BS | UTF-8 encoded |
| Data.ByteString.Lazy.ByteString | BSL | UTF-8 encoded |
| Data.ByteString.Builder.Builder | BSB | UTF-8 encoded |
| Data.ByteString.Short.ShortByteString | SBS | UTF-8 encoded |
The Builder type of the binary package is
a re-export of the ByteString Builder type, so
TTC works with binary as well.
Note that byte string values are assumed to be UTF-8 encoded. Invalid
bytes are replaced with the Unicode replacement character
U+FFFD. In cases where different behavior is required,
process the values before using this class.
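The replacement behavior corresponds to lenient decoding in the text package. The sketch below contrasts it with strict decoding, which is one way to process values first when replacement is not acceptable. The names decodeLenient, decodeStrict, and invalidBytes are illustrative.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

-- Lenient decoding: each invalid byte becomes U+FFFD.
decodeLenient :: BS.ByteString -> T.Text
decodeLenient = TE.decodeUtf8With TEE.lenientDecode

-- Strict decoding: invalid input is reported as a Left value instead.
decodeStrict :: BS.ByteString -> Either String T.Text
decodeStrict = either (Left . show) Right . TE.decodeUtf8'

-- The bytes for "fo" followed by 0xFF, which is never valid in UTF-8.
invalidBytes :: BS.ByteString
invalidBytes = BS.pack [0x66, 0x6F, 0xFF]
```

Here decodeLenient invalidBytes yields the text "fo" followed by U+FFFD, while decodeStrict invalidBytes yields a Left value that the caller can handle explicitly.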
The library, intended to be imported qualified as TTC,
provides a number of functions for converting between these types,
referred to as “Textual types” below. The most general is
the convert function, which converts from any
Textual type to any Textual type.
convert :: (Textual t, Textual t') => t' -> t
This function can be used when both the argument and return types are
known. For example, if you have a value from a web API of type
Text and you want to insert it into a key-value store using
a function that takes a ByteString, you can use
convert to perform the conversion.
key :: ByteString
value :: Text
KVS.insert :: MonadKVS m => ByteString -> ByteString -> m ()
KVS.insert key $ TTC.convert value
In some cases, either the argument type or return type must be
specified. One can use TypeApplications
for this, but the library also provides “to” and “from” functions for
this case, using abbreviations to specify the desired type.
toS :: TTC.Textual t => t -> String
toT :: TTC.Textual t => t -> T.Text
toTL :: TTC.Textual t => t -> TL.Text
toTLB :: TTC.Textual t => t -> TLB.Builder
toBS :: TTC.Textual t => t -> BS.ByteString
toBSL :: TTC.Textual t => t -> BSL.ByteString
toBSB :: TTC.Textual t => t -> BSB.Builder
toSBS :: TTC.Textual t => t -> SBS.ShortByteString
fromS :: TTC.Textual t => String -> t
fromT :: TTC.Textual t => T.Text -> t
fromTL :: TTC.Textual t => TL.Text -> t
fromTLB :: TTC.Textual t => TLB.Builder -> t
fromBS :: TTC.Textual t => BS.ByteString -> t
fromBSL :: TTC.Textual t => BSL.ByteString -> t
fromBSB :: TTC.Textual t => BSB.Builder -> t
fromSBS :: TTC.Textual t => SBS.ShortByteString -> t
Note that these functions can be easier to understand than
convert, so they may be preferred in some cases, even when
the argument and return types are known.
When defining functions that take a Textual type as an
argument, “as” functions provide a convenient way to convert the value
to the type used within the function.
asS :: TTC.Textual t => (String -> a) -> t -> a
asT :: TTC.Textual t => (T.Text -> a) -> t -> a
asTL :: TTC.Textual t => (TL.Text -> a) -> t -> a
asTLB :: TTC.Textual t => (TLB.Builder -> a) -> t -> a
asBS :: TTC.Textual t => (BS.ByteString -> a) -> t -> a
asBSL :: TTC.Textual t => (BSL.ByteString -> a) -> t -> a
asBSB :: TTC.Textual t => (BSB.Builder -> a) -> t -> a
asSBS :: TTC.Textual t => (SBS.ShortByteString -> a) -> t -> a
For example, the following function reads an Int from a
Textual type, using the TTC.asS function to
convert the argument to a String.
import Text.Read (readMaybe)
readInt :: TTC.Textual t => t -> Either String Int
readInt = TTC.asS $ \s ->
  maybe (Left $ "not a valid Int: " ++ s) Right (readMaybe s)
Key Features
The Textual type class has two key features:
- Type conversion is not done through a fixed type (such as String or Text).
- It has a single type variable, making it easy to write functions that accept arguments and/or return values that may be any of the supported textual data types.
Some packages define a type class that has a single type variable with similar goals, but the implementation performs conversions via a fixed type. This results in unnecessary conversion and hurts performance.
Some packages define a type class that has two type variables for conversion between types in general. Such a type class can be used to implement polymorphic functions, but the implementations perform conversions via a fixed type, resulting in the same problems.
Implementation
In order to implement the Textual type class with a
single type variable, without converting through a fixed type, the
supported types are fixed and conversion between each supported type is
defined. The Textual type class is defined as follows.
class Textual t where
  toS :: t -> String
  toT :: t -> T.Text
  toTL :: t -> TL.Text
  toTLB :: t -> TLB.Builder
  toBS :: t -> BS.ByteString
  toBSL :: t -> BSL.ByteString
  toBSB :: t -> BSB.Builder
  toSBS :: t -> SBS.ShortByteString
  convert :: Textual t' => t' -> t
With instances for each Textual type, the “to” functions
define all the necessary conversions. For example, the
ByteString instance is as follows.
instance Textual BS.ByteString where
  toS = T.unpack . TE.decodeUtf8With TEE.lenientDecode
  toT = TE.decodeUtf8With TEE.lenientDecode
  toTL = TLE.decodeUtf8With TEE.lenientDecode . BSL.fromStrict
  toTLB = TLB.fromText . TE.decodeUtf8With TEE.lenientDecode
  toBS = id
  toBSL = BSL.fromStrict
  toBSB = BSB.byteString
  toSBS = SBS.toShort
  convert = toBS
The “to” functions define how to convert from a
ByteString to any Textual type. The
convert function defines how to convert from any
Textual type to a ByteString: simply use the
toBS function of the instance of that Textual
type! The “from” functions, defined separately, are simply calls to
convert.
Notice that ByteString conversion of Unicode text uses
the text package for proper decoding. Incorrect
encoding/decoding is a common source of bugs, and using the
Textual type class helps avoid making a mistake.
Since the supported types are fixed, one cannot add support for a
different type by writing a new instance. Note, however, that conversion
functions for other types can be written by passing through a fixed
Textual type.
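The design can be tried out with a reduced sketch that supports only two types. The class and function names mirror those above, but this is an illustration and not the library source. The Username type and the usernameToBS function are hypothetical and show how an unsupported type can be converted by passing through a fixed Textual type.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

-- Reduced sketch: one "to" function per supported type, plus convert.
class Textual t where
  toT :: t -> T.Text
  toBS :: t -> BS.ByteString
  convert :: Textual t' => t' -> t

instance Textual T.Text where
  toT = id
  toBS = TE.encodeUtf8
  -- Converting to Text means calling toT of the argument's instance.
  convert = toT

instance Textual BS.ByteString where
  toT = TE.decodeUtf8With TEE.lenientDecode
  toBS = id
  -- Converting to ByteString means calling toBS of the argument's instance.
  convert = toBS

-- A "from" function is simply a call to convert with a fixed argument type.
fromT :: Textual t => T.Text -> t
fromT = convert

-- Hypothetical unsupported type, converted via the supported Text type.
newtype Username = Username T.Text
  deriving (Eq, Show)

usernameToBS :: Username -> BS.ByteString
usernameToBS (Username t) = convert t
```

With more supported types, each instance grows by one "to" function per type, which is why the set of supported types is fixed.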
Examples
These examples illustrate how the Textual type class
works.
No Conversion
Consider the case where convert is called but is not
necessary. For example, the implementation of a function may use a
ByteString internally but accept any Textual
type. When a ByteString is passed in, no conversion is
required.
TTC.convert @BS.ByteString @BS.ByteString
The return type determines which instance is used. The
ByteString instance is used in this case.
instance Textual BS.ByteString where
  ...
  convert = toBS
The call to convert is translated to a call to toBS of
the argument. Note that this definition is inlined.
TTC.toBS @BS.ByteString @BS.ByteString
The argument is also of type ByteString, so that
instance is used.
instance Textual BS.ByteString where
  ...
  toBS = id
  ...
The call is translated to id, so no conversion is
performed. Note that this definition is also inlined, so the call to
convert is optimized away.
Conversion
Consider a case where convert is used to perform
conversion. For example, a Text value may be passed to a
function that uses ByteString internally.
TTC.convert @T.Text @BS.ByteString
The return type determines which instance is used. The
ByteString instance is used in this case.
instance Textual BS.ByteString where
  ...
  convert = toBS
The call to convert is translated to a call to toBS of
the argument. Note that this definition is inlined.
TTC.toBS @T.Text @BS.ByteString
The argument is of type Text, so that instance is
used.
instance Textual T.Text where
  ...
  toBS = TE.encodeUtf8
  ...
The call is translated to TE.encodeUtf8, which converts
the Text value to a ByteString. Note that this
definition is also inlined, so the call to convert performs
the conversion with minimal overhead.
Changelog
June 10, 2021
The article is updated to accompany the release of TTC
1.1.0.0 with the following changes:
- Textual instances for builder and short types are added.
- The auxiliary functions for builder and short types are removed.
- The Key Features section is rewritten. The article previously included a demonstration of how usage of two type variables leads to conversion via a fixed type, but that demonstration is removed because it confuses readers. (It was a demonstration of problems with other designs, not the design of TTC.)
- The Examples section is added to better explain/emphasize that conversion is not done via a fixed type.