Textual Type Class
A common complaint about Haskell is that it can be tedious to work with its many textual data types. Most languages have two core data types used to represent text:
- “String” data types represent a sequence of characters. In contemporary languages, Unicode is used to support a wide variety of natural languages.
- “Byte string” data types represent a sequence of bytes. They can be used to hold arbitrary data (colloquially referred to as “binary data”), including text that is encoded using a particular encoding.
At this time, there are three core data types used to represent text in Haskell:
- The String data type, defined in Data.String of the base package, is a type alias for a list of characters. This simple type can be processed using recursion as well as functions that operate on Traversable instances, Foldable instances, or lists, making it a great type to use when learning the language. Representing text as a list of characters results in poor performance, however, and many developers argue that this type should be strictly avoided.
- The Text data type, defined in the text package, is a performant implementation of a (Unicode) string data type. This is the de facto standard for representing text in Haskell.
- The ByteString data type, defined in the bytestring package, is a performant implementation of a byte string data type. This is the de facto standard for representing arbitrary data in Haskell.
Note that, of these types, only the String data type is defined in the Haskell 2010 Language Report. The text and bytestring packages are GHC boot packages, however.
The Text and ByteString types come in lazy as well as strict variants, and other associated types provide superior performance in specific use cases. “Builder” types are used to gradually “build up” a value. The ShortByteString type is used when keeping short byte strings in memory, avoiding heap fragmentation issues that arise when using the ByteString type.
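As a rough illustration of what “building up” a value looks like, the following sketch uses the Builder type of the text package. The renderPairs helper is hypothetical and exists only for this example.

```haskell
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as TLB

-- Hypothetical helper: render key/value pairs, one pair per line.
-- Each piece is a Builder; the pieces are combined with (<>) and
-- the lazy Text is only materialized once, by toLazyText.
renderPairs :: [(String, String)] -> TL.Text
renderPairs = TLB.toLazyText . foldMap line
  where
    line (k, v) =
      TLB.fromString k <> TLB.fromString ": " <> TLB.fromString v <> TLB.singleton '\n'
```

Appending to a Builder avoids repeatedly copying an intermediate Text value, which is the usual reason to prefer it when output is assembled from many small pieces.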
All of these types have good use cases, and they are not difficult to use when you get accustomed to them. Most complaints are made by those who are new to the language and have not yet fully recognized the utility of each type. Other complaints tend to stem from the following issues.
- Since String is defined in the base package, and functions in base use that type, it is quite widely used. For example, String is the type that is most commonly used to represent error messages, even in libraries that use Text. It is difficult to completely avoid using the String type.
- It is common to need to convert the same text to multiple textual data types. For example, a String value may be required to be used within an error message, a Text value may be required by a JSON library, and a ByteString value may be required by a database library. Keeping track of the correct/optimal way to convert various data types to various textual data types can be tedious.
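To make the second issue concrete, here is a minimal sketch of converting one Text value by hand, using only the text and bytestring packages. The three functions are hypothetical stand-ins for the use cases described above and are not part of any library.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical: an error message that must be a String.
errorMessage :: T.Text -> String
errorMessage name = "unknown user: " ++ T.unpack name

-- Hypothetical: a JSON library wants Text, so no conversion is needed.
jsonField :: T.Text -> T.Text
jsonField = id

-- Hypothetical: a database library wants a UTF-8 encoded ByteString.
dbKey :: T.Text -> BS.ByteString
dbKey = TE.encodeUtf8
```

Each target type needs a different function from a different module, and the right choice changes again if the source value is a String or a ByteString instead.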
The TTC (Textual Type Classes) library is designed to help with these issues. This article focuses on the Textual type class.
API
The Textual type class is used to convert between the following textual data types.
| Textual Type | Abbreviation | Notes |
|---|---|---|
| String | S | |
| Data.Text.Text | T | |
| Data.Text.Lazy.Text | TL | |
| Data.Text.Lazy.Builder.Builder | TLB | |
| Data.ByteString.ByteString | BS | UTF-8 encoded |
| Data.ByteString.Lazy.ByteString | BSL | UTF-8 encoded |
| Data.ByteString.Builder.Builder | BSB | UTF-8 encoded |
| Data.ByteString.Short.ShortByteString | SBS | UTF-8 encoded |
The Builder type of the binary package is a re-export of the ByteString Builder type, so TTC works with binary as well.
Note that byte string values are assumed to be UTF-8 encoded. Invalid bytes are replaced with the Unicode replacement character U+FFFD. In cases where different behavior is required, process the values before using this class.
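The replacement behavior can be reproduced with the text package alone. The sketch below is not TTC code; decodeLenient and sample are names chosen for this example.

```haskell
import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

-- Decode UTF-8 leniently: an invalid byte becomes U+FFFD
-- instead of raising an exception.
decodeLenient :: BS.ByteString -> T.Text
decodeLenient = TE.decodeUtf8With TEE.lenientDecode

-- 0x68 0x69 is "hi"; 0xFF can never appear in valid UTF-8.
sample :: BS.ByteString
sample = BS.pack [0x68, 0x69, 0xFF]

-- T.unpack (decodeLenient sample) == "hi\65533"
```

When invalid input should be rejected rather than repaired, one option is to call TE.decodeUtf8' first, which returns an Either value, and only pass the resulting Text on to the class.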
The library, intended to be imported qualified as TTC, provides a number of functions for converting between these types, referred to as “Textual types” below. The most general is the convert function, which converts from any Textual type to any Textual type.
convert :: (Textual t, Textual t') => t' -> t
This function can be used when both the argument and return types are known. For example, if you have a value from a web API of type Text and you want to insert it into a key-value store using a function that takes a ByteString, you can use convert to perform the conversion.
key :: ByteString
value :: Text
KVS.insert :: MonadKVS m => ByteString -> ByteString -> m ()

KVS.insert key $ TTC.convert value
In some cases, either the argument type or return type must be specified. One can use TypeApplications for this, but the library also provides “to” and “from” functions for this case, using abbreviations to specify the desired type.
toS :: TTC.Textual t => t -> String
toT :: TTC.Textual t => t -> T.Text
toTL :: TTC.Textual t => t -> TL.Text
toTLB :: TTC.Textual t => t -> TLB.Builder
toBS :: TTC.Textual t => t -> BS.ByteString
toBSL :: TTC.Textual t => t -> BSL.ByteString
toBSB :: TTC.Textual t => t -> BSB.Builder
toSBS :: TTC.Textual t => t -> SBS.ShortByteString
fromS :: TTC.Textual t => String -> t
fromT :: TTC.Textual t => T.Text -> t
fromTL :: TTC.Textual t => TL.Text -> t
fromTLB :: TTC.Textual t => TLB.Builder -> t
fromBS :: TTC.Textual t => BS.ByteString -> t
fromBSL :: TTC.Textual t => BSL.ByteString -> t
fromBSB :: TTC.Textual t => BSB.Builder -> t
fromSBS :: TTC.Textual t => SBS.ShortByteString -> t
Note that these functions can be easier to understand than convert, so they may be preferred in some cases, even when the argument and return types are known.
When defining functions that take a Textual type as an argument, “as” functions provide a convenient way to convert the value to the type used within the function.
asS :: TTC.Textual t => (String -> a) -> t -> a
asT :: TTC.Textual t => (T.Text -> a) -> t -> a
asTL :: TTC.Textual t => (TL.Text -> a) -> t -> a
asTLB :: TTC.Textual t => (TLB.Builder -> a) -> t -> a
asBS :: TTC.Textual t => (BS.ByteString -> a) -> t -> a
asBSL :: TTC.Textual t => (BSL.ByteString -> a) -> t -> a
asBSB :: TTC.Textual t => (BSB.Builder -> a) -> t -> a
asSBS :: TTC.Textual t => (SBS.ShortByteString -> a) -> t -> a
For example, the following function reads an Int from a Textual type, using the TTC.asS function to convert the argument to a String.
import Text.Read (readMaybe)

readInt :: TTC.Textual t => t -> Either String Int
readInt = TTC.asS $ \s ->
  maybe (Left $ "not a valid Int: " ++ s) Right (readMaybe s)
Key Features
The Textual type class has two key features:
- Type conversion is not done through a fixed type (such as String or Text).
- It has a single type variable, making it easy to write functions that accept arguments and/or return values that may be any of the supported textual data types.
Some packages define a type class that has a single type variable with similar goals, but the implementation performs conversions via a fixed type. This results in unnecessary conversion and hurts performance.
Some packages define a type class that has two type variables for conversion between types in general. Such a type class can be used to implement polymorphic functions, but the implementations perform conversions via a fixed type, resulting in the same problems.
Implementation
In order to implement the Textual type class with a single type variable, without converting through a fixed type, the supported types are fixed and conversion between each supported type is defined. The Textual type class is defined as follows.
class Textual t where
  toS :: t -> String
  toT :: t -> T.Text
  toTL :: t -> TL.Text
  toTLB :: t -> TLB.Builder
  toBS :: t -> BS.ByteString
  toBSL :: t -> BSL.ByteString
  toBSB :: t -> BSB.Builder
  toSBS :: t -> SBS.ShortByteString
  convert :: Textual t' => t' -> t
With instances for each Textual type, the “to” functions define all the necessary conversions. For example, the ByteString instance is as follows.
instance Textual BS.ByteString where
  toS = T.unpack . TE.decodeUtf8With TEE.lenientDecode
  toT = TE.decodeUtf8With TEE.lenientDecode
  toTL = TLE.decodeUtf8With TEE.lenientDecode . BSL.fromStrict
  toTLB = TLB.fromText . TE.decodeUtf8With TEE.lenientDecode
  toBS = id
  toBSL = BSL.fromStrict
  toBSB = BSB.byteString
  toSBS = SBS.toShort
  convert = toBS
The “to” functions define how to convert from a ByteString to any Textual type. The convert function defines how to convert from any Textual type to a ByteString: simply use the toBS function of the instance of that Textual type! The “from” functions, defined separately, are simply calls to convert.
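The mechanism described above can be tried out with a cut-down sketch that supports only three types. The MiniTextual class and its instances are hypothetical and written for this illustration; they are not the library's source, which covers all eight types.

```haskell
{-# LANGUAGE FlexibleInstances #-}

import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Encoding.Error as TEE

-- Hypothetical, cut-down class: one type variable, three supported types.
class MiniTextual t where
  toS :: t -> String
  toT :: t -> T.Text
  toBS :: t -> BS.ByteString
  -- Convert from any supported type to the instance type.
  convert :: MiniTextual t' => t' -> t

instance MiniTextual String where
  toS = id
  toT = T.pack
  toBS = TE.encodeUtf8 . T.pack
  convert = toS

instance MiniTextual T.Text where
  toS = T.unpack
  toT = id
  toBS = TE.encodeUtf8
  convert = toT

instance MiniTextual BS.ByteString where
  toS = T.unpack . TE.decodeUtf8With TEE.lenientDecode
  toT = TE.decodeUtf8With TEE.lenientDecode
  toBS = id
  convert = toBS

-- A "from" function is convert with the argument type fixed.
fromT :: MiniTextual t => T.Text -> t
fromT = convert
```

In this sketch, converting a Text value to a ByteString selects the ByteString instance for convert, which in turn calls toBS of the Text instance, so the value never passes through String.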
Notice that ByteString conversion of Unicode text uses the text package for proper decoding. Incorrect encoding/decoding is a common source of bugs, and using the Textual type class helps avoid making a mistake.
Since the supported types are fixed, one cannot add support for a different type by writing a new instance. Note, however, that conversion functions for other types can be written by passing through a fixed Textual type.
Examples
These examples illustrate how the Textual type class works.
No Conversion
Consider the case where convert is called but is not necessary. For example, the implementation of a function may use a ByteString internally but accept any Textual type. When a ByteString is passed in, no conversion is required.
TTC.convert @BS.ByteString @BS.ByteString
The return type determines which instance is used. The ByteString instance is used in this case.
instance Textual BS.ByteString where
  ...
  convert = toBS
The call to convert is translated to a call to toBS of the argument. Note that this definition is inlined.
TTC.toBS @BS.ByteString @BS.ByteString
The argument is also of type ByteString, so that instance is used.
instance Textual BS.ByteString where
  ...
  toBS = id
  ...
The call is translated to id, so no conversion is performed. Note that this definition is also inlined, so the call to convert is optimized away.
Conversion
Consider a case where convert is used to perform conversion. For example, a Text value may be passed to a function that uses ByteString internally.
TTC.convert @T.Text @BS.ByteString
The return type determines which instance is used. The ByteString instance is used in this case.
instance Textual BS.ByteString where
  ...
  convert = toBS
The call to convert is translated to a call to toBS of the argument. Note that this definition is inlined.
TTC.toBS @T.Text @BS.ByteString
The argument is of type Text, so that instance is used.
instance Textual T.Text where
  ...
  toBS = TE.encodeUtf8
  ...
The call is translated to TE.encodeUtf8, which converts the Text value to a ByteString. Note that this definition is also inlined, so the call to convert performs the conversion with minimal overhead.
Changelog
June 10, 2021
The article is updated to accompany the release of TTC 1.1.0.0 with the following changes:
- Textual instances for builder and short types are added.
- The auxiliary functions for builder and short types are removed.
- The Key Features section is rewritten. The article previously included a demonstration of how usage of two type variables leads to conversion via a fixed type, but that demonstration is removed because it confuses readers. (It was a demonstration of problems with other designs, not the design of TTC.)
- The Examples section is added to better explain/emphasize that conversion is not done via a fixed type.