TTC ShortText Support

Today I released TTC version 1.4.0.0, which adds support for the ShortText type from the text-short package.
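
As a minimal sketch of what this enables, the following converts between Text and ShortText using the general convert function from Data.TTC. It assumes ttc-1.4.0.0 or later together with the text and text-short packages; the greeting value is a hypothetical example, not something from the release.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main) where

-- text package
import Data.Text (Text)

-- text-short package
import Data.Text.Short (ShortText)
import qualified Data.Text.Short as ST

-- ttc package (assumed to be version 1.4.0.0 or later)
import qualified Data.TTC as TTC

-- Hypothetical value used only for illustration.
greeting :: Text
greeting = "hello"

-- Convert Text to ShortText via the Textual instances.
greetingST :: ShortText
greetingST = TTC.convert greeting

main :: IO ()
main = do
  putStrLn (ST.toString greetingST)
  -- Convert back to Text; the target type is chosen by the annotation.
  print (TTC.convert greetingST :: Text)
```

The target type of convert is determined by type inference, so an annotation is only needed where the context does not already fix it.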

ShortText is a representation of Unicode strings that is more compact than the Text type. When using versions prior to text-2.0, the difference in size may be very significant because the old Text type uses the UTF-16 encoding while the ShortText type uses the UTF-8 encoding, which is more compact for text that consists mostly of ASCII characters (text in many Asian scripts, by contrast, takes three bytes per character in UTF-8 but only two in UTF-16). The Text type has used the UTF-8 encoding since text-2.0, but ShortText still saves two words (16 bytes on 64-bit systems) per value because it does not support zero-copy slicing. Note that the lack of support for zero-copy slicing is itself a benefit, as it prevents keeping unneeded data in memory after slicing.
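
To make the encoding sizes concrete, here is a small sketch that measures how many bytes a string occupies in UTF-8 and in UTF-16. It uses only the text and bytestring packages; the helper names utf8Size and utf16Size are made up for this example, and the numbers cover only the encoded payload, not the per-value overhead discussed above.

```haskell
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Number of bytes needed to encode the text in UTF-8.
utf8Size :: T.Text -> Int
utf8Size = BS.length . TE.encodeUtf8

-- Number of bytes needed to encode the text in UTF-16.
utf16Size :: T.Text -> Int
utf16Size = BS.length . TE.encodeUtf16LE

main :: IO ()
main = do
  let ascii = T.pack "hello"
      cjk = T.pack "\x65E5\x672C\x8A9E" -- three CJK characters (U+65E5, U+672C, U+8A9E)
  print (utf8Size ascii, utf16Size ascii) -- (5,10)
  print (utf8Size cjk, utf16Size cjk)     -- (9,6)
```

Which encoding is smaller therefore depends on the content: one byte per ASCII character in UTF-8 versus two in UTF-16, but three bytes versus two for these CJK characters.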

The ShortText type is implemented as a newtype wrapper around a ShortByteString (which TTC has supported since ttc-1.1.0.0). Note that the ShortByteString type differs from the ByteString type in that it uses unpinned memory; persistent ByteString values are generally not safe to use in long-running services because every ByteString uses pinned memory, which can lead to memory fragmentation. This is not an issue with the Text type; both Text and ShortText use unpinned memory.
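
The relationship between ShortText and ShortByteString can be seen through the conversion functions in Data.Text.Short. The following is a sketch using the text-short and bytestring packages; the sample string is an arbitrary choice for illustration.

```haskell
module Main (main) where

import qualified Data.ByteString.Short as SBS
import qualified Data.Text.Short as ST

-- Arbitrary sample: "caf" followed by U+00E9 (e with acute accent).
sample :: ST.ShortText
sample = ST.fromString "caf\233"

-- The underlying UTF-8 encoded bytes.
sampleBytes :: SBS.ShortByteString
sampleBytes = ST.toShortByteString sample

main :: IO ()
main = do
  print (ST.length sample)       -- 4 (code points)
  print (SBS.length sampleBytes) -- 5 (bytes, since U+00E9 takes two in UTF-8)
  -- Going the other way validates the bytes as UTF-8, hence the Maybe.
  print (ST.fromShortByteString sampleBytes == Just sample) -- True
```

Converting from a ShortByteString returns a Maybe because arbitrary bytes are not necessarily valid UTF-8, while the opposite direction cannot fail.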

Thank you to @Qqwy for requesting this feature! The symbolize library looks very useful!