
TTC ShortText Support

Today I released TTC version 1.4.0.0, which adds support for the ShortText type from the text-short package.
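As a minimal sketch of what this support looks like in practice, the following converts between Text, ShortText, and String using the general-purpose `convert` function from `Data.TTC`. It assumes ttc-1.4.0.0 or later (so that ShortText is among the supported textual types) and that the text-short package is available; the variable names are only for illustration.

```haskell
module Main (main) where

import qualified Data.Text as T
import qualified Data.Text.Short as ST
import qualified Data.TTC as TTC

main :: IO ()
main = do
  let t = T.pack "hello"
      -- Text -> ShortText; the target type is chosen by the annotation
      st = TTC.convert t :: ST.ShortText
      -- ShortText -> String
      s = TTC.convert st :: String
  -- Converting back with text-short's own function should give the original
  print (ST.toText st == t)
  putStrLn s
```

The same `convert` call works between any pair of supported textual types, so code that is already written against TTC should not need anything special to start using ShortText.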

ShortText is a representation of Unicode strings that is more compact than the Text type. With versions prior to text-2.0, the difference in size can be significant because the old Text type uses the UTF-16 encoding while the ShortText type uses the UTF-8 encoding, which is more compact for text that is mostly ASCII: one byte per character instead of two. (Most CJK characters take three bytes in UTF-8 versus two in UTF-16, so UTF-8 is not smaller for such text.) The Text type uses the UTF-8 encoding since text-2.0, but ShortText still saves two words (16 bytes on 64-bit systems) per value because it does not support zero-copy slicing and therefore does not need to store an offset and length. Note that the lack of zero-copy slicing is itself a benefit, as a slice cannot keep the larger original buffer alive in memory.
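To get a feel for how the two encodings compare, here is a small sketch that measures the encoded payload of a value in UTF-8 (via the bytes underlying a ShortText) and in UTF-16 (via `encodeUtf16LE` from the text package). It only counts payload bytes, not the per-value header overhead discussed above, and the helper name `payloadSizes` is made up for this example.

```haskell
module Main (main) where

import qualified Data.ByteString as BS
import qualified Data.ByteString.Short as BSS
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE
import qualified Data.Text.Short as ST

-- | Payload sizes in bytes: (UTF-8 as stored by ShortText, UTF-16).
-- Hypothetical helper, for illustration only.
payloadSizes :: T.Text -> (Int, Int)
payloadSizes t =
  ( BSS.length (ST.toShortByteString (ST.fromText t))
  , BS.length (TE.encodeUtf16LE t)
  )

main :: IO ()
main = do
  print (payloadSizes (T.pack "hello"))   -- (5,10)
  print (payloadSizes (T.pack "日本語"))   -- (9,6)
```

ASCII text halves in size under UTF-8, while characters in the U+0800 to U+FFFF range take three bytes in UTF-8 compared to two in UTF-16.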

The ShortText type is implemented as a newtype wrapper around a ShortByteString (which TTC has supported since ttc-1.1.0.0). The ShortByteString type differs from the ByteString type in that it uses unpinned memory. Keeping many long-lived ByteString values is generally risky in long-running services because each ByteString uses pinned memory, which the garbage collector cannot move, and this can lead to memory fragmentation. This is not an issue with the Text type; both Text and ShortText use unpinned memory.
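The relationship between the two types can be seen with the conversion functions in `Data.Text.Short`. This is a rough sketch rather than anything TTC-specific: going from ShortText to ShortByteString exposes the UTF-8 bytes, while the other direction returns a `Maybe` because arbitrary bytes are not necessarily valid UTF-8.

```haskell
module Main (main) where

import qualified Data.ByteString.Short as BSS
import qualified Data.Text.Short as ST

main :: IO ()
main = do
  let st = ST.fromString "naïve"
      -- Expose the underlying UTF-8 bytes
      sbs = ST.toShortByteString st
  -- 'ï' takes two bytes in UTF-8, so five characters occupy six bytes
  print (BSS.length sbs)                            -- 6
  -- Valid UTF-8 round-trips back to the same ShortText
  print (ST.fromShortByteString sbs == Just st)     -- True
  -- 0xFF never appears in valid UTF-8, so this is rejected
  print (ST.fromShortByteString (BSS.pack [0xff]))  -- Nothing
```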

Thank you to @Qqwy for requesting this feature! The symbolize library looks very useful!

Author

Travis Cardwell
