JSON/YAML Object Key Order Using Aeson
JSON and YAML are often used for configuration
because these formats provide a standard way to specify structured data
that is both widely recognized and widely supported. Objects are used to
specify an unordered set of name-value
pairs, but there are a number of cases where a fixed order in the
text file greatly improves the usability of
configuration because it makes it easier for users to read and
understand it. This blog entry is about controlling that order when
using the popular aeson library.
Both JSON and YAML are far from perfect, and there are many people who dislike using these formats for configuration. I too have worked with huge YAML configuration files where it is very difficult to keep track of the nested context. Regardless, ordering object name-value pairs can make such configuration at least a bit easier to read.
The aeson library uses the Value sum type to represent JSON values, and
it represents Objects using a KeyMap data type, which is either a Map
(from the containers package) or a HashMap (from the
unordered-containers package), depending on the value of the
ordered-keymap flag when the package is built. When the Map data type is
used (when ordered-keymap is true), object name-value pairs are output
in lexicographic order of the names, which generally looks like
alphabetic order to users. When the HashMap data type is used (when
ordered-keymap is false), object name-value pairs are output in the
order of the hashes of the names, which generally looks like an
arbitrary order to users. When name-value pairs are represented using
either of these types, any user-defined order is lost and cannot be
recovered.
The encode function encodes data as JSON, and the encodeFile function
writes the JSON to a file. These functions create “minimal” JSON, with
no unnecessary whitespace. It is trivial to output ordered object
name-value pairs when using these functions, using the toEncoding method
of the ToJSON type class. This method allows developers to directly
encode data as JSON, without going through an Object type. It is used
for increased performance, but it also allows the developer to specify
the order of object name-value pairs: simply provide toEncoding
implementations in your instances that output name-value pairs in the
correct order.
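As a minimal sketch of this approach, the following instance uses the pairs function from aeson to write name-value pairs in a fixed order. The Server record and its field names are hypothetical, chosen only for illustration.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (ToJSON (..), encode, object, pairs, (.=))
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical configuration record, used only for illustration.
data Server = Server
  { serverName :: String
  , serverHost :: String
  , serverPort :: Int
  }

instance ToJSON Server where
  -- Goes through Value/Object, so the order written here is not preserved.
  toJSON s =
    object
      [ "name" .= serverName s
      , "host" .= serverHost s
      , "port" .= serverPort s
      ]

  -- Writes the pairs directly, in the order they are combined with (<>).
  toEncoding s =
    pairs $
      "name" .= serverName s
        <> "host" .= serverHost s
        <> "port" .= serverPort s

main :: IO ()
main =
  -- encode uses toEncoding, so this prints:
  -- {"name":"api","host":"localhost","port":8080}
  BL.putStrLn (encode (Server "api" "localhost" 8080))
```

Note that toJSON is still defined, because other code (including aeson-pretty) converts to Value and does not use toEncoding.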
The aeson-pretty library provides a way to encode data as JSON that uses
whitespace (indentation and newlines) to make the JSON easier to read by
humans. This library works by first converting the data to the Value
representation and then traversing the Value, encoding it according to
the configuration. Objects are represented using Object, so name-value
pairs are ordered according to the KeyMap used, as described above. The
Config provides a confCompare function that can be used to sort the
name-value pairs, however, and the keyOrder function can be used to
specify the order.
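A small sketch of this configuration follows. The server value and its names are hypothetical; defConfig, confCompare, keyOrder, and encodePretty' are from aeson-pretty.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value, object, (.=))
import Data.Aeson.Encode.Pretty (Config (..), defConfig, encodePretty', keyOrder)
import qualified Data.ByteString.Lazy.Char8 as BL

-- Hypothetical configuration value; as an Object, it has no order of its own.
server :: Value
server =
  object
    [ "port" .= (8080 :: Int)
    , "host" .= ("localhost" :: String)
    , "name" .= ("api" :: String)
    ]

-- Names listed here are output in this order; names that are not listed
-- are output after the listed ones.
prettyConfig :: Config
prettyConfig = defConfig {confCompare = keyOrder ["name", "host", "port"]}

main :: IO ()
main = BL.putStrLn (encodePretty' prettyConfig server)
```

The pairs are output in the order name, host, port, regardless of how the KeyMap stores them.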
Using confCompare is trivial when you just need to order the name-value
pairs of a single object. It gets more challenging when you need to
order the name-value pairs of a very large/nested data structure,
particularly when different objects use some of the same names. The
confCompare function has to be able to sort all of the objects in the
large/nested data structure, so the list of names passed to keyOrder
must be compatible with all of them. Note that it is not possible to
order some name a before b with one data type and b before a with
another data type; the order of the names must be consistent across the
whole data structure.
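A general solution is to define the order for each object separately
and combine them into a directed graph, with an edge between each name
and the name that must follow it. When the resulting graph is acyclic,
any topological ordering is a valid order to use with keyOrder. A cycle
means that two objects disagree about the order of some names, so no
single list can satisfy both. Note that this calculation can be done at
build time using template-haskell.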
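The following is one possible sketch of that calculation, using Data.Graph from the containers package. The combineOrders function is hypothetical, not part of any library, and it uses String names to stay within base and containers; the result would need to be converted to Text before being passed to keyOrder.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)
import qualified Data.Map.Strict as Map

-- Combine the key orders of individual objects into one global order.
-- Returns Left with the names of a cycle when the orders conflict.
combineOrders :: [[String]] -> Either [String] [String]
combineOrders orders = traverse acyclic (stronglyConnComp nodes)
  where
    -- For every name, the names that must come directly before it.
    predecessors = Map.fromListWith (++) (concatMap constraints orders)

    -- Each name is registered, and each adjacent pair adds one constraint.
    constraints ks =
      [(k, []) | k <- ks]
        ++ zipWith (\before after -> (after, [before])) ks (drop 1 ks)

    nodes = [(k, k, ps) | (k, ps) <- Map.toList predecessors]

    -- Components are returned in reverse topological order, so the
    -- predecessors of a name always appear before the name itself.
    acyclic (AcyclicSCC k) = Right k
    acyclic (CyclicSCC ks) = Left ks

main :: IO ()
main = do
  -- Compatible orders: prints Right ["name","host","port","tls"]
  print (combineOrders [["name", "host", "port"], ["host", "port", "tls"]])
  -- Conflicting orders: prints a Left value containing "a" and "b"
  print (combineOrders [["a", "b"], ["b", "a"]])
```

When the constraints leave some names unordered relative to each other, this sketch picks one of the valid orders; it makes no attempt to choose a preferred one.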
The keyOrder function is implemented by indexing the list using
elemIndex, which is O(n) in the length of the list. Since this is used
for comparison in a sort algorithm, I wonder if it might be worthwhile
to use a HashMap instead. I have not benchmarked this yet, though.
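A sketch of what such an alternative might look like is below. The keyOrderIndexed function is hypothetical and is not part of aeson-pretty; it is intended to give the same ordering as keyOrder while looking up positions in a HashMap that is built once. Whether it is actually faster in practice is untested.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.HashMap.Strict as HashMap
import Data.List (sortBy)
import Data.Maybe (fromMaybe)
import Data.Ord (comparing)
import Data.Text (Text)

-- Hypothetical drop-in replacement for keyOrder.
keyOrderIndexed :: [Text] -> Text -> Text -> Ordering
keyOrderIndexed ks = comparing rank
  where
    -- Built once per list of names when the function is partially applied.
    -- Using min keeps the first position of a repeated name, like elemIndex.
    positions = HashMap.fromListWith min (zip ks [0 :: Int ..])

    -- Names that are not in the list sort after all names that are.
    rank k = fromMaybe maxBound (HashMap.lookup k positions)

main :: IO ()
main =
  -- prints ["name","host","port","extra"]
  print (sortBy (keyOrderIndexed ["name", "host", "port"]) ["port", "extra", "name", "host"])
```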
The yaml library provides the same sort of configured comparison in the
Data.Yaml.Pretty module. The aeson-pretty keyOrder function (and the
same ordered list of names) may be used with this library.
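For example, a sketch along these lines should work, assuming both the yaml and aeson-pretty packages are available. The server value and its names are hypothetical; setConfCompare, defConfig, and encodePretty are from Data.Yaml.Pretty.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (Value, object, (.=))
import Data.Aeson.Encode.Pretty (keyOrder)
import qualified Data.ByteString.Char8 as BS
import qualified Data.Yaml.Pretty as YamlPretty

-- Hypothetical configuration value, used only for illustration.
server :: Value
server =
  object
    [ "port" .= (8080 :: Int)
    , "host" .= ("localhost" :: String)
    , "name" .= ("api" :: String)
    ]

-- Reuse the aeson-pretty keyOrder function as the YAML key comparison.
yamlConfig :: YamlPretty.Config
yamlConfig =
  YamlPretty.setConfCompare
    (keyOrder ["name", "host", "port"])
    YamlPretty.defConfig

main :: IO ()
main = BS.putStr (YamlPretty.encodePretty yamlConfig server)
```

The same list of names can then be shared between the JSON and YAML output of a program.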
I wrote a bit about this topic in Aeson Object Design (and Part 2). The techniques described in this blog entry can be used to order object name-value pairs when encoding JSON/YAML, as long as the order of names across the whole data structure is consistent. I still do not know of a way to decode JSON/YAML and maintain the order, however. This is sometimes required when processing arbitrary JSON/YAML or working with poorly designed JSON/YAML structures that “use values as keys.”
I am currently working on a project where I need to order object name-value pairs in output JSON/YAML. When the project is released, I will write a follow-up to this blog entry with discussion about how the code is organized to implement this, including links to the code.