Hackage Metadata (Part 2)
I wrote about Hackage Metadata earlier this year, summarizing my understanding of the metadata maintained by the Hackage package repository about Haskell packages. I would like to be able to process this information programmatically, and this blog entry logs my progress toward doing so.
What information do I need?
- Package versions - I need to know the released versions of each package that is available on Hackage.
- Package revisions - I need to know the available revisions of each version of each package. Since revisions are identified using sequential numbers, knowing the current revision of each package version is sufficient.
- Preferred and deprecated versions - I need to know the preferred/deprecated version ranges for each package.
- Package deprecation - I would like to know which packages have been deprecated, though this is not absolutely essential.
- Candidate packages - I would like to know which candidate packages are available, though it is not essential, and I am not sure if I will even make use of this information yet.
I would really like to have timestamps for all of the above information, though this is not absolutely essential.
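To make these requirements concrete, here is a Haskell sketch that models them as a record. All of the names (PackageMetadata, allRevisions, and the field names) are hypothetical. Versions are plain strings to avoid a Cabal dependency, and timestamps are left out. The allRevisions function illustrates why the current revision is sufficient: revisions are numbered sequentially from 0, so a current revision of n implies that revisions 0 through n exist.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical types for illustration; not taken from any library.
newtype Revision = Revision Int
  deriving (Eq, Ord, Show)

-- Versions are kept as strings here to avoid depending on Cabal.
type VersionString = String

data PackageMetadata = PackageMetadata
  { pmCurrentRevisions :: Map.Map VersionString Revision
    -- ^ current (latest) revision of each released version
  , pmPreferredVersions :: Maybe String
    -- ^ raw preferred/deprecated version range, if any
  , pmDeprecatedInFavourOf :: Maybe [String]
    -- ^ replacement packages, when the whole package is deprecated
  , pmCandidateVersions :: [VersionString]
    -- ^ versions available as candidates
  } deriving (Eq, Show)

-- | Revisions are numbered sequentially from 0, so the current revision
-- determines every revision of a version.
allRevisions :: Revision -> [Revision]
allRevisions (Revision n) = map Revision [0 .. n]
```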
Hackage Index Tarball
I investigated the content of a 01-index.tar Hackage index tarball. This index contains the .cabal files of released packages, organized by package name and version, as well as some metadata (hashes, file sizes, etc.) about package files. When a package has one or more revisions, the modified .cabal file for the latest revision is provided, and the revision number is specified in a custom field named x-revision.
$ grep revision bm/0.1.0.2/bm.cabal
x-revision: 2
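Even without a .cabal parser, the revision can be read with a line-based scan. This is a minimal sketch, assuming that x-revision appears as an unindented field as in the grep output above and that a missing field means revision 0. It is not how hackage-db or Cabal parse the file, and revisionFromCabal and fieldValue are hypothetical names.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (dropWhileEnd)
import Text.Read (readMaybe)

-- | Strip leading and trailing whitespace (including carriage returns).
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | If the line is a top-level field with the given (lowercase) name,
-- return the text after the colon. Field names are compared
-- case-insensitively; indented lines belong to sections and never match.
fieldValue :: String -> String -> Maybe String
fieldValue name line =
  case break (== ':') line of
    (key, ':' : value)
      | map toLower (dropWhileEnd isSpace key) == name -> Just value
    _ -> Nothing

-- | Read the revision from the contents of a .cabal file in the index.
-- A missing x-revision field means revision 0. This is a line-based
-- scan, not a real .cabal parser.
revisionFromCabal :: String -> Either String Int
revisionFromCabal contents =
  case [trim value | line <- lines contents, Just value <- [fieldValue "x-revision" line]] of
    [] -> Right 0
    value : _ ->
      maybe (Left ("unable to parse revision: " ++ value)) Right (readMaybe value)
```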
Information about preferred/deprecated versions is included in a preferred-versions file when applicable.
$ cat mtl/preferred-versions ; echo
mtl <2.1 || >2.1 && <2.3 || >2.3
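The file holds the package name followed by a version range. The following sketch splits it apart and exposes the structure of the range, assuming a single line in the format shown above. The helper names are hypothetical, and the code does not interpret the bounds; that would be the job of Cabal's own version range parser.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd, isPrefixOf)

-- | Strip leading and trailing whitespace.
trim :: String -> String
trim = dropWhileEnd isSpace . dropWhile isSpace

-- | Split a preferred-versions line into the package name and the raw
-- version range that follows it.
splitPreferred :: String -> Maybe (String, String)
splitPreferred line =
  case break isSpace (trim line) of
    ("", _) -> Nothing
    (name, rest) -> Just (name, trim rest)

-- | Split a string on a (non-empty) separator string.
splitOn :: String -> String -> [String]
splitOn sep = go ""
  where
    go acc [] = [reverse acc]
    go acc s@(c : cs)
      | sep `isPrefixOf` s = reverse acc : go "" (drop (length sep) s)
      | otherwise = go (c : acc) cs

-- | Break a range into its ||-separated alternatives, each a list of
-- &&-joined constraints (&& binds tighter than ||).
alternatives :: String -> [[String]]
alternatives range = [map trim (splitOn "&&" alt) | alt <- splitOn "||" range]
```

For the mtl range, alternatives returns [["<2.1"],[">2.1","<2.3"],[">2.3"]], which makes it easy to see that versions 2.1 and 2.3 are the ones excluded.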
To investigate deprecated packages, I looked at the highlighting-kate package. None of its versions are deprecated; instead, the whole package is deprecated in favor of the skylighting package. Unfortunately, I was unable to find any information about this deprecation in the index. As expected, the index does not include information about candidate packages either.
The Hackage index contains the information that I require. It does not contain package deprecation information or candidate package information, but perhaps I could get that information from another source.
hackage-db
The first package that I looked at is hackage-db, a library that makes it very easy to load a Hackage index tarball and query the data using a Map interface. I confirmed that I can query the package revision using the following test code. The full code is available on GitHub.
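import qualified Data.List as List
import qualified Data.Map.Strict as Map
import Text.Read (readMaybe)

import Distribution.Hackage.DB (HackageDB)
import qualified Distribution.Hackage.DB as DB
import qualified Distribution.Types.GenericPackageDescription as GPD
import qualified Distribution.Types.PackageDescription as PD
import Distribution.Types.PackageName (PackageName)
import Distribution.Types.Version (Version)

newtype Revision = Revision Int
  deriving (Eq, Ord, Show)

-- Versions without an x-revision field have never been revised.
defaultRevision :: Revision
defaultRevision = Revision 0

-- Look up the current revision of a package version in the index.
lookupRevision
  :: HackageDB
  -> PackageName
  -> Version
  -> Either String Revision
lookupRevision db packageName version = do
    packageData <- maybe (Left "package not found") Right $
      Map.lookup packageName db
    versionData <- maybe (Left "version not found") Right $
      Map.lookup version packageData
    -- The revision is stored in the x-revision custom field of the .cabal file.
    let mRevisionString
          = List.lookup revisionField
          . PD.customFieldsPD
          . GPD.packageDescription
          $ DB.cabalFile versionData
    case mRevisionString of
      Just revisionString -> case readMaybe revisionString of
        Just revision -> pure $ Revision revision
        Nothing -> Left $ "unable to parse revision: " ++ revisionString
      Nothing -> pure defaultRevision
  where
    revisionField :: String
    revisionField = "x-revision"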
Unfortunately, this API does not provide information about preferred/deprecated version ranges, because the package data only contains information about specific versions:
type PackageData = Map Version VersionData
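Because PackageData is a Map keyed by Version, per-version questions are easy to answer. As a sketch with hypothetical stand-in types, where a version is its list of numeric components (so that 0.10 sorts after 0.9, unlike string comparison) and the map holds only revision numbers, the latest version and its current revision come from Map.lookupMax. Nothing in this shape records which versions are preferred or deprecated.

```haskell
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins: a version as its numeric components, compared
-- component-wise, and a revision number.
type Version = [Int]
type Revision = Int

-- Shaped like hackage-db's PackageData, but holding only revisions.
type PackageRevisions = Map.Map Version Revision

-- | Latest released version and its current revision, if any exist.
latestVersion :: PackageRevisions -> Maybe (Version, Revision)
latestVersion = Map.lookupMax
```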
Cabal and cabal-install
The hackage-db package is implemented using the Cabal library, so I took a look at that library next. It does not include information about preferred/deprecated version ranges, but that is expected.
The cabal command takes preferred/deprecated version ranges into account when creating build plans, so I looked at the cabal-install package next. I found the packagePreferences field in the SourcePackageDb type. Unfortunately, the version of cabal-install that I am testing with does not expose such functionality in a library. The repository HEAD exposes a library, but a comment indicates that doing so is temporary, so I probably should not rely on it.
Pantry
I looked at the pantry package next. Stack uses Pantry to manage packages, and Pantry stores package information in a SQLite database, so I investigated the database to see what information it contains.
The package_name table indexes package names.
sqlite> .schema package_name
CREATE TABLE IF NOT EXISTS "package_name" (
"id" INTEGER PRIMARY KEY,
"name" VARCHAR NOT NULL,
CONSTRAINT "unique_package_name" UNIQUE ("name")
);
The version table indexes version strings independently of packages.
sqlite> .schema version
CREATE TABLE IF NOT EXISTS "version" (
"id" INTEGER PRIMARY KEY,
"version" VARCHAR NOT NULL,
CONSTRAINT "unique_version" UNIQUE ("version")
);
The hackage_cabal table contains package version information, including the revision!
sqlite> .schema hackage_cabal
CREATE TABLE IF NOT EXISTS "hackage_cabal" (
"id" INTEGER PRIMARY KEY,
"name" INTEGER NOT NULL
REFERENCES "package_name"
ON DELETE RESTRICT
ON UPDATE RESTRICT,
"version" INTEGER NOT NULL
REFERENCES "version"
ON DELETE RESTRICT
ON UPDATE RESTRICT,
"revision" INTEGER NOT NULL,
"cabal" INTEGER NOT NULL
REFERENCES "blob"
ON DELETE RESTRICT
ON UPDATE RESTRICT,
"tree" INTEGER NULL
REFERENCES "tree"
ON DELETE RESTRICT
ON UPDATE RESTRICT,
CONSTRAINT "unique_hackage" UNIQUE ("name","version","revision")
);
SELECT pn.name, v.version, hc.revision
FROM hackage_cabal AS hc
JOIN package_name AS pn
ON hc.name = pn.id
JOIN version AS v
ON hc.version = v.id
WHERE pn.name = 'bm';
name | version | revision |
---|---|---|
bm | 0.1.0.2 | 0 |
bm | 0.1.0.2 | 1 |
bm | 0.1.0.2 | 2 |
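Pantry stores one row per revision, whereas the index only reports the latest one. Here is a minimal sketch of collapsing rows like these into the current revision of each package version, assuming the query results have already been read into tuples; the Row type and currentRevisions are hypothetical names.

```haskell
import qualified Data.Map.Strict as Map

-- One row of the query results: (package name, version, revision).
type Row = (String, String, Int)

-- | Collapse per-revision rows into the current (highest) revision of each
-- package version, matching what the index reports via x-revision.
currentRevisions :: [Row] -> Map.Map (String, String) Int
currentRevisions rows =
  Map.fromListWith max [((name, version), revision) | (name, version, revision) <- rows]
```

The same result could also be computed in SQLite by grouping on pn.name and v.version and selecting MAX(hc.revision).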
The preferred_versions table contains the preferred version ranges.
sqlite> .schema preferred_versions
CREATE TABLE IF NOT EXISTS "preferred_versions" (
"id" INTEGER PRIMARY KEY,
"name" INTEGER NOT NULL REFERENCES "package_name"
ON DELETE RESTRICT
ON UPDATE RESTRICT,
"preferred" VARCHAR NOT NULL,
CONSTRAINT "unique_preferred" UNIQUE ("name")
);
SELECT pn.name, pv.preferred
FROM preferred_versions AS pv
JOIN package_name AS pn
ON pv.name = pn.id
WHERE pn.name = 'mtl';
name | preferred |
---|---|
mtl | mtl <2.1 || >2.1 && <2.3 || >2.3 |
Hackage Server API
The Hackage Server API could provide a way to retrieve information that is not included in the Hackage index tarball. Indeed, a candidates endpoint is documented.
$ curl \
-H 'Accept: application/json' \
https://hackage.haskell.org/packages/candidates/ \
> candidates.json
$ du -h candidates.json
1.4M candidates.json
$ jq length candidates.json
5332
$ jq '.[0]' candidates.json
{
"candidates": [
{
"sha256": "e1766e75168c967a60a1940a89fec96576f2eb75a4f375fe07a7fb7e59db839d",
"version": "0.1.0.0"
}
],
"name": "2captcha-haskell"
}
I have noticed that people tend to forget to clean up their candidate packages, but the candidates data is even larger than I expected!
A deprecated endpoint is also documented under the versions feature.
$ curl \
-H 'Accept: application/json' \
https://hackage.haskell.org/packages/deprecated \
> deprecated.json
$ du -h deprecated.json
76K deprecated.json
$ jq length deprecated.json
1168
$ jq '.[0]' deprecated.json
{
"deprecated-package": "2captcha",
"in-favour-of": [
"captcha-2captcha"
]
}