Hackage Metadata (Part 2)

I wrote about Hackage Metadata earlier this year, summarizing my understanding of the metadata maintained by the Hackage package repository about Haskell packages. I would like to be able to process this information programmatically, and this blog entry logs my progress toward doing so.

What information do I need?

  • Package versions - I need to know the released versions of each package that is available on Hackage.
  • Package revisions - I need to know the available revisions of each version of each package. Since revisions are identified using sequential numbers, knowing the current revision for each package is sufficient.
  • Preferred and deprecated versions - I need to know the preferred/deprecated version ranges for each package.
  • Package deprecation - I would like to know which packages have been deprecated, though this is not absolutely essential.
  • Candidate packages - I would like to know which candidate packages are available, though it is not essential, and I am not sure if I will even make use of this information yet.

I would really like to have timestamps for all of the above information, though this is not absolutely essential.

Hackage Index Tarball

I investigated the content of a 01-index.tar Hackage index tarball. This index contains the .cabal files of released packages, organized by package name and version, as well as some metadata (hashes, file sizes, etc.) about package files. When a package version has one or more revisions, the modified .cabal file for the latest revision is provided, and the revision number is specified in a custom field named x-revision.

$ grep revision bm/0.1.0.2/bm.cabal
x-revision: 2
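
The same check can be sketched in Haskell by scanning the raw .cabal text for the field. This is only a rough, text-level illustration, not a real .cabal parser: it assumes the field starts at the beginning of a line and is written as `x-revision:` with no space before the colon. The name `revisionFromCabal` is hypothetical, and a missing field is treated as revision 0.

```haskell
import Data.Char (isSpace, toLower)
import Data.List (isPrefixOf)
import Data.Maybe (listToMaybe, mapMaybe)
import Text.Read (readMaybe)

-- | Read the first @x-revision@ field found in the text of a .cabal file.
-- A file without the field is treated as revision 0 (an unrevised release).
revisionFromCabal :: String -> Either String Int
revisionFromCabal contents =
    case listToMaybe (mapMaybe fieldValue (lines contents)) of
      Nothing -> Right 0
      Just value ->
        maybe (Left ("unable to parse revision: " ++ value)) Right $
          readMaybe value
  where
    prefix :: String
    prefix = "x-revision:"

    -- Field names are matched case-insensitively; the value is the
    -- remainder of the line with surrounding whitespace removed.
    fieldValue :: String -> Maybe String
    fieldValue line
      | prefix `isPrefixOf` map toLower line =
          Just (trim (drop (length prefix) line))
      | otherwise = Nothing

    trim :: String -> String
    trim = f . f
      where f = reverse . dropWhile isSpace
```

A robust implementation would parse the file with the Cabal library instead of matching on text.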

Information about preferred/deprecated versions is included in a preferred-versions file when applicable.

$ cat mtl/preferred-versions ; echo
mtl <2.1 || >2.1 && <2.3 || >2.3
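
As a rough illustration of how such a line could be consumed, the following sketch splits the package name from the range and checks versions against it. It is a simplified stand-in, not the Cabal parser: it assumes only the operators `<`, `<=`, `>`, `>=`, and `==` with plain numeric versions, no parentheses or wildcards, and `&&` binding more tightly than `||`. All names (`parsePreferred`, `isPreferred`, and so on) are hypothetical.

```haskell
import Control.Monad (guard)
import Data.Char (isDigit, isSpace)
import Data.List (isPrefixOf)

-- | A version as its numeric components, e.g. 2.2.1 is [2,2,1].
type Version = [Int]

-- | One comparison such as "<2.1": a comparison function and its bound.
data Constraint = Constraint (Version -> Version -> Bool) Version

-- | A disjunction of conjunctions: "a || b && c" is [[a],[b,c]].
type Range = [[Constraint]]

trim :: String -> String
trim = f . f
  where f = reverse . dropWhile isSpace

-- | Split a string on every occurrence of a non-empty separator.
splitOn :: String -> String -> [String]
splitOn sep = go ""
  where
    go acc [] = [reverse acc]
    go acc s@(c : cs)
      | sep `isPrefixOf` s = reverse acc : go "" (drop (length sep) s)
      | otherwise = go (c : acc) cs

parseVersion :: String -> Maybe Version
parseVersion = mapM component . splitOn "."
  where
    component part
      | not (null part) && all isDigit part = Just (read part)
      | otherwise = Nothing

parseConstraint :: String -> Maybe Constraint
parseConstraint s = case trim s of
    '<' : '=' : v -> Constraint (<=) <$> parseVersion (trim v)
    '>' : '=' : v -> Constraint (>=) <$> parseVersion (trim v)
    '=' : '=' : v -> Constraint (==) <$> parseVersion (trim v)
    '<' : v -> Constraint (<) <$> parseVersion (trim v)
    '>' : v -> Constraint (>) <$> parseVersion (trim v)
    _ -> Nothing

-- | Parse one line of a preferred-versions file into a name and a range.
parsePreferred :: String -> Maybe (String, Range)
parsePreferred line = do
    let (name, rest) = break isSpace (trim line)
    guard (not (null name))
    range <- mapM (mapM parseConstraint . splitOn "&&") (splitOn "||" rest)
    pure (name, range)

-- | A version is preferred when it satisfies every constraint of at
-- least one alternative.
isPreferred :: Range -> Version -> Bool
isPreferred range v =
    any (all (\(Constraint op bound) -> v `op` bound)) range
```

With the mtl line above, this sketch accepts version 2.2.2 and rejects versions 2.1 and 2.3. Real code should rely on the version range parser of the Cabal library, which handles the full syntax.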

To investigate deprecated packages, I looked at the highlighting-kate package. None of its versions are deprecated, but the whole package is deprecated in favor of the skylighting package. Unfortunately, I was unable to find any information in the index regarding this deprecation. As expected, the index does not include information about candidate packages either.

The Hackage index contains all of the information that I consider essential: package versions, revisions, and preferred/deprecated version ranges. It does not contain package deprecation information or candidate package information, but perhaps I could get that information from another source.

hackage-db

The first package that I looked at is hackage-db, a library that makes it very easy to load a Hackage index tarball and query the data using a Map interface. I confirmed that I can query the package revision using the following test code. The full code is available on GitHub.

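import qualified Data.List as List
import qualified Data.Map as Map
import Text.Read (readMaybe)

-- Cabal
import qualified Distribution.Types.GenericPackageDescription as GPD
import qualified Distribution.Types.PackageDescription as PD
import Distribution.Types.PackageName (PackageName)
import Distribution.Types.Version (Version)

-- hackage-db
import Distribution.Hackage.DB (HackageDB)
import qualified Distribution.Hackage.DB as DB

newtype Revision = Revision Int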
  deriving (Eq, Ord, Show)

defaultRevision :: Revision
defaultRevision = Revision 0

lookupRevision
  :: HackageDB
  -> PackageName
  -> Version
  -> Either String Revision
lookupRevision db packageName version = do
    packageData <- maybe (Left "package not found") Right $
      Map.lookup packageName db
    versionData <- maybe (Left "version not found") Right $
      Map.lookup version packageData
    let mRevisionString
          = List.lookup revisionField
          . PD.customFieldsPD
          . GPD.packageDescription
          $ DB.cabalFile versionData
    case mRevisionString of
      Just revisionString -> case readMaybe revisionString of
        Just revision -> pure $ Revision revision
        Nothing -> Left $ "unable to parse revision: " ++ revisionString
      Nothing -> pure defaultRevision
  where
    revisionField :: String
    revisionField = "x-revision"

The package data just has information about specific versions, so this API does not provide information about preferred/deprecated version ranges, unfortunately.

type PackageData = Map Version VersionData

Cabal and cabal-install

The hackage-db package is implemented using the Cabal library, so I took a look at that library next. It does not include information about preferred/deprecated version ranges, but that is expected.

The cabal command takes preferred/deprecated version ranges into account when creating build plans, so I looked at the cabal-install package next. I found the packagePreferences field in the SourcePackageDb type. Unfortunately, the version of cabal-install that I am testing with does not expose such functionality in a library. The repository HEAD exposes a library, but a comment indicates that doing so is temporary, so I probably should not rely on it.

Pantry

I looked at the pantry package next. Stack uses Pantry to manage packages. Pantry stores package information in a SQLite database, so I investigated the database to see what information it contains.

The package_name table indexes package names.

sqlite> .schema package_name
CREATE TABLE IF NOT EXISTS "package_name" (
  "id" INTEGER PRIMARY KEY,
  "name" VARCHAR NOT NULL,
  CONSTRAINT "unique_package_name" UNIQUE ("name")
);

The version table indexes version strings, independently of any package.

sqlite> .schema version
CREATE TABLE IF NOT EXISTS "version" (
  "id" INTEGER PRIMARY KEY,
  "version" VARCHAR NOT NULL,
  CONSTRAINT "unique_version" UNIQUE ("version")
);

The hackage_cabal table contains package version information, including the revision!

sqlite> .schema hackage_cabal
CREATE TABLE IF NOT EXISTS "hackage_cabal" (
  "id" INTEGER PRIMARY KEY,
  "name" INTEGER NOT NULL
    REFERENCES "package_name"
      ON DELETE RESTRICT
      ON UPDATE RESTRICT,
  "version" INTEGER NOT NULL
    REFERENCES "version"
      ON DELETE RESTRICT
      ON UPDATE RESTRICT,
  "revision" INTEGER NOT NULL,
  "cabal" INTEGER NOT NULL
    REFERENCES "blob"
      ON DELETE RESTRICT
      ON UPDATE RESTRICT,
  "tree" INTEGER NULL
    REFERENCES "tree"
      ON DELETE RESTRICT
      ON UPDATE RESTRICT,
  CONSTRAINT "unique_hackage" UNIQUE ("name","version","revision")
);

SELECT pn.name, v.version, hc.revision
  FROM hackage_cabal AS hc
  JOIN package_name AS pn
    ON hc.name = pn.id
  JOIN version AS v
    ON hc.version = v.id
  WHERE pn.name = 'bm';

name version revision
bm 0.1.0.2 0
bm 0.1.0.2 1
bm 0.1.0.2 2
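
Because revisions are numbered sequentially, only the highest revision of each package version needs to be kept. Here is a minimal sketch of that reduction over rows shaped like the query output above, assuming the rows have already been fetched as tuples. The `Row` type and `latestRevisions` are hypothetical names, and `Data.Map.Strict` comes from the containers package.

```haskell
import qualified Data.Map.Strict as Map

-- | One result row: package name, version string, revision number.
type Row = (String, String, Int)

-- | Keep only the highest revision for each (package, version) pair.
latestRevisions :: [Row] -> Map.Map (String, String) Int
latestRevisions rows =
    Map.fromListWith max
      [ ((name, version), revision)
      | (name, version, revision) <- rows
      ]
```

For the three bm rows, the resulting map has a single entry that maps ("bm", "0.1.0.2") to revision 2.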

The preferred_versions table contains the preferred version ranges.

sqlite> .schema preferred_versions
CREATE TABLE IF NOT EXISTS "preferred_versions" (
  "id" INTEGER PRIMARY KEY,
  "name" INTEGER NOT NULL REFERENCES "package_name"
    ON DELETE RESTRICT
    ON UPDATE RESTRICT,
  "preferred" VARCHAR NOT NULL,
  CONSTRAINT "unique_preferred" UNIQUE ("name")
);

SELECT pn.name, pv.preferred
  FROM preferred_versions AS pv
  JOIN package_name AS pn
    ON pv.name = pn.id
  WHERE pn.name = 'mtl';

name preferred
mtl mtl <2.1 || >2.1 && <2.3 || >2.3

Hackage Server API

The Hackage Server API could provide a way to retrieve information that is not included in the Hackage index tarball. Indeed, a candidates endpoint is documented.

$ curl \
    -H 'Accept: application/json' \
    https://hackage.haskell.org/packages/candidates/ \
  > candidates.json
$ du -h candidates.json
1.4M    candidates.json
$ jq length candidates.json
5332
$ jq '.[0]' candidates.json
{
  "candidates": [
    {
      "sha256": "e1766e75168c967a60a1940a89fec96576f2eb75a4f375fe07a7fb7e59db839d",
      "version": "0.1.0.0"
    }
  ],
  "name": "2captcha-haskell"
}

I have noticed that people tend to forget to clean up their candidate packages, but the candidates data is even larger than I expected!

A deprecated endpoint is also documented under the versions feature.

$ curl \
    -H 'Accept: application/json' \
    https://hackage.haskell.org/packages/deprecated \
  > deprecated.json
$ du -h deprecated.json
76K     deprecated.json
$ jq length deprecated.json
1168
$ jq '.[0]' deprecated.json
{
  "deprecated-package": "2captcha",
  "in-favour-of": [
    "captcha-2captcha"
  ]
}
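
To process this response programmatically, the entries could be decoded with the aeson library. The following is a sketch under assumptions: the record type and function names are hypothetical, the field names are taken from the sample entry above, and every entry is assumed to have both fields.

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson (FromJSON (parseJSON), eitherDecode, withObject, (.:))
import qualified Data.ByteString.Lazy.Char8 as BLC

-- | One entry of the deprecated packages list.
data Deprecation = Deprecation
  { deprecatedPackage :: String
  , inFavourOf :: [String]
  }
  deriving (Eq, Show)

instance FromJSON Deprecation where
  parseJSON = withObject "Deprecation" $ \o ->
    Deprecation
      <$> o .: "deprecated-package"
      <*> o .: "in-favour-of"

-- | Decode the whole response body, e.g. the contents of deprecated.json.
decodeDeprecations :: BLC.ByteString -> Either String [Deprecation]
decodeDeprecations = eitherDecode
```

Reading deprecated.json as a lazy ByteString and passing it to `decodeDeprecations` should then give either an error message or the list of deprecations.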

Author

Travis Cardwell
