Haskell Monorepo GitHub Actions (Part 1)
I am using GitHub Actions to run automated tests for my released projects. All of the personal projects that I have released so far have a single package per Git repository. I will soon need to use GitHub Actions to run tests for multiple Haskell packages in a single repository (aka “monorepo”), however. This blog entry discusses my initial thoughts about this.
My currently released personal projects all run tests when there is a push to the develop or main branches, as well as on pull requests. The Haskell projects run tests with the following configurations:
- Tests are run using Cabal, for all supported GHC versions
- Tests are run using Stack, for all supported GHC versions
- Tests are run for all supported Cabal versions
- Tests are run against the lower bounds of dependencies
- Tests are run against the upper bounds of dependencies
Since one of my goals for these projects is to support a wide range of dependency versions, each project has quite a few tests. For example, TTC currently supports 11 major GHC versions and 8 major Cabal versions, resulting in a total of 32 test jobs per push.
One additional note is that my GitHub Actions configuration for Haskell projects is mostly the same for all projects. I try to factor out project-specific information that can be loaded by a config job in order to decrease the maintenance cost. There is still room for improvement. This is one of the things that I will think about while developing the new configuration.
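To illustrate the idea of a config job, here is a minimal sketch of the general pattern, not my actual configuration. The job names, the ghc-versions output name, and the version list are placeholders chosen for the example. A first job publishes project-specific data as JSON, and a later job expands it into a matrix with fromJSON.

```yaml
# Hypothetical sketch: names and versions are placeholders.
name: CI

on:
  push:
    branches:
      - develop
      - main
  pull_request:

jobs:
  config:
    runs-on: ubuntu-latest
    outputs:
      # Expose the step output as a job output for downstream jobs.
      ghc-versions: ${{ steps.set.outputs.ghc-versions }}
    steps:
      - id: set
        # Project-specific data lives in one place, emitted as JSON.
        run: echo 'ghc-versions=["9.6.6","9.8.4"]' >> "$GITHUB_OUTPUT"

  cabal:
    needs: config
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # Build the matrix from the JSON published by the config job.
        ghc: ${{ fromJSON(needs.config.outputs.ghc-versions) }}
    steps:
      - uses: actions/checkout@v4
      - uses: haskell-actions/setup@v2
        with:
          ghc-version: ${{ matrix.ghc }}
      - run: cabal update
      - run: cabal test
```

With this shape, everything below the config job could in principle be shared between projects, and only the data emitted by the config job would differ.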
As I wrote in my recent status report, I am developing a new design of TTC that I plan to release in separate packages. The core library supports the same GHC versions as TTC, while companion libraries provide support for new types that have narrower compatibility. I currently plan to manage all of these packages within the ttc-haskell repository, making it easy to coordinate. In addition, I have some more projects with more than one package per repository that I hope to release someday (soon).
I need to determine how to configure GitHub Actions to run automated tests for all of the packages in the repository. The naïve approach would be to run completely separate tests for each package, but this would result in excessive use of resources, in terms of monetary costs (to GitHub), carbon footprint, and time. I would like to avoid this.
I was unable to find any blog entries about using GitHub Actions with a Haskell monorepo. Searching for examples, I checked out various Haskell projects with more than one package in a single repository. Most of them are not helpful. For example, a number of them use Nix to test only the exact versions of dependencies configured in the flake, which is the opposite of my goals.
The following projects are the most helpful as inspiration for me. All of these projects test all packages in each test job, so there is no excessive use of resources.
- haskell-servant/servant has a very straightforward configuration with a single job matrix that runs tests using Cabal on Linux for various GHC versions. In each job, all projects are built and tested.
- yesodweb/wai has a very straightforward configuration with a single job matrix that runs tests using Stack on various OSes for nightly and various LTS snapshots. In each job, all projects are built and tested, and documentation is built.
- haskell/haskell-language-server has a more involved configuration that runs tests using Cabal on various OSes for various GHC versions. In each job, all projects are built and tested when appropriate, using conditionals such as if: matrix.test && matrix.ghc != '9.10'.
- simonmichael/hledger has a more involved configuration that runs tests using Stack on Linux with the single version of GHC that is on the Ubuntu container. Perhaps this is done so that the tests run quickly. In each job, all projects are built and tested.
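The single-matrix approach can be sketched roughly as follows. This is an illustration and not the configuration of any of the projects above. It assumes a cabal.project file in the repository root that lists every package, and the GHC versions are placeholders.

```yaml
# Hypothetical sketch: one job per GHC version, all packages per job.
jobs:
  cabal:
    name: "Cabal: GHC ${{ matrix.ghc }}"
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        ghc: ['9.6.6', '9.8.4']
    steps:
      - uses: actions/checkout@v4
      - uses: haskell-actions/setup@v2
        with:
          ghc-version: ${{ matrix.ghc }}
      - run: cabal update
      # The "all" target covers every package listed in cabal.project.
      - run: cabal build all --enable-tests
      - run: cabal test all
```

Since every package is built in the same job, shared dependencies only need to be built once per GHC version.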
Researching the topic in general, I learned that GitHub Actions allows you to filter by path (documentation). This makes it possible to run separate CI jobs depending on which packages have been changed. For example, perhaps configuration like the following would be appropriate in a .github/workflows/ci-ttc.yml configuration that runs tests for the ttc package.
on:
  push:
    branches:
      - develop
      - main
    paths:
      - 'ttc/**'
      - '.github/workflows/ci-ttc.yml'
This functionality helps avoid running tests for unchanged packages. Changes across many packages would trigger separate tests for all of those packages, however. For example, adding support for a new GHC minor release is often done across all packages. One would either need to use many separate caches, perhaps exceeding the storage limit, or configure sequencing in order to share caches without conflict. Perhaps this is most useful when testing monorepos that contain largely unrelated packages that are essentially separate projects.
I found a feature request to allow workflow configuration in sub-folders that has a lot of interesting discussion. I learned that GitHub Actions does not support symbolic links, so many people manage separate workflows in subdirectories that they copy to the .github/workflows directory in the project root to enable them. A project called Hawk provides a CLI to make such management easy.
I will now take some time to “digest” this information. I am currently leaning toward minimizing the number of jobs and making steps conditional when necessary. I would like to avoid project-specific configuration. I know how to do so by putting conditionals into scripts, but I would like to continue to have meaningful (fine-grained) tasks in the GitHub Actions UI.
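As a rough sketch of what conditional steps could look like, the following uses a boolean matrix flag to decide whether a step runs, while keeping each package as a separately named step in the UI. The companion flag and the ttc-example package name are hypothetical, as are the GHC versions. The sketch also ignores how the project file would be resolved under a compiler that a companion package does not support, which a real setup would likely need to handle.

```yaml
# Hypothetical sketch: "companion" and "ttc-example" are made-up names.
jobs:
  cabal:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        include:
          # Older compiler: only the core package is tested.
          - ghc: '8.10.7'
            companion: false
          # Newer compiler: the companion package is tested as well.
          - ghc: '9.8.4'
            companion: true
    steps:
      - uses: actions/checkout@v4
      - uses: haskell-actions/setup@v2
        with:
          ghc-version: ${{ matrix.ghc }}
      - run: cabal update
      - name: Test ttc
        run: cabal test ttc
      - name: Test ttc-example
        # Skipped (and shown as skipped in the UI) when the flag is false.
        if: matrix.companion
        run: cabal test ttc-example
```

The downside is that the flags are project-specific, which is exactly the kind of configuration that I would like to factor out.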
A related change that I am considering is removing the tests of the supported Cabal versions, aside from the oldest. Other tests use the latest version, and perhaps just testing the oldest supported is sufficient.