GitHub Actions Caching Update
I am currently using GitHub Actions to run automated tests for my open source projects. A few weeks ago, I noticed a comment by Brandon Chinn about an ever-growing cache when one is not careful with restore-keys. I do not remember which project the comment was on, and I am unable to find it, but the comment linked to another comment that he made on Mark Karpov’s GitHub actions for Haskell CI blog post. I updated the caching configuration for TTC this morning.
The cache action has three parameters:
path specifies one or more paths to cache
key specifies how to create the cache key, a string that identifies a specific cache entry
restore-keys specifies one or more key prefixes that determine which previous cache entry to build on when an exact match is not found
When a job’s cache key matches an existing cache key, that cache entry is loaded. This may allow the job to avoid re-downloading and re-building stuff, which can significantly speed up job execution. It is important to note, however, that the cache entry is not updated. If your cache key is too general, you will end up with an old cache that is not very useful. For this reason, it is recommended to include a hash in the cache key so that significant changes in the code create a new/updated cache entry.
Updating the cache is done using restore-keys. When a job’s cache key does not match any existing cache key, restore-keys determines if any existing cache entry should be loaded anyway. In this case, the updated paths are stored under the new cache key upon job success. The issue that Brandon commented about is that general restore-keys can cause the cache to get bigger and bigger, since new dependency versions are added without removing older versions.
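To make the three parameters concrete, here is a minimal sketch of a cache step with a hash in the key and a general restore-keys prefix. It is not the actual TTC configuration: the step name, the actions/cache@v4 version, the ~/.cabal/store path, and the choice of hashing *.cabal files are all assumptions for illustration.

```yaml
# Hypothetical workflow step; the name, path, and hashed files are illustrative.
- name: Cache Haskell dependencies
  uses: actions/cache@v4
  with:
    # Directory to save and restore (assumes a Cabal-based project).
    path: ~/.cabal/store
    # Exact key: changes whenever any .cabal file changes.
    key: ${{ runner.os }}-cabal-${{ hashFiles('**/*.cabal') }}
    # Fallback prefix: matches any earlier entry for this OS, however old.
    restore-keys: |
      ${{ runner.os }}-cabal-
```

With a prefix this general, every new key starts from the most recent entry that matches the prefix and saves everything it restored plus whatever was added, which is how the cache keeps growing.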
The solution that Brandon suggests is to include a timestamp in the cache key. By including the current month, a cache will only grow for a month before being reset. I followed this suggestion, just changing the month format to YYYYmm instead of using the English name.
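The monthly key could be sketched roughly as follows. Again, this is an illustration under assumptions rather than the exact TTC configuration: the step names, the month step id, the value output name, and the cached path are hypothetical, and the date command assumes a runner with a POSIX shell (on Windows runners, the step would likely need shell: bash).

```yaml
# Hypothetical steps; ids, names, and the cached path are illustrative.
- name: Get current month
  id: month
  # Writes e.g. value=202101 to the step outputs (YYYYmm format).
  run: echo "value=$(date +%Y%m)" >> "$GITHUB_OUTPUT"

- name: Cache Haskell dependencies
  uses: actions/cache@v4
  with:
    path: ~/.cabal/store
    # The month is part of the exact key and of the fallback prefix.
    key: ${{ runner.os }}-${{ steps.month.outputs.value }}-${{ hashFiles('**/*.cabal') }}
    restore-keys: |
      ${{ runner.os }}-${{ steps.month.outputs.value }}-
```

Because the month appears in the restore-keys prefix as well as in the key, the first job of a new month finds no matching entry and starts from an empty cache, which is what resets the growth.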
Cache entries are removed when they have not been accessed for over seven days, so old cache entries are removed automatically.
I just updated the configuration for TTC this morning, but I plan to update the configuration for my other projects when I can find the time.