
Migrating away from martijnhols/actions-cache

I was an early user of Github Actions. I implemented it in all my projects, both private and public, which alone led to over 200 developers relying on pipelines I set up in one way, shape or form.

A good Developer Experience is important to me. I want there to be as little friction as possible for all developers. An important part of that, in my opinion, is CI/CD, and in particular the duration of pipelines. Too often we have to wait for a pipeline to finish before we can complete a task. This is frustrating and distracting, and it's a weight that slows you down.

This is why I always do everything I can to minimize my pipeline durations.

Enter Github Actions, which is already leaps ahead of the alternatives. There are a number of reasons for this:

  • Even though it doesn't have the most powerful runners, it's easy enough to parallelize jobs, which in my experience often matters most
  • It has an almost unlimited number of runners ready to go at any time
  • It naturally has the best integration with Github. Not having to open or monitor an external system makes things easier (Github having a monopoly isn't great, but from a UX point of view it helps)
  • It's free for open source projects
  • Actions are open source and easy to make your own
  • It's popular among open source developers, so there are actions for everything
  • I prefer its config syntax to that of any other CI/CD system I've used

Github Actions has its issues, especially now, years later, as they keep breaking older things, but it gets the job done. Once you get the hang of it, it's pretty easy to set up jobs, to the point that I can go a bit nuts:

A Github Actions pipeline overview screenshot, showing many build steps executing in 8m 3s total
I went a bit nuts

This project is quite big, so I'm pretty proud of the pipeline taking only 8 minutes, including E2E test execution, especially since all of this usually takes 3 to 6 times as long in other projects.

Aside

While the E2E test suite didn't cover every flow, it covered every page and form, and it did some special things like scanning QR codes and verifying that the video call integration (via Jitsi) worked correctly by calling a user started from within Cypress. I made many optimizations to the server and webapp to achieve this kind of performance in Cypress.

This was only possible at the time with my fork of Github's cache action, which allowed for big performance improvements.

The cache action

The original cache action only supported one scenario: when the step runs, it checks for a cache hit and restores it, and then it automatically adds a second step to the end of the job that saves the path to the cache. This had several downsides:

  • It was impossible to use the cache to share data across jobs, so you needed the slow artifact actions for that in addition to your caching (meaning your job compressed and uploaded the same data twice).
  • The cache action always compressed and uploaded the cache, even if it was unmodified.
  • The cache action always pulled data out of the cache, even when you could have skipped that entirely because you knew an exact match already existed (e.g. skipping a build when a build for the same source already exists).
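To make these downsides concrete, here is a minimal sketch of the single-step pattern in a build-then-test workflow. It is not taken from the project above: the job names, the `build` path, the `src/**` cache key and the npm scripts are all hypothetical. The single-step mode still exists in `actions/cache`, so the sketch uses current action versions.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Restores build/ on a hit, and adds a post-job step that handles saving build/ to the cache.
      - uses: actions/cache@v4
        id: cache
        with:
          path: build
          key: build-${{ hashFiles('src/**') }}
      # Hypothetical npm scripts. Skippable on an exact hit, but by then the cache has already been downloaded.
      - if: steps.cache.outputs.cache-hit != 'true'
        run: npm ci && npm run build
      # Sharing the build with another job means compressing and uploading it again, as an artifact.
      - uses: actions/upload-artifact@v4
        with:
          name: build
          path: build
  e2e:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/download-artifact@v4
        with:
          name: build
          path: build
      - run: npm ci && npm run test:e2e
```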

My fork split the base cache action into three separate parts:

  • martijnhols/actions-cache/restore: This action reads data from the cache and places it at the provided path.
  • martijnhols/actions-cache/save: This action saves data at the provided path to the cache.
  • martijnhols/actions-cache/check: This action checks if an exact match is available in the cache without downloading it.

This fixed all of the issues with the original cache action, allowing me to make the quickest possible pipelines (within this ecosystem). It also provided a cleaner way to set up caching, as these actions don't need to automatically add a second step to the end of the job (unlike the original cache action).

Some example usages were:

  • martijnhols/actions-cache/check before making a build, to skip it if the build already exists
  • martijnhols/actions-cache/save with if: steps.cache.outputs.cache-hit != 'true' to only save if there wasn't already a hit
  • martijnhols/actions-cache/restore to restore dependencies installed in a different job, without saving them again
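Put together, the three usages above might look like the sketch below. It assumes the fork's inputs mirror those of `actions/cache` (`path`, `key`) plus the `required` flag from the migration notes; the job names, `build` path, cache key and npm scripts are hypothetical. It restores a build rather than dependencies in the second job, but the pattern is the same.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Only asks whether an exact match exists; nothing is downloaded.
      - uses: martijnhols/actions-cache/check@v3
        id: cache
        with:
          path: build
          key: build-${{ hashFiles('src/**') }}
      # Hypothetical npm scripts; skipped entirely when this build is already cached.
      - if: steps.cache.outputs.cache-hit != 'true'
        run: npm ci && npm run build
      # Only compress and upload when the check found no exact match.
      - if: steps.cache.outputs.cache-hit != 'true'
        uses: martijnhols/actions-cache/save@v3
        with:
          path: build
          key: build-${{ hashFiles('src/**') }}
  e2e:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Pull the build from the cache instead of an artifact, and fail if it is missing (assumed `required` input).
      - uses: martijnhols/actions-cache/restore@v3
        with:
          path: build
          key: build-${{ hashFiles('src/**') }}
          required: true
      - run: npm ci && npm run test:e2e
```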

See recipes for other examples.

This change provided a fix for many issues.

In the end, Github woke up and added this functionality, two years after my initial PR had been completely ignored.

Breaking changes

More than 4 years after my initial PR, I was still using my own fork, because it worked flawlessly and why change a winning team? But now that Github has introduced breaking changes to their platform (a rewrite of their caching server), I've been forced to update all of my projects yet again.

In my opinion, this is the biggest downside of Github Actions: they're constantly breaking things. It's not a big deal for actively maintained projects, although it does slow them down, but when coming back to a project that hasn't been updated in a while, it's a real pain in the ass.

Aside

That this was a custom action makes little difference to the breaking change, as users of Github's own cache action also need to update. The biggest difference might be that Github can add deprecation warnings to their own actions, even for users who pin them by SHA.

Migration

Migrating back to Github's own cache action is fairly straightforward, as Github has implemented a similar structure. As far as I can tell, it supports every feature my fork supported.

  • martijnhols/actions-cache/restore@v3 -> actions/cache/restore@v4
    • required: true -> fail-on-cache-miss: true
    • required: false -> fail-on-cache-miss: false (default)
    • outputs.primary-key -> outputs.cache-primary-key
  • martijnhols/actions-cache/save@v3 -> actions/cache/save@v4
  • martijnhols/actions-cache/check@v3 -> actions/cache/restore@v4 with lookup-only: true
    • outputs.primary-key -> outputs.cache-primary-key

I think that's it. Let me know if I forgot anything.
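Applied to a hypothetical build-then-test workflow, the mapping above might translate to something like the sketch below. The job names, `build` path, cache key and npm scripts are assumptions; the inputs and outputs used (`lookup-only`, `fail-on-cache-miss`, `cache-hit`, `cache-primary-key`) are the ones named in this post.

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Replaces martijnhols/actions-cache/check: look up the key without downloading anything.
      - uses: actions/cache/restore@v4
        id: cache
        with:
          path: build
          key: build-${{ hashFiles('src/**') }}
          lookup-only: true
      # Hypothetical npm scripts; skipped when an exact match already exists.
      - if: steps.cache.outputs.cache-hit != 'true'
        run: npm ci && npm run build
      # Replaces martijnhols/actions-cache/save, reusing the key computed by the lookup step.
      - if: steps.cache.outputs.cache-hit != 'true'
        uses: actions/cache/save@v4
        with:
          path: build
          key: ${{ steps.cache.outputs.cache-primary-key }}
  e2e:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Replaces martijnhols/actions-cache/restore with required: true.
      - uses: actions/cache/restore@v4
        with:
          path: build
          key: build-${{ hashFiles('src/**') }}
          fail-on-cache-miss: true
      - run: npm ci && npm run test:e2e
```

Reusing `cache-primary-key` in the save step means the key is only computed once, so the lookup and the save can't drift apart.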

The good news is that Github's new cache architecture is a lot quicker. Making this change reduced the save time of a ~82 MB cache from 2m 20s to 3s (↓97.86%)! That's a huge improvement, and I'm happy to see Github investing in performance.
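For reference, the percentage works out as follows, which is also roughly a 47-fold speedup:

```latex
\[
2\,\text{min}\,20\,\text{s} = 140\,\text{s}, \qquad
\frac{140\,\text{s} - 3\,\text{s}}{140\,\text{s}} = \frac{137}{140} \approx 0.9786 = 97.86\%, \qquad
\frac{140\,\text{s}}{3\,\text{s}} \approx 46.7
\]
```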

While I appreciate the performance boost, I'd still prefer fewer breaking changes in GitHub Actions.
