
I want to create a global Yarn cache for the projects that are built on my CI servers. My projects are dockerized, and I have created a nightly job to populate this cache; think of that job as installing a project whose package.json contains all dependencies from all projects. When this job is done, the cache is populated and I want it to be used by my daily jobs.

However, sometimes these dependencies are updated to versions that have not been cached yet, so Yarn tries to write to the cache directory. Because of this issue and the risk of cache corruption when there are multiple writers, I don't want my daily yarn installs to write to this pre-populated cache.

I currently have this in my Dockerfiles (using BuildKit):

RUN --mount=type=cache,target=/usr/local/share/.cache/yarn/v6,ro yarn install

If I remove that ro (readonly), I may end up with a corrupted cache directory. If I keep it, my yarn install may fail with an error like this when it needs to update its cache:

verbose 1.426 Error: EROFS: read-only file system, mkdir '/usr/local/share/.cache/yarn/v6/npm-bluebird-3.7.2-9f229c15be272454ffa973ace0dbee79a1b0c36f'

If I set the --cache-folder to somewhere other than the populated cache, no cache is consumed.
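
Roughly, that variant looks like this (the /tmp/yarn-cache path is only illustrative); yarn then consults only the folder given to --cache-folder and never reads the mounted cache:

# Illustrative only: with --cache-folder pointing elsewhere, the read-only mount is ignored
RUN --mount=type=cache,target=/usr/local/share/.cache/yarn/v6,ro yarn install --cache-folder /tmp/yarn-cache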

Is there a way I can cache the packages this way? Docker layer caching is useless here: whenever the package.json file is updated, the layer is invalidated and the build takes several minutes just to update a single multi-kilobyte dependency.

Ali Tou

1 Answer


As the issue mentions, yarn install has a --mutex option to serialize concurrent installs. But since this is a BuildKit cache mount, you can instead use sharing=locked so that only one yarn at a time accesses the cache:

RUN --mount=type=cache,sharing=locked,target=/usr/local/share/.cache/yarn yarn install

If you need separate caches, you can specify id= instead of changing target. By default, target is used as the cache ID, but by specifying the id yourself you can mount the same cache at different targets, or use different caches at the same target.
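
For example, a rough sketch with two placeholder IDs (yarn-service-a and yarn-service-b are just example names, one per project):

# In service A's Dockerfile: a dedicated cache, identified by its id
RUN --mount=type=cache,id=yarn-service-a,sharing=locked,target=/usr/local/share/.cache/yarn yarn install

# In service B's Dockerfile: same target path, but a different cache because the id differs
RUN --mount=type=cache,id=yarn-service-b,sharing=locked,target=/usr/local/share/.cache/yarn yarn install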

Marko Kohtala
  • I may have hundreds of concurrent jobs running on my CI system. I don't see a mutex lock alone as a good solution, since it would slow my builds down instead of actually making them faster. However, I can separate the build cache of each service (using the mentioned `id`) to have a more isolated cache, and use `--mutex` to ensure the builds of each service use the cache only one at a time (a sketch of this combination follows these comments). I'll try this and come back if it results in better performance. – Ali Tou Mar 23 '21 at 23:48
  • When dependencies tend to differ, if not in package.json then in yarn.lock, there is not all that much overlap between caches, so separate caches work well. And since it is a cache, it can get pruned: old cruft prunes better in smaller caches, and pruning a cache has less of an impact on the CI system when each cache affects fewer builds. I've not used `--mutex`, but I expect it to hold the lock for a shorter time than `sharing=locked`. – Marko Kohtala Mar 25 '21 at 10:00
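
A sketch of the combination discussed in the comments above; the id value and the mutex file path are only examples, not taken from the question:

# Sketch: a per-service cache (placeholder id), left in the default shared mode so concurrent
# builds can mount it, with yarn's file mutex stored inside the cache mount so those builds
# serialize their cache writes among themselves
RUN --mount=type=cache,id=yarn-my-service,target=/usr/local/share/.cache/yarn yarn install --mutex file:/usr/local/share/.cache/yarn/.yarn-mutex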