I have a reasonably sized Bazel project and I've noticed that `bazel fetch //...` takes about 5 minutes in CI (~100 archives, ~3 GB total).

I first assumed this was due to downloading all of the external packages, but I ruled that out by mirroring all of the content on the local machine, hosting it with a local HTTP server, and updating WORKSPACE to use those URLs; that turned out to be barely faster than actually downloading the content.
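For reference, the mirroring experiment looked roughly like this (directory, port, and archive names below are just placeholders):

```sh
# Copy the previously downloaded archives somewhere and serve them locally.
mkdir -p /srv/bazel-mirror
cp /path/to/downloaded/archives/*.tar.gz /srv/bazel-mirror/
python3 -m http.server 8080 --directory /srv/bazel-mirror &

# Each http_archive's urls in WORKSPACE was then pointed at
# http://localhost:8080/<archive>.tar.gz instead of the upstream URL.
```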
I also tried putting the git repo and the Bazel output base on a tmpfs to avoid any disk I/O, but that didn't help at all either.
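The tmpfs experiment was along these lines (mount point, size, and repo URL are placeholders):

```sh
# Put both the checkout and Bazel's output base on a RAM-backed filesystem.
sudo mount -t tmpfs -o size=16g tmpfs /mnt/ram
git clone <repo-url> /mnt/ram/repo
cd /mnt/ram/repo
bazel --output_base=/mnt/ram/bazel-out fetch //...
```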
The `--experimental_repository_cache` argument seems to have helped a little bit, but not significantly.
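That was invoked roughly like this (the cache path is a placeholder):

```sh
bazel fetch --experimental_repository_cache=/path/to/repo-cache //...
```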
I watched `htop`, `dstat`, and the local mirror's server logs during the fetch and found a few peculiar things:
- CPU is idle most of the time. There is about 50% CPU usage (on a 56-core machine) for a few seconds at the very beginning, which I'm guessing is build-graph work. Other than that, there is generally 1 core at 100% usage, which I'm guessing is uncompressing the archives.
- The local mirror's server logs indicate that the archives are downloaded in "bursts" (a few dozen at a time), which suggests the fetch isn't blocked on download time.

All of this indicates to me that the archives are downloaded in parallel but uncompressed serially, which ends up being slow because uncompressing is nontrivial; this is purely speculation, though.
Coming at this from the other end: the only reason I'm doing `bazel fetch //...` in CI is to run an `rdeps` query over `//...` to determine what to build/test. Rather than having the query implicitly fetch `//...`, we do it ahead of time so that error messages are clearer, we can retry if it fails, we can set tighter timeouts on the queries, etc. Is there a better way to determine which targets are affected by a `git diff`, so that we can avoid the `fetch //...` entirely or at least run it in parallel with building?
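For concreteness, the current CI step is roughly the following sketch (the changed-file labels are illustrative placeholders; in practice they come from mapping the `git diff --name-only` output to source labels):

```sh
# Pre-fetch so the query below doesn't fetch implicitly (clearer errors, retryable).
bazel fetch //...

# Ask which targets anywhere in the repo depend on the files touched by the diff.
bazel query "rdeps(//..., set(//foo:changed_file.cc //bar:changed_header.h))"
```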
Any suggestions for decreasing the time to fetch dependencies would be much appreciated!