I have a reasonably sized Bazel project and I've noticed that `bazel fetch //...` takes about 5 minutes in CI (~100 archives, ~3 GB total).

I first assumed this was due to downloading all of the external packages, but I ruled that out by mirroring all of the content on the local machine, hosting it with a local HTTP server, and updating WORKSPACE to use those URLs; that turned out to be barely faster than actually downloading the content.
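For reference, the mirroring experiment looked roughly like this (directory, port, and archive names below are just placeholders):

```sh
# Copy the previously downloaded archives somewhere and serve them locally.
mkdir -p /srv/bazel-mirror
cp /path/to/downloaded/archives/*.tar.gz /srv/bazel-mirror/
python3 -m http.server 8080 --directory /srv/bazel-mirror &

# Each http_archive's urls in WORKSPACE was then pointed at
# http://localhost:8080/<archive>.tar.gz instead of the upstream URL.
```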
I also tried putting the git repo and the Bazel output base on a tmpfs to avoid any disk I/O, but that didn't help at all either.
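The tmpfs experiment was along these lines (mount point, size, and repo URL are placeholders):

```sh
# Put both the checkout and Bazel's output base on a RAM-backed filesystem.
sudo mount -t tmpfs -o size=16g tmpfs /mnt/ram
git clone <repo-url> /mnt/ram/repo
cd /mnt/ram/repo
bazel --output_base=/mnt/ram/bazel-out fetch //...
```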
The `--experimental_repository_cache` argument seems to have helped a little bit, but not significantly.
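That was invoked roughly like this (the cache path is a placeholder):

```sh
bazel fetch --experimental_repository_cache=/path/to/repo-cache //...
```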
I watched `htop`, `dstat`, and the local mirror's server logs during the fetch and found a few peculiar things:
- CPU is idle most of the time. There is about 50% CPU usage (on a 56-core machine) for a few seconds at the very beginning, which I'm guessing is build-graph work. Other than that, there is generally 1 core at 100% usage, which I'm guessing is uncompressing the archives.
- The local mirror's server logs indicate that the archives are downloaded in "bursts" (a few dozen at a time), which suggests the fetch isn't blocked on download time.

All of this indicates to me that the archives are downloaded in parallel but uncompressed serially, which ends up being slow because uncompressing is nontrivial; this is purely speculation, though.
Coming at this from the other end: the only reason I'm doing `bazel fetch //...` in CI is to run an `rdeps` query over `//...` to determine what to build/test. Rather than having the query implicitly fetch `//...`, we do it ahead of time so that error messages are clearer, we can retry if it fails, we can set tighter timeouts on the queries, etc. Is there a better way to determine which targets are affected by a `git diff`, so that we can avoid the `fetch //...` entirely or at least run it in parallel with building?
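For concreteness, the current CI step is roughly the following sketch (the changed-file labels are illustrative placeholders; in practice they come from mapping the `git diff --name-only` output to source labels):

```sh
# Pre-fetch so the query below doesn't fetch implicitly (clearer errors, retryable).
bazel fetch //...

# Ask which targets anywhere in the repo depend on the files touched by the diff.
bazel query "rdeps(//..., set(//foo:changed_file.cc //bar:changed_header.h))"
```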
Any suggestions for decreasing the time to fetch dependencies would be much appreciated!