
Many sources state that, to speed up the build, Bazel computes hashes of the rule inputs and does a cache lookup to find pre-built outputs. Unfortunately, I cannot find much detail about its logic. For instance, how exactly does it calculate the hashes of the outputs to look for? Is it some hash of the input hashes? It has to be calculated BEFORE the actual outputs are built for the first time. Also, does it ever calculate hashes of the output files? How are those used? Is there a way to interfere with the way hashes are calculated? For instance, we would like to produce a ZIP archive as the output but hash only the “manifest” file inside that archive, because the rest of the archive is produced by non-deterministic tools and its hash would change after every build.
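To make the ZIP idea concrete, here is a plain-Python sketch (not a Bazel feature or hook) that computes a digest over a single entry of an archive, so two builds whose other entries differ still yield the same value. The entry names `MANIFEST` and `blob.bin` and both helper functions are hypothetical, chosen only for illustration.

```python
import hashlib
import io
import os
import zipfile


def manifest_digest(zip_bytes: bytes, manifest_name: str = "MANIFEST") -> str:
    """SHA-256 of one entry inside a ZIP archive.

    Other entries, timestamps and compression details are ignored.
    `manifest_name` is a hypothetical entry name.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        return hashlib.sha256(zf.read(manifest_name)).hexdigest()


def build_archive(payload: bytes) -> bytes:
    """Build an in-memory ZIP with a fixed MANIFEST entry and a
    'blob.bin' entry that stands in for non-deterministic tool output."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("MANIFEST", "name=demo\nversion=1\n")
        zf.writestr("blob.bin", payload)
    return buf.getvalue()


# Two "builds" with random payloads: the whole archives differ,
# but the manifest-only digests are identical.
a = build_archive(os.urandom(16))
b = build_archive(os.urandom(16))
print(hashlib.sha256(a).hexdigest() != hashlib.sha256(b).hexdigest())
print(manifest_digest(a) == manifest_digest(b))
```

Whether Bazel can be told to use such a partial digest for an output is exactly the open question here; the sketch only shows what "hash only the manifest" would mean.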

Konstantin Erman

1 Answer


Rules take files in and produce files out. Rules are made up of actions. The output of an action should depend only on its explicitly declared inputs. The results of actions are cached. The cache key most likely consists of the environment variables, the command line, and the input files (their relative paths together with digests of their contents).
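A toy model of such a key helps with the "computed BEFORE the outputs exist" part of the question: everything that goes into it is known before the action runs. This is a sketch under stated assumptions, not Bazel's actual code or key format; `action_key`, `file_digest` and the JSON encoding are hypothetical. Output paths are included only as declared names, since their contents do not exist yet.

```python
import hashlib
import json
import os
import tempfile


def file_digest(path: str) -> str:
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def action_key(argv, env, inputs, outputs, root):
    """Hypothetical cache key for one action.

    argv:    command line as a list of strings
    env:     environment variables passed to the action
    inputs:  input paths relative to `root` (hashed by content)
    outputs: declared output paths (names only; not built yet)
    """
    record = {
        "argv": list(argv),
        "env": sorted(env.items()),
        # Relative path plus content digest: the key survives a
        # checkout at a different absolute location.
        "inputs": sorted((p, file_digest(os.path.join(root, p))) for p in inputs),
        "outputs": sorted(outputs),
    }
    blob = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


# Same sources in two different directories give the same key;
# editing an input changes it.
with tempfile.TemporaryDirectory() as d1, tempfile.TemporaryDirectory() as d2:
    for d in (d1, d2):
        with open(os.path.join(d, "a.c"), "w") as f:
            f.write("int main(void) { return 0; }\n")
    argv = ["cc", "-c", "a.c", "-o", "a.o"]
    env = {"PATH": "/usr/bin"}
    k1 = action_key(argv, env, ["a.c"], ["a.o"], d1)
    k2 = action_key(argv, env, ["a.c"], ["a.o"], d2)
    with open(os.path.join(d2, "a.c"), "a") as f:
        f.write("// edited\n")
    k3 = action_key(argv, env, ["a.c"], ["a.o"], d2)
    print(k1 == k2, k2 != k3)
```

Using relative paths rather than absolute ones is what lets two checkouts (or two machines) share cache entries.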

For a deeper investigation, a good starting point is: https://github.com/bazelbuild/bazel/blob/master/src/main/java/com/google/devtools/build/lib/actions/cache/ActionCache.java

Vertexwahn
"The cache key is computed from the command line, the input files, the output files, and the environment variables for each action. We should always be using relative paths, not absolute paths for the cache key." Source: https://github.com/bazelbuild/bazel/issues/2998#issuecomment-301160663 – karlbsm Nov 23 '21 at 07:13