16

In Bazel, given a build target, how would a script (which is running outside of Bazel) get the path to the generated file?

Scenario: I'm using Bazel to do the build, and then when it's done, I want to copy the result to a server. I just need to know what files to copy. I could hard-code the list of files, but I would prefer not to do that.

A simple example: This Bazel script:

genrule(
    name = "main",
    srcs = ["main.in"],
    outs = ["main.out"],
    cmd = "cp $< $@",
)

If you then make a file named main.in and then run bazel build :main, bazel reports:

INFO: Found 1 target...
Target //:main up-to-date:
  bazel-genfiles/main.out
INFO: Elapsed time: 6.427s, Critical Path: 0.40s

So there it is: bazel-genfiles/main.out. But what machine-readable technique can I use to get that path? (I could parse the output of bazel build, but we are discouraged from doing that.)

The closest I have found is to use bazel query --output=xml :main, which dumps information about :main in XML format. The output includes this line:

<rule-output name="//:main.out"/>

That is so close to what I want. But the name is in Bazel's label format; I don't see how to get it as a path.

I could do some kind of string replacement on that name field, to turn it into bazel-genfiles/main.out; but even that isn't reliable. If my genrule had included output_to_bindir = 1, then the output would have been bazel-bin/main.out instead.

Furthermore, not all rules have a <rule-output> field in the XML output. For example, if my BUILD file has this code to make a C library:

cc_library(
    name = "mylib",
    srcs = glob(["*.c"])
)

The output of bazel query --output=xml :mylib does not contain a <rule-output> or anything else helpful:

<?xml version="1.1" encoding="UTF-8" standalone="no"?>
<query version="2">
  <rule class="cc_library" location="/Users/mikemorearty/src/bazel/test1/BUILD:8:1" name="//:mylib">
    <string name="name" value="mylib"/>
    <list name="srcs">
      <label value="//:foo.c"/>
    </list>
    <rule-input name="//:foo.c"/>
    <rule-input name="//tools/defaults:crosstool"/>
    <rule-input name="@bazel_tools//tools/cpp:stl"/>
  </rule>
</query>
Mike Morearty
  • Thanks for this question. I have a similar use-case. In my case, bazel builds some software that is a CLI for some application. I'd like to package and deploy the CLI and not expect my users to "bazel run" it every time. – svohara Feb 28 '20 at 19:26

4 Answers

10

You can get this information by using bazel aquery to query the action graph.

Here’s a slightly richer example, with two output files from a single genrule:

$ ls
BUILD  main.in  WORKSPACE
$ cat WORKSPACE
$ cat BUILD
genrule(
    name = "main",
    srcs = ["main.in"],
    outs = ["main.o1", "main.o2"],
    cmd = "cp $< $(location main.o1); cp $< $(location main.o2)",
)
$ cat main.in
hello

Use bazel aquery //:main --output=textproto to query the action graph with machine-readable output (the proto is analysis.ActionGraphContainer):

$ bazel aquery //:main --output=textproto >aquery_result 2>/dev/null
$ cat aquery_result
artifacts {
  id: "0"
  exec_path: "main.in"
}
artifacts {
  id: "1"
  exec_path: "external/bazel_tools/tools/genrule/genrule-setup.sh"
}
artifacts {
  id: "2"
  exec_path: "bazel-out/k8-fastbuild/genfiles/main.o1"
}
artifacts {
  id: "3"
  exec_path: "bazel-out/k8-fastbuild/genfiles/main.o2"
}
actions {
  target_id: "0"
  action_key: "dd7fd759bbecce118a399c6ce7b0c4aa"
  mnemonic: "Genrule"
  configuration_id: "0"
  arguments: "/bin/bash"
  arguments: "-c"
  arguments: "source external/bazel_tools/tools/genrule/genrule-setup.sh; cp main.in bazel-out/k8-fastbuild/genfiles/main.o1; cp main.in bazel-out/k8-fastbuild/genfiles/main.o2"
  input_dep_set_ids: "0"
  output_ids: "2"
  output_ids: "3"
}
targets {
  id: "0"
  label: "//:main"
  rule_class_id: "0"
}
dep_set_of_files {
  id: "0"
  direct_artifact_ids: "0"
  direct_artifact_ids: "1"
}
configuration {
  id: "0"
  mnemonic: "k8-fastbuild"
  platform_name: "k8"
}
rule_classes {
  id: "0"
  name: "genrule"
}

The data isn’t exactly all in one place, but note that:

  • the artifacts with IDs 2 and 3 correspond to our two desired output files, and list the output locations of those artifacts as paths to files on disk relative to your workspace root;
  • the actions entry with target ID 0 lists artifact IDs 2 and 3 as its outputs (output_ids); and
  • the targets entry with ID "0" is associated with the //:main label.
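The three-way association described above (target label to target ID, target ID to action outputs, output IDs to artifact paths) can be sketched in plain Python without any proto bindings. This is only an illustration: the ACTION_GRAPH dict and the outputs_for_label helper are hypothetical names, and the dict layout is an assumed hand-written mirror of the textproto shown earlier, reduced to the fields needed for the join.

```python
# Hypothetical stand-in for the aquery result shown above, reduced to the
# fields needed for the join. The dict layout is an assumption made for
# illustration; it is not a format that Bazel emits directly.
ACTION_GRAPH = {
    "artifacts": [
        {"id": "0", "exec_path": "main.in"},
        {"id": "1", "exec_path": "external/bazel_tools/tools/genrule/genrule-setup.sh"},
        {"id": "2", "exec_path": "bazel-out/k8-fastbuild/genfiles/main.o1"},
        {"id": "3", "exec_path": "bazel-out/k8-fastbuild/genfiles/main.o2"},
    ],
    "actions": [
        {"target_id": "0", "output_ids": ["2", "3"]},
    ],
    "targets": [
        {"id": "0", "label": "//:main"},
    ],
}


def outputs_for_label(graph, label):
    """Return exec paths of all artifacts produced by actions of `label`."""
    # Step 1: label -> target IDs.
    target_ids = {t["id"] for t in graph["targets"] if t["label"] == label}
    if not target_ids:
        raise KeyError("label not found in action graph: %s" % label)
    # Step 2: target IDs -> IDs of artifacts written by that target's actions.
    output_ids = {
        artifact_id
        for action in graph["actions"]
        if action["target_id"] in target_ids
        for artifact_id in action["output_ids"]
    }
    # Step 3: artifact IDs -> paths, in the order the artifacts are listed.
    return [a["exec_path"] for a in graph["artifacts"] if a["id"] in output_ids]


if __name__ == "__main__":
    for path in outputs_for_label(ACTION_GRAPH, "//:main"):
        print(path)
```

The input artifacts (IDs 0 and 1) are filtered out because they never appear in any action's output_ids for the target.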

Given this simple structure, we can easily whip together a script to list all output files corresponding to a provided label. I can’t find a way to depend directly on Bazel’s definition of analysis.proto or its language bindings from an external repository, so you can patch the following script into the bazelbuild/bazel repository itself:

tools/list_outputs/list_outputs.py

# Copyright 2019 The Bazel Authors. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
r"""Parse an `aquery` result to list outputs created for a target.

Use this binary in conjunction with `bazel aquery` to determine the
paths on disk to output files of a target.

Example usage: first, query the action graph for the target that you
want to analyze:

    bazel aquery //path/to:target --output=textproto >/tmp/aquery_result

Then, from the Bazel repository:

    bazel run //tools/list_outputs -- \
        --aquery_result /tmp/aquery_result \
        --label //path/to:target \
        ;

This will print a list of zero or more output files emitted by the given
target, like:

    bazel-out/k8-fastbuild/foo.genfile
    bazel-out/k8-fastbuild/bar.genfile

If the provided label does not appear in the output graph, an error will
be raised.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import sys

from absl import app
from absl import flags
from google.protobuf import text_format
from src.main.protobuf import analysis_pb2


flags.DEFINE_string(
    "aquery_result",
    None,
    "Path to file containing result of `bazel aquery ... --output=textproto`.",
)
flags.DEFINE_string(
    "label",
    None,
    "Label whose outputs to print.",
)


def die(message):
  sys.stderr.write("fatal: %s\n" % (message,))
  sys.exit(1)


def main(unused_argv):
  if flags.FLAGS.aquery_result is None:
    raise app.UsageError("Missing `--aquery_result` argument.")
  if flags.FLAGS.label is None:
    raise app.UsageError("Missing `--label` argument.")

  if flags.FLAGS.aquery_result == "-":
    aquery_result = sys.stdin.read()
  else:
    with open(flags.FLAGS.aquery_result) as infile:
      aquery_result = infile.read()
  label = flags.FLAGS.label

  action_graph_container = analysis_pb2.ActionGraphContainer()
  text_format.Merge(aquery_result, action_graph_container)

  matching_targets = [
      t for t in action_graph_container.targets
      if t.label == label
  ]
  if len(matching_targets) != 1:
    die(
        "expected exactly one target with label %r; found: %s"
        % (label, sorted(t.label for t in matching_targets))
    )
  target = matching_targets[0]

  all_artifact_ids = frozenset(
      artifact_id
      for action in action_graph_container.actions
      if action.target_id == target.id
      for artifact_id in action.output_ids
  )
  for artifact in action_graph_container.artifacts:
    if artifact.id in all_artifact_ids:
      print(artifact.exec_path)


if __name__ == "__main__":
  app.run(main)

tools/list_outputs/BUILD

# Copyright 2019 The Bazel Authors. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

package(default_visibility = ["//visibility:public"])

licenses(["notice"])  # Apache 2.0

filegroup(
    name = "srcs",
    srcs = glob(["**"]),
)

py_binary(
    name = "list_outputs",
    srcs = ["list_outputs.py"],
    srcs_version = "PY2AND3",
    deps = [
        "//third_party/py/abseil",
        "//src/main/protobuf:analysis_py_proto",
    ],
)

As a Git patch, for your convenience: https://gist.github.com/wchargin/5e6a43a203d6c95454aae2886c5b54e4

Please note that this code hasn’t been reviewed or verified for correctness; I provide it primarily as an example. If it’s useful to you, then maybe this weekend I can write some tests for it and PR it against Bazel itself.

wchargin
7

Between two runs of bazel, the output path should be identical. That is to say, if you build //path/to:target then bazel clean and build again, it should produce the same file. Since this output file is constant, you could run

$ bazel cquery --output=files //:main.out

and I believe that would give you a reference to where that file will be created once a build occurs (it will not build it for you).
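If the external script is written in Python, the cquery call above could be wrapped roughly as follows. This is a minimal sketch, not a tested integration: output_files and parse_cquery_files are hypothetical helper names, and it assumes a bazel executable on PATH whose version supports cquery --output=files.

```python
import subprocess


def parse_cquery_files(stdout):
    """Split `bazel cquery --output=files` output into a list of paths."""
    return [line.strip() for line in stdout.splitlines() if line.strip()]


def output_files(label, bazel="bazel"):
    """Ask Bazel where the outputs of `label` are written (does not build)."""
    result = subprocess.run(
        [bazel, "cquery", "--output=files", label],
        check=True,                 # raise CalledProcessError if Bazel fails
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # progress messages go to stderr
        universal_newlines=True,    # decode stdout as text
    )
    return parse_cquery_files(result.stdout)
```

Keeping the parsing separate from the subprocess call lets the parsing be exercised without a Bazel workspace. The reported paths are relative, so the script should run from the workspace root.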

If you're looking to go from a target to a filename, that is going to depend on the rules_* you're running. For example, in rules_go the output path depends on the arguments to the go_library target. The rules_go team has recently documented this behavior for their project, but the cquery should stably give you the output as long as your version of Bazel contains this fix, which should be in releases after 5.3.0.

Binary output paths should generally be stable from version to version, and you can rely on them not differing too much. However, in my experience this problem is generally a sign that you should consider moving that formerly external part of your process into Bazel as a genrule or custom rule. For example, I was formerly using this very trick to assemble an NPM package, but now I do the whole thing in Bazel and have a single target that generates the .tar that I was interested in uploading to NPM. Maybe you could follow up with some specifics on what it is you're interested in doing, and we might be able to work through a solution that doesn't depend on external systems understanding the Bazel build paths.

achew22
  • Thanks — I know the build artifacts are supposed to be identical from one run to the next (and are identical if the rules are written correctly). I'm asking a more general question: given a black-box target `//mypackage:mytarget`, how do I get the paths to the files that were generated by that? – Mike Morearty Dec 18 '17 at 19:34
  • 1
    At some point after a build is done, you get back to the top level where you leave the hermetic world of Bazel, and have to *do* something with the files that got built. It's like Haskell ;) In my case, I want to upload them somewhere. I suppose I could take my upload script and put it in a rule that I run with `bazel run`; that would probably work. But the value of `bazel query` is to be able to programmatically extract info from the build graph, and I would think that the paths to the generated files is one thing you would be able to ask for. – Mike Morearty Dec 18 '17 at 19:37
  • 1
    Unfortunately I'm not aware of a good system for approaching the problem other than to do as you suggest and create a rule. However, it does provide a really nice set of developer ergonomics. Your deployment tool will be tracked and versioned with your code. I set up my CI to `bazel query` for a set of deployment tags and then `bazel run` each of them. It runs in the context of the CI user so it has all the ACLs/permissions necessary when it runs and I get the satisfaction of knowing that I won't ever forget to update the release.sh script. – achew22 Dec 19 '17 at 07:19
0

If you just want the file system path from the Bazel cache, you can do something like this on a Linux machine (any one of these commands will do):

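# Search every bazel-* convenience symlink; -L follows the symlinks.
find -L bazel-* -name main.out
# Similar searches using grep; quote the pattern and escape the dot.
find -L bazel-* | grep 'main\.out$'
find -L bazel-out | grep 'main\.out$'
find -L bazel-bin | grep 'main\.out$'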
jxramos
0

Bazel has merged the genfiles and bin directories so you can do:

echo "$(bazel info bazel-bin)/main.out"

The path will vary based on your current configuration.
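The same idea can be sketched in Python for a script that starts from the label of an output file rather than a hard-coded name. The helper names label_to_relpath, path_under_bin and bazel_bin_dir are hypothetical. The sketch assumes the label is in the main repository, has the form //package:name, and names a file that the rule writes under bazel-bin at its package-relative path, which is not true of every rule.

```python
import posixpath
import subprocess


def label_to_relpath(label):
    """Turn an output-file label like //pkg:main.out into pkg/main.out.

    Assumption: a main-repository label of the form //package:name.
    """
    if not label.startswith("//") or ":" not in label:
        raise ValueError("expected a label of the form //package:name")
    package, name = label[2:].split(":", 1)
    return posixpath.join(package, name) if package else name


def path_under_bin(bin_dir, label):
    """Join the bazel-bin directory with the label's package-relative path."""
    return posixpath.join(bin_dir, label_to_relpath(label))


def bazel_bin_dir(bazel="bazel"):
    """Return the output of `bazel info bazel-bin` for the current configuration."""
    result = subprocess.run(
        [bazel, "info", "bazel-bin"],
        check=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        universal_newlines=True,
    )
    return result.stdout.strip()
```

A caller would combine them as path_under_bin(bazel_bin_dir(), "//:main.out"). Because the bin directory depends on the configuration, the script should pass bazel info the same flags that were used for the build.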

NamshubWriter