I have a Bazel rule that produces an output file and can optionally take a dependency on a similar target, like so:

load(":node.bzl", "node")

node(
        name = "one",
        src = "one.txt",
)

node(
        name = "two",
        src = "two.txt",
        dep = "one",
)

Now, if I call bazel build :two, I want it to build :one first, and then :two, each producing an output file. One would think that'd be as simple as writing a node.bzl file like

def _node_impl(ctx):
      out_file = ctx.actions.declare_file(ctx.attr.name)
      print("ok running: out_file={} src={}".format(out_file.path, ctx.file.src.path))
      ctx.actions.run_shell(
            outputs = [out_file],
            inputs = [ctx.file.src],
            command = ("echo {} >{}".format(ctx.file.src.path, out_file.path)),
      )
      print("done running")

      return [
              DefaultInfo(
                    files = depset([out_file]),
              ),
      ]

node = rule(
      implementation = _node_impl,
      attrs = {
            "src": attr.label(
                  mandatory = True,
                  allow_single_file = True,
            ),
            "dep": attr.label(
            ),
      },
)

but, alas, this only creates the "two" file, not the "one" file that it depends on:

INFO: Starting clean (this may take a while). Consider using --async if the clean takes more than several minutes.
bazel build :two
DEBUG: nok/node.bzl:3:12: ok running: out_file=bazel-out/k8-fastbuild/bin/one src=one.txt
DEBUG: nok/node.bzl:9:12: done running
DEBUG: nok/node.bzl:3:12: ok running: out_file=bazel-out/k8-fastbuild/bin/two src=two.txt
DEBUG: nok/node.bzl:9:12: done running
INFO: Analyzed target //:two (4 packages loaded, 9 targets configured).
INFO: Found 1 target...
Target //:two up-to-date:
  bazel-bin/two
INFO: Elapsed time: 0.491s, Critical Path: 0.05s
INFO: 1 process: 1 linux-sandbox.
INFO: Build completed successfully, 2 total actions

Note that the debug output confirms that Bazel sees the dependency when it builds the dependency graph; it is only when it comes to executing the dependency's action that it decides to skip it.

Now, I've gotten this to work by adding a provider and using it to add the dependency's output file as a transitive member of the depset of output files, but shouldn't Bazel do that automatically? Here's my solution:

NodeProv = provider(fields = ["out_file"])

def _node_impl(ctx):
      out_file = ctx.actions.declare_file(ctx.attr.name)
      print("ok running: out_file={} src={}".format(out_file.path, ctx.file.src.path))
      ctx.actions.run_shell(
            outputs = [out_file],
            inputs = [ctx.file.src],
            command = ("echo {} >{}".format(ctx.file.src.path, out_file.path)),
      )
      print("done running")

      dep_file = [ctx.attr.dep[NodeProv].out_file] if ctx.attr.dep else []

      return [
              DefaultInfo(
                    files = depset([out_file], transitive = [depset(dep_file)]),
              ),
              NodeProv(
                      out_file = out_file
              ),
      ]

node = rule(
      implementation = _node_impl,
      attrs = {
            "src": attr.label(
                  mandatory = True,
                  allow_single_file = True,
            ),
            "dep": attr.label(
                  providers = [NodeProv],
            ),
      },
)

This does indeed build both targets:

INFO: Analyzed target //:two (4 packages loaded, 9 targets configured).
INFO: Found 1 target...
Target //:two up-to-date:
  bazel-bin/one
  bazel-bin/two
INFO: Elapsed time: 0.532s, Critical Path: 0.07s
INFO: 2 processes: 2 linux-sandbox.
INFO: Build completed successfully, 3 total actions

What does the collective hive mind think: is this the recommended way of solving this, or is there a better approach?

Schmike

1 Answer

So both one and two get analyzed (i.e., the implementation functions for both get run, as you saw), but the action in one does not get run because nothing depends on that action's output.

The reason that

Target //:two up-to-date:
  bazel-bin/two

does not show bazel-bin/one is that only the files from top-level targets (i.e. targets named on the command line) get listed. This is by design, because if you have lots of targets in your build graph, you probably don't want to see all their output files listed out.

When you changed the rule so that files from dependencies get put in the target's DefaultInfo, you basically said that files from dependencies of the target are outputs of the target itself, so that's why the action from one then gets run, and its output gets printed on the command line.

There are cases where you do want to "forward" files from dependencies, but what's much more typical is to put files into providers (as you've done), and then read those providers to use files from dependencies as inputs to other actions. Something like this:

def _node_impl(ctx):
  out_file = ctx.actions.declare_file(ctx.attr.name)
  dep_file = [ctx.attr.dep[NodeProv].out_file] if ctx.attr.dep else []
  inputs = [ctx.file.src] + dep_file
  ctx.actions.run_shell(
    outputs = [out_file],
    inputs = inputs,
    command = "cat {input_paths} > {out_path}".format(
      input_paths = " ".join([f.path for f in inputs]),
      out_path = out_file.path),
  )

  return [
    DefaultInfo(files = depset([out_file])),
    NodeProv(out_file = out_file),
  ]

Note that with this, bazel-bin/one will still not get printed in the "up-to-date" message (again this is usually intentional), but the file one from the target one should be built, and its content will be used in building the file two from the target two.

(Note also that for a real ruleset, you'll want to use Args objects, created with ctx.actions.args(), so that memory is used more efficiently when building the strings for command lines.)
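
To make the Args suggestion concrete, here is a minimal sketch of a complete node.bzl that combines the provider-based approach with an Args object. It is an illustration rather than a tested ruleset: the names NodeProv, src, and dep come from the question, and the shell snippet that takes the output path off the argument list before concatenating the rest is my own choice for this example.

```starlark
# node.bzl: a sketch under the assumptions stated above.
NodeProv = provider(fields = ["out_file"])

def _node_impl(ctx):
    out_file = ctx.actions.declare_file(ctx.attr.name)

    # The dependency's output becomes an input of this target's action.
    # That edge in the action graph is what makes Bazel run the
    # dependency's action when this target is built.
    dep_files = [ctx.attr.dep[NodeProv].out_file] if ctx.attr.dep else []
    inputs = [ctx.file.src] + dep_files

    # Args defers converting File objects to path strings until the
    # action is executed, instead of building strings during analysis.
    args = ctx.actions.args()
    args.add(out_file)  # seen by the command as $1
    args.add_all(inputs)  # seen by the command as $2, $3, ...

    ctx.actions.run_shell(
        outputs = [out_file],
        inputs = inputs,
        arguments = [args],
        # Take the output path off the argument list, then concatenate the rest.
        command = 'out="$1"; shift; cat "$@" > "$out"',
    )

    return [
        # Only this target's own file is reported as its default output.
        DefaultInfo(files = depset([out_file])),
        NodeProv(out_file = out_file),
    ]

node = rule(
    implementation = _node_impl,
    attrs = {
        "src": attr.label(mandatory = True, allow_single_file = True),
        "dep": attr.label(providers = [NodeProv]),
    },
)
```

With the BUILD file from the question, bazel build :two should run both actions while listing only bazel-bin/two in the "up-to-date" message, since one is no longer part of two's DefaultInfo.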

ahumesky
  • To extend a bit: I suspect the common source of confusion is that false parallels are assumed, but a rule is not a (make) recipe, and when a `BUILD` file is being processed, it does not run rules to create targets, but actions... the actions are then sorted out and executed if needed. – Ondrej K. Jun 20 '20 at 10:37
  • Yeah, exactly. Logically, there are 2 graphs: the target graph (target `two` depends on target `one`, and this graph is what `query` and `cquery` work on), and the action graph (the output file from the action in `one` is an input to the action in `two`). The action graph is what's used to determine what to run. – ahumesky Jun 22 '20 at 19:22
  • Yeah, I was familiar with the two graphs; what had thrown me off were two other things: 1) having to tell the action about dep_file and 2) the build output only mentioning one target, while it builds both. But thanks to ahumesky's excellent explanation, I now know how it works! – Schmike Jun 22 '20 at 23:39