How to properly handle args in sh_binary target?

Question

Suppose I have the following sh_binary:

sh_binary(
  name = "tool_wrapper",
  srcs = ["tool_wrapper.sh"],
  data = ["@external//tools:binaries"],
  args = ["$(locations @external//tools:binaries)"],
)

Now I want to use this tool in a custom rule for generating sources. Is there a way to completely encapsulate this detail so that dependent rules don't have to know about it? I don't want to depend on hard-coded paths in the wrapper script because the binaries used, or more specifically the target label, might change. This is why I am using the $(locations ...) substitution and letting Bazel figure out the paths.

I'm confused, what information are you trying to hide? And why are you trying to hide it? — justhecuke, Aug 02 '23 at 10:20
The arguments I am passing to `tool_wrapper` are meant to be default arguments that should be passed in all cases and are strictly required for the script to run successfully. Here they are the paths to specific tools (this seems to be the only clean way to avoid hard-coded paths in the script). The user of `tool_wrapper` therefore shouldn't need to know this implementation detail and should only add their own arguments. Hopefully this makes it clearer. — rumbleD5, Aug 02 '23 at 16:56
Have you tried making a simple wrapper macro?
```
def wrapper(name, srcs):
  native.sh_binary(
    name = name,
    srcs = srcs,
    data = ["your-hardcoded-stuff"],
    args = ["more-hardcoded-stuff"],
  )
```
— justhecuke, Aug 02 '23 at 20:13
I think the actual problem remains and will only be deferred because `args` are only considered when executing the shell binary through `bazel run`. — rumbleD5, Aug 03 '23 at 17:15
I guess I just don't understand what you're trying to accomplish here. Who is this "user" and how is this information leaking out to them? What is the specific sequence of actions with behavior you want to change? — justhecuke, Aug 03 '23 at 17:53
If I provide arguments through the rule attribute `args = ["arg1", "arg2"]` and the user calls `bazel run //tool_wrapper -- user_arg1 user_arg2`, then Bazel effectively invokes `tool_wrapper.sh arg1 arg2 user_arg1 user_arg2`. The user therefore does not need to be aware of `arg1` and `arg2`. But this is not possible when using `tool_wrapper` as an executable in a rule implementation, because the `args` attribute is not considered there. — rumbleD5, Aug 03 '23 at 18:00
I think I see the issue now. You're not trying to hide the information (probably impossible for locally running code), you are trying to make it so BUILD/bzl developers don't require the same dependency information to use `tool_wrapper.sh` in their own macros/rules. I still don't see how a simple wrapper macro cannot handle your use-case. Just have the macro append the user-supplied args to your dependency args. `args = ["dependency"] + user_args`. This is typically how I've done it before. — justhecuke, Aug 03 '23 at 18:13

ahumesky · Accepted Answer · 2023-08-02T23:30:03.850

The arguments in `args` are used only when the target is run with the `bazel run` or `bazel test` commands (see https://bazel.build/reference/be/common-definitions#binary.args), so to accomplish this you'll need a Starlark rule that does it for you:

WORKSPACE:

workspace(name = "my_workspace")

local_repository(
  name = "other-repo",
  path = "other-repo",
)

pkg/BUILD:

load(":tool_path_wrapper.bzl", "tool_path_wrapper")
load(":my_rule.bzl", "my_rule")

my_rule(
  name = "foo",
  src = "foo.txt",
)

tool_path_wrapper(
  name = "tool_wrapper",
  binary = ":tool",
  tools = ["subtool1", "subtool2", "@other-repo//:tool-in-other-repo"],
)

sh_binary(
  name = "tool",
  srcs = [":tool.sh"],
  data = ["subtool1", "subtool2", "@other-repo//:tool-in-other-repo"],
)

sh_binary(
  name = "subtool1",
  srcs = ["subtool1.sh"],
)

sh_binary(
  name = "subtool2",
  srcs = ["subtool2.sh"],
)

pkg/foo.txt:

foo

pkg/my_rule.bzl:

def _my_rule_impl(ctx):
  out = ctx.actions.declare_file(ctx.label.name + "_output")
  ctx.actions.run(
    inputs = [ctx.file.src],
    executable = ctx.attr._tool.files_to_run,
    outputs = [out],
    arguments = [ctx.file.src.path, out.path],
  )

  return DefaultInfo(files = depset([out]))

my_rule = rule(
  implementation = _my_rule_impl,
  attrs = {
    "src": attr.label(mandatory = True, allow_single_file = True),
    "_tool": attr.label(default = "//pkg:tool_wrapper", executable = True, cfg = "exec"),
  },
)

pkg/tool_path_wrapper.bzl:

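# Builds the path of a target's executable inside the wrapper's runfiles tree.
# "$0" is expanded by the shell when the generated wrapper script runs.
def _get_executable_runfile_path(ctx, target):
  return '"$0.runfiles/%s/%s"' % (ctx.workspace_name, target.files_to_run.executable.short_path)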

def _tool_path_wrapper_impl(ctx):
  wrapper = ctx.actions.declare_file(ctx.label.name + ".sh")

  executables = [ctx.attr.binary] + ctx.attr.tools
  runfiles_paths = [_get_executable_runfile_path(ctx, e) for e in executables]

  # write a script that passes the paths of the subtools to the main tool, then the rest of the args
  ctx.actions.write(
    output = wrapper,
    content = " ".join(runfiles_paths) + ' "$@"\n',
    is_executable = True,
  )
  return DefaultInfo(
    executable = wrapper,
    runfiles = ctx.runfiles().merge_all(
      [ctx.attr.binary.default_runfiles] +
      [t.default_runfiles for t in ctx.attr.tools],
    ),
  )

tool_path_wrapper = rule(
  implementation = _tool_path_wrapper_impl,
  attrs = {
    "binary": attr.label(mandatory = True),
    "tools": attr.label_list(),
  },
  executable = True,
)
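
For the BUILD file above, the wrapper written by `ctx.actions.write` is a one-line script that prepends runfiles paths to whatever arguments the caller passes. The shell sketch below imitates that outside Bazel so the forwarding can be seen in isolation. The scratch directory layout, the stub `tool`, and the names `demo_dir`, `pkg_dir`, and `forwarded` are assumptions made for the sketch; only one subtool path is shown, and the shebang lines are additions so it runs standalone.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical scratch area standing in for Bazel's output tree.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT

# Bazel places a binary's runfiles next to it in "<executable>.runfiles".
pkg_dir="$demo_dir/tool_wrapper.sh.runfiles/my_workspace/pkg"
mkdir -p "$pkg_dir"

# Stub for the main tool: prints each argument it receives on its own line.
cat > "$pkg_dir/tool" <<'EOF'
#!/usr/bin/env bash
printf '%s\n' "$@"
EOF
chmod +x "$pkg_dir/tool"

# Script with the same shape as the generated wrapper: the main tool first,
# then a subtool path, then the caller's own arguments.
cat > "$demo_dir/tool_wrapper.sh" <<'EOF'
#!/usr/bin/env bash
"$0.runfiles/my_workspace/pkg/tool" "$0.runfiles/my_workspace/pkg/subtool1" "$@"
EOF
chmod +x "$demo_dir/tool_wrapper.sh"

# The caller supplies only its own arguments, as my_rule does.
forwarded="$("$demo_dir/tool_wrapper.sh" "in file.txt" out.txt)"
printf '%s\n' "$forwarded"
```

The caller never names the subtool path; it arrives ahead of the user arguments, which is the behavior the question asks for. Arguments containing spaces survive because the sketch quotes `"$@"`.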

pkg/tool.sh:

set -e

subtool1="$1"
subtool2="$2"
subtool3="$3"
in_file="$4"
out_file="$5"

"$subtool1" "$in_file" > "$out_file"
"$subtool2" "$in_file" >> "$out_file"
"$subtool3" "$in_file" >> "$out_file"

pkg/subtool1.sh:

wc -c "$1"

pkg/subtool2.sh:

rev "$1"

other-repo/WORKSPACE:

workspace(name = "tool_workspace")

other-repo/BUILD:

sh_binary(
  name = "tool-in-other-repo",
  srcs = ["tool-in-other-repo.sh"],
  visibility = ["//visibility:public"],
)

other-repo/tool-in-other-repo.sh:

md5sum "$1"

I was trying to adapt your proposed solution #2. When I run `tool_wrapper` from the command line via `bazel run`, everything works and it finds `tool_paths.txt`, but when I use `tool_wrapper` as an executable in my rule implementation, it throws an error that the file doesn't exist. However, I have checked the path, and the file is accessible at the path reported by `$(dirname "${BASH_SOURCE[0]}")/tool_paths.txt`. What am I missing here? — rumbleD5, Aug 02 '23 at 17:39
I noticed that when running via `bazel run`, it is an absolute path pointing to the execroot. When running from the rule implementation, it is a relative path pointing to the sandbox execroot, but for some reason the file is not there. However, in the source directory I can see it under `bazel-out/...`. It seems to be stored under a directory with the suffix `.runfiles`. I don't understand why this makes a difference, because from the user's point of view this should be abstracted away, shouldn't it? — rumbleD5, Aug 02 '23 at 18:13
Sorry, those examples were incomplete. An action will put data dependencies in a runfiles tree, whereas genrules and the run command don't do that. So it won't really work to try to get a path from `$(rlocation)` out of a genrule to use in an action, because the paths won't line up. I've updated the example to be more complete. — ahumesky, Aug 02 '23 at 23:29
"whereas genrules and the run command don't do that" -- looking at this again, I think I'm mistaken about that, but some of the original examples were definitely making assumptions that would only work with the run command — ahumesky, Aug 03 '23 at 16:50
Thanks a lot. With slight changes for my concrete setup, this seems to work fine. — rumbleD5, Aug 03 '23 at 17:19
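
The path difference discussed in these comments comes from the runfiles tree that Bazel creates next to an executable. A minimal sketch of how a script might locate that tree follows. It assumes the `RUNFILES_DIR` environment variable should be preferred when it is set, and otherwise falls back to the `<script>.runfiles` directory that the answer relies on. The function name `find_runfiles_dir` and the scratch directory are hypothetical; Bazel's own Bash runfiles library is likely the more complete option for real use.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints the runfiles directory for the script at $1.
# Prefers RUNFILES_DIR when it is set and points at a directory, otherwise
# falls back to the "<script>.runfiles" directory next to the script.
find_runfiles_dir() {
  local script_path="$1"
  if [ -n "${RUNFILES_DIR:-}" ] && [ -d "$RUNFILES_DIR" ]; then
    printf '%s\n' "$RUNFILES_DIR"
  elif [ -d "$script_path.runfiles" ]; then
    printf '%s\n' "$script_path.runfiles"
  else
    echo "no runfiles directory found for $script_path" >&2
    return 1
  fi
}

# Demonstration with a scratch directory instead of a real Bazel output tree.
scratch="$(mktemp -d)"
trap 'rm -rf "$scratch"' EXIT
mkdir -p "$scratch/tool_wrapper.sh.runfiles"

found="$(RUNFILES_DIR="" find_runfiles_dir "$scratch/tool_wrapper.sh")"
printf '%s\n' "$found"
```

Inside a real wrapper the call would be `find_runfiles_dir "$0"`, which resolves the same `$0.runfiles` prefix that the generated script in the answer uses.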
