What is the most efficient way to extract/collect files from a list of targets/providers in Bazel?

Question

I'm writing some rules and learning Starlark as I progress.

Assume I have my own provider:

ModularResources = provider(
    doc = "Modular resources",
    fields = {
        "artifactId": "Former Maven artifact id (don't ask me why)",
        "srcs": "List of labels (a glob(..) thing)",
    },
)

def _modular_resources_impl(ctx):
    return ModularResources(
        artifactId = ctx.attr.artifactId,
        srcs = ctx.attr.srcs,
    )

modular_resources = rule(
    implementation = _modular_resources_impl,
    attrs = {
        "artifactId": attr.string(
            mandatory = True,
        ),
        "srcs": attr.label_list(
            allow_files = True,
            mandatory = True,
        ),
    },
)

Then I have a generator rule which requires these:

some_generator = rule(
    attrs = {
        "deps": attr.label_list(
            providers = [ModularResources],
        ),
        ...
    },
    ...
)

In my implementation I discovered that I need to do a couple of unwraps to get the files:

def _get_files(deps):
    result = []
    for dep in deps:
        for target in dep[ModularResources].srcs:
            result += target.files.to_list()
    return result

Is there a more efficient way to perform the collection?

As to why I'm doing this, the generator actually needs a special list of files like this:

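def _format_files(deps):
    # Collect every "artifactId:path" entry first, then join once, so that
    # entries coming from different targets are also separated by commas.
    entries = []
    for dep in deps:
        info = dep[ModularResources]
        for target in info.srcs:
            entries += [info.artifactId + ":" + f.path for f in target.files.to_list()]
    return ",".join(entries)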

FWIW, here is an example of how this is used:

a/BUILD:

modular_resources(
    name = "generator_resources",
    srcs = glob(
        ["some/path/**/*.whatever"],
    ),
    artifactId = "a",
    visibility = ["//visibility:public"],
)

b/BUILD:

some_generator(
    name = "...",
    deps = [
        "//a:generator_resources"
    ]
)

Your implementation looks reasonable to me. Do you actually observe a performance problem? — rds, Jan 22 '20 at 18:15
No. I'm not at the point yet where I'm investigating performance problems. Just curious about the proper way (tm). — Gunnar, Jan 23 '20 at 19:09

score 0 · Answer 1 · answered Jan 22 '20 at 18:21

If you are willing to trade memory for better performance, you could do the formatting in the provider instead. Bazel may then be able to parallelise the operation more easily:

def _modular_resources_impl(ctx):
    return ModularResources(
        artifactId = ctx.attr.artifactId,
        # Requires a "formatted_srcs" field in the ModularResources provider.
        formatted_srcs = ",".join([ctx.attr.artifactId + ":" + f.path for f in ctx.files.srcs]),
    )
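
To show how the two sides could fit together, here is a rough sketch that uses only constructs valid in Starlark. The helper names format_srcs and join_formatted are hypothetical, and the sketch assumes the ModularResources provider has been given a "formatted_srcs" field.

```python
def format_srcs(artifact_id, paths):
    """Formats one target's files as comma-separated "artifactId:path" entries."""
    return ",".join([artifact_id + ":" + p for p in paths])

def join_formatted(chunks):
    """Joins the per-target strings, skipping targets that contributed no files."""
    return ",".join([c for c in chunks if c])

# Producer side, inside _modular_resources_impl:
#   formatted_srcs = format_srcs(ctx.attr.artifactId, [f.path for f in ctx.files.srcs])
#
# Consumer side, inside the generator's implementation:
#   formatted = join_formatted([dep[ModularResources].formatted_srcs for dep in ctx.attr.deps])
```

With this split, the generator only joins one precomputed string per dependency and no longer has to walk every file of every dependency.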
