
Bazel has been working great for me recently, but I've stumbled upon a question for which I have yet to find a satisfactory answer:

How can one collect all files bearing a certain extension from the workspace?

Another way of phrasing the question: how could one obtain the functional equivalent of doing a glob() across a complete Bazel workspace?

Background

The goal in this particular case is to collect all markdown files to run some checks and generate a static site from them.

At first glance, glob() sounds like a good idea, but it does not cross package boundaries: it will not descend into any subdirectory that contains its own BUILD file.
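To make the package-boundary behaviour concrete, here is a minimal Python sketch that emulates it outside of Bazel. It is not Bazel's implementation; `package_glob` and `demo` are hypothetical names, and the sketch assumes a package is any directory containing a file named BUILD or BUILD.bazel.

```python
import os
import tempfile

BUILD_NAMES = ("BUILD", "BUILD.bazel")


def package_glob(package_dir, extension):
    """List files with the given extension under package_dir, skipping any
    subdirectory that is itself a package (i.e. contains a BUILD file)."""
    matches = []
    for root, dirs, files in os.walk(package_dir):
        # Prune subpackages in place so os.walk does not descend into them.
        dirs[:] = sorted(
            d for d in dirs
            if not any(os.path.isfile(os.path.join(root, d, b)) for b in BUILD_NAMES)
        )
        for name in sorted(files):
            if name.endswith("." + extension):
                rel = os.path.relpath(os.path.join(root, name), package_dir)
                matches.append(rel.replace(os.sep, "/"))
    return matches


def demo():
    """Build a throwaway tree with a nested package and glob from the root."""
    with tempfile.TemporaryDirectory() as ws:
        for rel in ("BUILD", "README.md", "docs/guide.md", "lib/BUILD", "lib/notes.md"):
            path = os.path.join(ws, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            open(path, "w").close()
        return package_glob(ws, "md")
```

In the demo tree, `lib/notes.md` is not returned because `lib` has its own BUILD file, which is the limitation described above.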

Current Approaches

The current approach is to run the collection/generation logic outside of the sandbox, but this is a bit dirty, and I'm wondering if there is a way that is both "proper" and easy (i.e., not requiring that each BUILD file explicitly expose its markdown files).

Is there any way to specify, in the workspace, some default rules that will be added to all BUILD files?

Shastick

1 Answer


You could write an aspect for this to aggregate markdown files in a bottom-up manner and create actions on those files. There is an example of a file_collector aspect here. I modified the aspect's extensions for your use case. The aspect collects the srcs files whose extension matches its extension parameter (md or markdown) and aggregates them across targets along the deps attribute edges.

FileCollector = provider(
    fields = {"files": "collected files"},
)

def _file_collector_aspect_impl(target, ctx):
    # This function is executed for each dependency the aspect visits.

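    # Collect matching files from srcs. Not every rule has a srcs
    # attribute, so fall back to an empty list.
    direct = [
        f
        for f in getattr(ctx.rule.files, "srcs", [])
        if ctx.attr.extension == f.extension
    ]

    # Combine direct files with the files from the dependencies.
    # Rules without a deps attribute contribute no transitive files.
    files = depset(
        direct = direct,
        transitive = [
            dep[FileCollector].files
            for dep in getattr(ctx.rule.attr, "deps", [])
        ],
    )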

    return [FileCollector(files = files)]

markdown_file_collector_aspect = aspect(
    implementation = _file_collector_aspect_impl,
    attr_aspects = ["deps"],
    attrs = {
        "extension": attr.string(values = ["md", "markdown"]),
    },
)

Another way is to do a query on file targets (input and output files known to the Bazel action graph), and process these files separately. Here's an example querying for .bzl files in the rules_jvm_external repo:

$ bazel query '//...:*' | grep -e '\.bzl$'
//migration:maven_jar_migrator_deps.bzl
//third_party/bazel_json/lib:json_parser.bzl
//settings:stamp_manifest.bzl
//private/rules:jvm_import.bzl
//private/rules:jetifier_maven_map.bzl
//private/rules:jetifier.bzl
//:specs.bzl
//:private/versions.bzl
//:private/proxy.bzl
//:private/dependency_tree_parser.bzl
//:private/coursier_utilities.bzl
//:coursier.bzl
//:defs.bzl
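For the markdown case, the query output can be post-processed in a script instead of grep. The following is a rough sketch, not a tested pipeline: `label_to_path`, `filter_labels`, and `query_source_files` are hypothetical helper names, and it assumes all labels belong to the main repository (they start with `//`). The `kind("source file", ...)` filter restricts the query to checked-in files, which avoids matching a rule that happens to be named like a markdown file.

```python
import subprocess


def label_to_path(label):
    """Convert a file label such as //pkg:dir/file.md to pkg/dir/file.md."""
    stripped = label[2:] if label.startswith("//") else label
    package, _, name = stripped.partition(":")
    return package + "/" + name if package else name


def filter_labels(labels, extensions):
    """Keep labels ending in one of the extensions and return sorted paths."""
    suffixes = tuple("." + ext for ext in extensions)
    return sorted(label_to_path(l) for l in labels if l.endswith(suffixes))


def query_source_files(workspace_dir):
    """Ask Bazel for every source file target in the workspace.

    Assumes `bazel` is on PATH and workspace_dir is a Bazel workspace.
    """
    result = subprocess.run(
        ["bazel", "query", 'kind("source file", //...:*)'],
        cwd=workspace_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()
```

A typical use would be `filter_labels(query_source_files("."), ["md", "markdown"])`. Note that this only finds files Bazel already knows about as targets, so markdown files that no BUILD file references will not appear.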
Jin
  • Thanks! First time I read about Bazel aspects, I'll try this out ASAP :) – Shastick May 07 '20 at 09:16
  • So, while this is pretty interesting, as far as I understand, it still requires me to declare all markdown files of each package, which is what I would like to avoid. – Shastick May 07 '20 at 12:00
  • Can you mechanically add a `filegroup` that globs the `*.md` files in each package? It's not well-documented, but you can also add that `filegroup` to `/tools/build_rules/prelude_bazel` to be prepended to every BUILD file in the workspace. That said, you still need the `filegroup` to include the other `filegroup`s recursively. – Jin May 08 '20 at 04:36
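One way to do the "mechanical" part of the last comment is a small generator script. The sketch below only produces the text of a `filegroup` stanza for each package and leaves writing it into BUILD files to the reader. The target name `markdown` and the helper names are hypothetical, and the package list is assumed to come from elsewhere (for example `bazel query //... --output=package`). Each generated group globs its own package and references the same-named group in its nearest child packages, which gives the recursive inclusion the comment mentions.

```python
def child_packages(package, all_packages):
    """Return packages nested under `package` with no other package in between."""
    prefix = package + "/" if package else ""
    nested = [p for p in all_packages if p != package and p.startswith(prefix)]
    return sorted(
        p for p in nested
        if not any(p.startswith(q + "/") for q in nested)
    )


def markdown_filegroup(package, all_packages, name="markdown"):
    """Render a filegroup stanza collecting markdown files for one package."""
    lines = [
        "filegroup(",
        '    name = "%s",' % name,
        # allow_empty keeps packages without markdown files from failing.
        '    srcs = glob(["**/*.md", "**/*.markdown"], allow_empty = True) + [',
    ]
    for child in child_packages(package, all_packages):
        lines.append('        "//%s:%s",' % (child, name))
    lines += [
        "    ],",
        '    visibility = ["//visibility:public"],',
        ")",
    ]
    return "\n".join(lines)
```

For example, `markdown_filegroup("docs", ["", "docs", "docs/api", "lib"])` yields a stanza whose srcs include `//docs/api:markdown`. The root package's group then transitively covers the whole workspace.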