18

By default, Bazel runs tests in parallel to speed things up. However, I have a resource (a GPU) that can't handle parallel jobs because of its memory limit. Is there a way to force Bazel to run tests serially, i.e., not in parallel?

Thanks.

Shanqing Cai
  • I'm building TensorFlow, and as part of that I run the unit tests in the source code. In GPU build mode, many of those unit tests run on the GPU. When they run in parallel, I sometimes get GPU OOM errors. Those errors don't occur when I run the tests one by one, manually. But running tests manually is a pain and doesn't scale. – Shanqing Cai Jan 29 '16 at 19:27
  • 3
    Does passing `--jobs=1` to the `bazel test` command work? – mrry Jan 29 '16 at 20:55

4 Answers

15

`--jobs 1` will limit the number of parallel jobs Bazel runs to 1. Note that this serializes build actions as well as tests.

You can also modify the test targets and add `tags = ["exclusive"]` to prevent specific tests from running in parallel (see http://bazel.io/docs/test-encyclopedia.html).
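A minimal sketch of the tag-based approach in a BUILD file. The target name `gpu_kernel_test` and its source file are hypothetical, and depending on the Bazel version `cc_test` may first need to be loaded from `rules_cc`.

```starlark
# BUILD (sketch): target and file names are hypothetical.
cc_test(
    name = "gpu_kernel_test",
    srcs = ["gpu_kernel_test.cc"],
    # "exclusive" asks Bazel not to run this test alongside other tests.
    tags = ["exclusive"],
)
```

With the tag in place, a plain `bazel test //...` should keep the untagged tests parallel, whereas `bazel test --jobs 1 //...` serializes everything without touching any BUILD file.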

  • 1
    I think when one uses the `"exclusive"` tag, one also needs to be aware that the test results won't be cached. So in circumstances like CI it can cause a performance hit. – Sambatyon Jul 24 '19 at 06:36
  • @Sambatyon, that's not true according to the docs; I think you're thinking of the `"external"` tag. – JMAA Dec 10 '19 at 10:24
  • I'd recommend against this approach (`--jobs` or `exclusive` tag) and instead use https://stackoverflow.com/a/70084246/856336. – Matt Robinson Mar 29 '23 at 16:03
9

Use --local_test_jobs=1 to only run a single test job at a time locally.

The max number of local test jobs to run concurrently. Takes an integer, or a keyword ("auto", "HOST_CPUS", "HOST_RAM"), optionally followed by an operation ([-|*]<float>) eg. "auto", "HOST_CPUS*.5". 0 means local resources will limit the number of local test jobs to run concurrently instead. Setting this greater than the value for --jobs is ineffectual.

Matt Robinson
  • An important effect of doing this: if you set --local_test_jobs to a larger number, such as 5 (if you had 5 GPUs, for example), Bazel runs 5 local tests regardless of whether RAM and CPU resources are available. It effectively ignores "cpu:x" tags. – Drew Macrae Jan 05 '23 at 14:33
0

There are two resources whose limits Bazel will respect: RAM and CPU. You may hijack one (probably RAM) to represent the GPU(s) available to a run and required by a test. (I've stopped short of doing this for a limited hardware resource because it feels too inelegant, but I can't think of a reason it shouldn't work.)

Drew Macrae
  • While the API still makes affordances for setting IO limits, they aren't respected by tests, which also allow limits on the number of test jobs. Setting --local_test_jobs will cause Bazel to ignore resource-based limits. – Drew Macrae Jan 05 '23 at 14:27
0

Future releases of Bazel should support extra resources like GPUs, and releases that contain that change should support extra resource tags like "resources:GPU:1" when --local_extra_resources=gpu=1 is set. This should allow GPU tests to be bounded by a limited number of GPUs, while still running non-exclusively and without limiting the total number of --jobs or --local_test_jobs.
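A hedged sketch of how this might look once a release with extra-resource support is in use. The target and file names are hypothetical, and the sketch uses the same lowercase resource name `gpu` in both the tag and the flag, on the assumption that the two names must match exactly.

```starlark
# BUILD (sketch): target and file names are hypothetical.
cc_test(
    name = "gpu_kernel_test",
    srcs = ["gpu_kernel_test.cc"],
    # Each run of this test claims one unit of the extra resource named "gpu".
    tags = ["resources:gpu:1"],
)

# Invocation on a machine with a single GPU (assumes a Bazel release
# that includes the extra-resources change):
#   bazel test --local_extra_resources=gpu=1 //...
```

Under these assumptions, tests without the tag would continue to be scheduled in parallel, and only the tagged GPU tests would wait for the single `gpu` unit.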

Drew Macrae
  • 101
  • 4