
I wrote a rule to run a compiler (Synopsys VCS MX). When building a single target, everything works. When building multiple targets concurrently, the compiler crashes with a segmentation fault. This doesn't happen when running Bazel with --spawn_strategy=local, and setting --jobs 1 also avoids it.

The only explanation I can think of is that the compiler writes to a file at an absolute path and collides with other instances of itself.

My questions are as follows:

  1. If my theory were correct, wouldn't the problem occur regardless of whether I'm sandboxing or not?
  2. If I'm wrong, how could the compilers be colliding if not because of some shared file?
  3. Say that for every sandbox, I wanted to mount a /tmp which points to a different directory, would that be possible?

Update: According to strace, both instances of the compiler open the file /tmp/vcs_20200428163636_3/v710_tok for reading and writing, and at some point one instance calls pread64(), which triggers the segfault. Note the directory name: it looks suspiciously like a timestamp, hinting that the compiler tried to generate a unique name, but the two instances were not started far enough apart for the names to differ.

Questions 1 and 3 still stand.

Erran

1 Answer


The solution:

Adding --sandbox_tmpfs_path=/tmp solved the problem. This flag tells Bazel to mount an empty, writable directory at /tmp whenever it creates a sandbox for an action. Each compiler instance then gets its own /tmp, so they no longer collide.
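To illustrate why a private /tmp is enough, here is a minimal Python sketch. It does not involve Bazel or VCS at all: the two temporary directories stand in for the per-sandbox /tmp mounts, and `write_token` is a hypothetical helper. Only the relative path vcs_20200428163636_3/v710_tok comes from the strace output in the question.

```python
import os
import tempfile


def write_token(tmp_root: str, owner: str) -> str:
    """Write to the same relative path under a given /tmp stand-in."""
    # Relative path taken from the strace output in the question.
    path = os.path.join(tmp_root, "vcs_20200428163636_3", "v710_tok")
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(owner)
    return path


# Two separate roots play the role of each sandbox's private /tmp.
tmp_a = tempfile.mkdtemp(prefix="sandbox_a_")
tmp_b = tempfile.mkdtemp(prefix="sandbox_b_")

path_a = write_token(tmp_a, "compiler A")
path_b = write_token(tmp_b, "compiler B")

with open(path_a) as f:
    content_a = f.read()
with open(path_b) as f:
    content_b = f.read()

# Same name inside each root, but two distinct files on disk.
print(path_a != path_b)
print(content_a, "|", content_b)
```

Both writers use an identical name, yet neither overwrites the other, because the name is resolved relative to a different root in each case.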

Why does the collision happen only when sandboxing?

When run_shell executes in a sandbox, Bazel starts the shell using clone, which places it in a new PID namespace. The compiler adds its PID (3 in this case, as can be seen in /tmp/vcs_20200428163636_3/v710_tok) to the path it opens under /tmp, in an attempt to make the name unique. However, because each compiler is forked inside its own sandbox PID namespace, each one sees its PID relative to that sandbox. Two compilers started in the same second can therefore end up with the same PID and the same path, which is how they collide. Without sandboxing, all compilers share one PID namespace, so their PIDs always differ.
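The collision can be sketched in a few lines of Python. The naming scheme below (`vcs_<timestamp>_<pid>`) is an assumption inferred from the single path seen in strace, not documented VCS behavior, and the PIDs 41872 and 41873 are made-up values for the unsandboxed case.

```python
from datetime import datetime


def scratch_dir_name(start: datetime, pid: int) -> str:
    """Hypothetical reconstruction of the name: vcs_<timestamp>_<pid>."""
    return f"/tmp/vcs_{start:%Y%m%d%H%M%S}_{pid}"


# Both compilers start within the same second.
start = datetime(2020, 4, 28, 16, 36, 36)

# No sandbox: one shared PID namespace, so the PIDs differ (made-up values).
local_a = scratch_dir_name(start, 41872)
local_b = scratch_dir_name(start, 41873)

# Sandboxed: each process sees a PID relative to its own namespace,
# so both can observe the same small PID.
sandbox_a = scratch_dir_name(start, 3)
sandbox_b = scratch_dir_name(start, 3)

print(local_a != local_b)      # distinct names, no collision
print(sandbox_a == sandbox_b)  # identical names, collision on a shared /tmp
print(sandbox_a)               # /tmp/vcs_20200428163636_3
```

Under this assumed scheme, the timestamp only protects against collisions between runs at least a second apart, and the PID only protects processes that share a PID namespace.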

Erran