Background: I have some code that uses a lock-free algorithm to share audio data to/from a CoreAudio callback (only because CoreAudio callback threads are real-time and therefore must not lock mutexes). This code seems to work fine, but when I run it under Clang's Thread Sanitizer, it reports some race-condition diagnostics.

My question is: to what extent can Thread Sanitizer be expected to reason correctly about race conditions in lock-free code? That is, can it reliably tell the difference between a buggy lock-free algorithm that has a genuine race condition and a correctly written lock-free algorithm that does not, or is it expected that Thread Sanitizer will just say "hey, you wrote to this data structure in thread A and later read from it in thread B, and no mutex-locking was ever observed, so I'm going to print a diagnostic about that"?

If Thread Sanitizer is able to correctly analyze lock-free algorithms, any related information about how it does that, and/or how the lock-free algorithm might be tuned/annotated to make Thread Sanitizer's diagnoses more accurate would be appreciated.

Jeremy Friesner
  • it's impossible to answer w/o seeing your lock-free code in a [mcve]. – 273K Feb 19 '23 at 18:42
  • What exactly is thread sanitizer reporting? If it is reporting a data race, I don't think there are supposed to be any false positives. However all used code, including the standard library, needs to be compiled with thread sanitizer instrumentation to avoid false positives (and https://github.com/google/sanitizers/wiki/ThreadSanitizerCppManual also claims that it is still not well tested with C++, not sure how true that is). – user17732522 Feb 19 '23 at 18:43
  • The problem of detecting whether a program is thread safe or not is a variant of halting problem and it is well known to be provably unsolvable. And therefore no such tool can be 100% correct. I suspect it only looks at some common patterns instead of actually doing proper analysis. – freakish Feb 19 '23 at 18:43
  • @273K the question isn't about whether my code is correct or not; it's about the scope of TSAN's capabilities. TSAN is either capable of correctly analyzing lock-free code, or it isn't, regardless of what my particular algorithm does. – Jeremy Friesner Feb 19 '23 at 19:26
  • You have written **some race-condition diagnostics are reported**. And we don't know w/o seeing your code if it's a false warning. It's not a false warning most likely, your code has a data race. – 273K Feb 19 '23 at 22:26
  • I don't have permission to post the code publicly. Let's rephrase the question, then: is it possible to write a useful lock-free multithreaded program that TSAN won't flag as racy? Do such programs exist in the real world? – Jeremy Friesner Feb 20 '23 at 03:17

1 Answer

As far as I know, Thread Sanitizer should, setting aside bugs in the tool itself, not produce false positives, with the caveats that

  1. C++ exceptions are not supported,
  2. fences are not properly supported,
  3. all code, including the standard library, needs to be compiled with TSAN instrumentation, and
  4. for synchronization, directly or indirectly, only pthreads primitives and compiler built-in atomics may be used.

Of course, it can only detect data races and similar undefined-behavior situations, and only on the execution paths actually taken. It cannot generally recognize race conditions that result in unintended behavior, nor evaluate whether a data structure is thread-safe.
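To make caveats 2 and 4 concrete, here is a minimal sketch of a lock-free handoff that relies only on compiler built-in atomics with acquire/release ordering, which is the kind of synchronization TSAN is expected to understand. The `Handoff` struct and `run_handoff` function are hypothetical names made up for illustration; they are not taken from the question's code.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical single-slot handoff (illustrative only).
struct Handoff {
    int payload = 0;                  // plain, non-atomic data
    std::atomic<bool> ready{false};   // publication flag
};

// The producer writes the payload and then publishes it with a release
// store. The consumer spins on an acquire load and only then reads the
// payload. The release/acquire pair establishes the happens-before edge
// that makes the plain read of `payload` race-free.
int run_handoff(int value) {
    Handoff h;
    int seen = 0;

    std::thread consumer([&] {
        while (!h.ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = h.payload;             // ordered after the producer's write
    });

    h.payload = value;                               // plain write
    h.ready.store(true, std::memory_order_release);  // publish

    consumer.join();                  // join orders `seen` before the return
    assert(seen == value);
    return seen;
}
```

Built with `clang++ -fsanitize=thread -g`, a pattern like this should run without a report. If the same ordering were expressed with relaxed atomic operations plus `std::atomic_thread_fence`, the program would still be valid C++, but because of caveat 2 TSAN might not recognize the synchronization and could report a race anyway.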

user17732522
  • Thanks for this answer -- I was able to modify my program so that it could run under TSAN without triggering any TSAN diagnostics, lock-free operations notwithstanding. – Jeremy Friesner Feb 22 '23 at 19:53
  • For posterity's sake: When my client was disconnected from the server, it would (a) call `AudioOutputUnitStop()` to end the CoreAudio rendering callbacks, then (b) tear down the shared data structures, and finally (c) zero out the (now-dangling) shared pointers to those data structures. TSAN objected to step (c), apparently because `AudioOutputUnitStop()` only disables the callbacks; it doesn't terminate the audio-rendering thread, so TSAN apparently believed those pointers were still accessible via the other thread. My fix was to also call `AudioComponentInstanceDispose()` before doing (c). – Jeremy Friesner Feb 22 '23 at 19:56
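
The teardown ordering described in that last comment can be sketched in portable C++. This is only an analogy under stated assumptions: `std::thread::join` stands in for whatever `AudioComponentInstanceDispose()` does to retire the rendering thread, the `running` flag stands in for `AudioOutputUnitStop()`, and `SharedBuffer` and `render_then_teardown` are hypothetical names, not CoreAudio APIs.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical shared state touched by the "rendering" thread.
struct SharedBuffer {
    int samples_rendered = 0;
};

int render_then_teardown(int iterations) {
    SharedBuffer* buffer = new SharedBuffer;  // shared via a raw pointer
    std::atomic<bool> running{true};

    std::thread render([&] {
        for (int i = 0; i < iterations; ++i) {
            ++buffer->samples_rendered;       // callback-style access
        }
        while (running.load()) {              // idle until told to stop
            std::this_thread::yield();
        }
    });

    running.store(false);  // analogue of stopping the callbacks
    render.join();         // analogue of disposing the unit: the thread is gone

    // Only after the join is it clearly safe, and visibly safe to TSAN,
    // to read, free, and zero out the shared pointer.
    int rendered = buffer->samples_rendered;
    delete buffer;
    buffer = nullptr;
    return rendered;
}
```

The point of the sketch is that TSAN can observe the join, so it has a happens-before edge between the worker's last access and the teardown; a stop request that the tool cannot see through gives it no such edge.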