
I have a wait-free implementation of binary search trees, but I am not able to figure out concrete methods to measure thread contention. By contention, I mean the number of threads that try to access the same piece of memory at the same time.

So far, I have looked at the ThreadMXBean and ThreadInfo classes, but since there are no locks involved, I haven't found a solution yet.

  • Just look at how much CPU time is used by the method performing the concurrent update. Without contention, a CAS takes almost no time… – Holger May 13 '14 at 13:27
  • I have measured per-thread CPU time, but that doesn't seem to give me an idea of contention. But your comment leads me to an interesting way to measure contention: the difference between maximum and minimum thread CPU time should give me an idea of the delay, which is invariably due to contention! – grillSandwich May 15 '14 at 03:01
  • The key point of *lock free* (aka *wait free*) algorithms is that they do not include wait operations. The only thing that can happen is that an update operation has to be repeated because a concurrent update interfered with it. The repetition will, just like any other operation, consume 100% CPU time. The only way to measure contention here is to measure the number of repetitions. Unless the algorithm implementation records retries by itself, you have to measure the CPU time of the update operation (which will rise under contention) and compare it to overall thread execution time. – Holger May 15 '14 at 07:48
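A minimal sketch of the two ideas from the comments above: counting retries inside the update operation, and comparing per-thread CPU times via ThreadMXBean. The CAS loop and all the names here are illustrative only, not the asker's actual tree code:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.LongAdder;

public class RetryCounting {

    static final class Node {
        final int key;
        final Node next;
        Node(int key, Node next) { this.key = key; this.next = next; }
    }

    static final AtomicReference<Node> head = new AtomicReference<>();
    static final LongAdder retries = new LongAdder();   // contention proxy

    // Lock-free push: every failed CAS means another thread touched the
    // same memory location "at the same time", so count it as one retry.
    static void push(int key) {
        while (true) {
            Node current = head.get();
            if (head.compareAndSet(current, new Node(key, current))) {
                return;
            }
            retries.increment();
        }
    }

    // Second idea from the comments: compare per-thread CPU times; a large
    // spread between the busiest and the least busy worker hints at delay
    // caused by contention.
    static void dumpStats(long[] workerThreadIds) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        for (long id : workerThreadIds) {
            System.out.printf("thread %d: %.1f ms CPU%n",
                    id, mx.getThreadCpuTime(id) / 1e6);
        }
        System.out.println("total CAS retries: " + retries.sum());
    }
}
```

The retry count is the most direct contention signal a lock-free or wait-free implementation can record about itself; the CPU-time spread is a coarser, indirect one.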

2 Answers


There is no way to measure contention over a "memory location" without prohibitive performance costs. Direct measurement (e.g. a properly synchronized counter wrapping all the accesses) will introduce artificial bottlenecks, which will blow up test reliability.

"Same time" is loosely defined on the scale you want to measure it, because only a single CPU "owns" the particular location in memory in a given time. The best you can do in this case it to measure the rate at which CPUs are dealing with memory conflicts, e.g. through the HW counters. Doing that requires the understanding of memory subsystem on a given platfom. Also, the HW counters attribute for machine (= CPU) state, not the memory state; in other words, you can estimate how many conflicts the particular instructions have experienced, not how many CPUs accessed the given memory location.

Aleksey Shipilev

Trying to measure from within the source of the contention is the wrong approach. What would be the reason for measuring contention anyway?

So, first of all, set up a benchmarking suite which runs typical access patterns on your data structure and graphs the performance for different thread counts. Here is a nice example from the nitro cache performance page.
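For illustration, a rough hand-rolled version of such a suite (a sketch added here, not part of the answer; JMH or a similar harness is the more rigorous choice, and ConcurrentSkipListSet merely stands in for the asker's wait-free BST):

```java
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Runs a mixed read/insert workload for a fixed time at several thread
// counts and prints throughput, so the numbers can be graphed.
public class ScalingHarness {

    static final ConcurrentSkipListSet<Integer> set = new ConcurrentSkipListSet<>();

    public static void main(String[] args) throws InterruptedException {
        for (int threads : new int[] {1, 2, 4, 8}) {
            System.out.printf("%d threads: %,d ops/s%n", threads, run(threads, 2_000));
        }
    }

    static long run(int threads, long millis) throws InterruptedException {
        LongAdder ops = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long deadline = System.nanoTime() + millis * 1_000_000L;
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                ThreadLocalRandom rnd = ThreadLocalRandom.current();
                while (System.nanoTime() < deadline) {
                    int key = rnd.nextInt(1_000_000);
                    if (rnd.nextInt(10) == 0) set.add(key);   // 10% inserts
                    else set.contains(key);                   // 90% lookups
                    ops.increment();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(millis + 1_000, TimeUnit.MILLISECONDS);
        return ops.sum() * 1000 / millis;   // ops per second
    }
}
```

One throughput number per thread count gives exactly the data points to plot against the thread count, as in the linked example.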

If you scale almost linearly: congrats, you are done!

If you don't scale linearly, you need more insight. Now you need to profile the system as a whole and see what is causing, e.g., CPU pipeline stalls. The best way is to use low-overhead tracing for this. On Linux you can use OProfile. OProfile also has Java support, which helps you correlate the JIT-compiled machine code with your Java program.

cruftex
  • Assuming we are talking about a list, threads contend to modify a field of a node before the others do. All threads but the first one then race to modify the newly appended node. That is the contention. Thanks for the links! – grillSandwich May 15 '14 at 03:11
  • If you want to find areas of lock contention, I would suggest starting at a high level with a sampling profiler, e.g. HPROF. Since it is sampling, you can adjust the overhead imposed. – cruftex May 15 '14 at 07:58