How can I measure the cycles spent accessing a shared remote cache, say L3? I need this cache access information both system-wide and per thread. Are there any specific tool or hardware requirements? Or is there a formula I can use to get an approximate value for the cycles spent over a time interval?
-
That's somewhat of the wrong question to ask. Accessing cache is often overlapped with other things, so the number of cycles spent accessing cache may or may not mean anything. – Mysticial Feb 25 '13 at 06:58
-
@Mysticial I need to know whether different threads' accesses to L3 are degrading performance. How else can that be calculated? – naran Feb 28 '13 at 06:18
-
You can only guess at it. Profilers will give you big-picture aggregate numbers. Then compare them with the results of other apps with different memory patterns. – Mysticial Feb 28 '13 at 06:40
1 Answer
To get the average latencies (when a single thread is running) to the various caches present on your machine, you can use memory profiling tools such as RMMA for Windows (http://cpu.rightmark.org/products/rmma.shtml) and Lmbench for Linux.
You can also write your own benchmarks based on the ideas used by these tools. See the answers to the Stack Overflow question "measuring latencies of memory", or search for how the Lmbench benchmark works.
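The usual idea behind such latency benchmarks is pointer chasing: each load depends on the result of the previous one, so the loads cannot overlap and the time per step approximates the load-to-use latency of whichever level the working set fits in. Below is a minimal sketch of that idea, not the code of RMMA or Lmbench. The names `build_chain` and `ns_per_load` are made up for illustration, and the 64-byte line size is an assumption you should check for your CPU.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <time.h>

enum { LINE_BYTES = 64 };  /* assumed cache-line size; adjust for your CPU */

/* One node per (assumed) cache line, so each step touches a new line. */
typedef struct node {
    struct node *next;
    char pad[LINE_BYTES - sizeof(struct node *)];
} node;

node *volatile sink;       /* keeps the compiler from discarding the walk */

/* Link n nodes into a single random cycle (Sattolo's algorithm).
   The random order is meant to defeat hardware prefetching. */
node *build_chain(size_t n, unsigned seed)
{
    assert(n >= 2);
    node *nodes = malloc(n * sizeof *nodes);
    size_t *perm = malloc(n * sizeof *perm);
    if (!nodes || !perm) { free(nodes); free(perm); return NULL; }

    for (size_t i = 0; i < n; i++) perm[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;          /* j in [0, i-1] */
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < n; i++) nodes[i].next = &nodes[perm[i]];
    free(perm);
    return nodes;
}

/* Follow the chain for `steps` dependent loads; return average ns per load. */
double ns_per_load(node *start, size_t steps)
{
    struct timespec t0, t1;
    node *p = start;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < steps; i++) p = p->next;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = p;

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

To aim at L3, pick a node count whose footprint (`n * 64` bytes) is larger than your L2 but smaller than your L3, call `build_chain(n, 1)`, and pass the result to `ns_per_load` with a few million steps. Multiplying the result by the core clock in GHz gives a rough cycle count. TLB misses and page size also show up in the number, so treat it as an estimate.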
If you want to find exact latencies for particular memory access patterns, you will need to use a simulator. This way you can trace a memory access as it flows through the memory system. However, simulators do not model all the effects that are present in a modern processor or memory system.
If you want to learn how multiple threads affect the average latency to L3, I think the best bet would be to write your own benchmark.
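One way such a benchmark could look, as a rough sketch: run the same dependent-load walk in several threads at once, each over its own buffer, and compare the average time per load against the single-thread figure. Everything here (`make_chain`, `run_worker`, `mean_latency_ns`, the 64-byte line size) is a hypothetical name or assumption, and the threads are not pinned to cores or started in lockstep, so the numbers are only indicative.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

enum { LINE_BYTES = 64 };  /* assumed cache-line size */

typedef struct node {
    struct node *next;
    char pad[LINE_BYTES - sizeof(struct node *)];
} node;

typedef struct {
    node  *chain;          /* this thread's private random cycle */
    size_t steps;          /* number of dependent loads to time */
    double ns_per_load;    /* result written by the worker */
    node  *last;           /* final pointer, stored so the walk is not elided */
} worker;

/* Link n nodes into one random cycle (Sattolo's algorithm). */
node *make_chain(size_t n, unsigned seed)
{
    assert(n >= 2);
    node *nodes = malloc(n * sizeof *nodes);
    size_t *perm = malloc(n * sizeof *perm);
    if (!nodes || !perm) { free(nodes); free(perm); return NULL; }
    for (size_t i = 0; i < n; i++) perm[i] = i;
    srand(seed);
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
    }
    for (size_t i = 0; i < n; i++) nodes[i].next = &nodes[perm[i]];
    free(perm);
    return nodes;
}

static void *run_worker(void *arg)
{
    worker *w = arg;
    struct timespec t0, t1;
    node *p = w->chain;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < w->steps; i++) p = p->next;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    w->last = p;
    w->ns_per_load = ((double)(t1.tv_sec - t0.tv_sec) * 1e9
                    + (double)(t1.tv_nsec - t0.tv_nsec)) / (double)w->steps;
    return NULL;
}

/* Mean ns per load across nthreads concurrent walkers, or -1.0 on failure. */
double mean_latency_ns(int nthreads, size_t nodes_per_thread, size_t steps)
{
    assert(nthreads >= 1);
    pthread_t *tid = malloc((size_t)nthreads * sizeof *tid);
    worker *w = calloc((size_t)nthreads, sizeof *w);
    if (!tid || !w) { free(tid); free(w); return -1.0; }

    int ok = 1, started = 0;
    for (int i = 0; i < nthreads; i++) {       /* build chains up front */
        w[i].chain = make_chain(nodes_per_thread, (unsigned)i + 1u);
        w[i].steps = steps;
        if (!w[i].chain) ok = 0;
    }
    for (int i = 0; ok && i < nthreads; i++) {
        if (pthread_create(&tid[i], NULL, run_worker, &w[i]) != 0) { ok = 0; break; }
        started++;
    }
    double sum = 0.0;
    for (int i = 0; i < started; i++) {
        pthread_join(tid[i], NULL);
        sum += w[i].ns_per_load;
    }
    for (int i = 0; i < nthreads; i++) free(w[i].chain);
    free(tid);
    free(w);
    return ok ? sum / nthreads : -1.0;
}
```

Calling `mean_latency_ns(1, n, steps)` and then `mean_latency_ns(k, n, steps)` for increasing `k`, with `k * n * 64` bytes chosen relative to your L3 size, should show whether the per-load time rises as more threads share the cache. Build with `-pthread`.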
