measure cycles spent in accessing remote cache

Question

How can I measure the cycles spent accessing a shared remote cache, say L3? I need this cache access information both system-wide and per thread. Are there any specific tool or hardware requirements? Or is there a formula I can use to get an approximate value of the cycles spent over a time interval?

That's somewhat of the wrong question to ask. Accessing cache is often overlapped with other things. So the # of cycles spent accessing cache may or may not mean anything. — Mysticial, Feb 25 '13 at 06:58
@Mysticial I need to know whether different threads' accesses to L3 are degrading performance. How else can it be calculated? — naran, Feb 28 '13 at 06:18
You can only guess at it. Profilers will give you big-picture aggregate numbers. Then compare them with the results of other apps with different memory patterns. — Mysticial, Feb 28 '13 at 06:40

score 3 · Accepted Answer · edited May 23 '17 at 12:04

To get the average latencies (when a single thread is running) to the various caches on your machine, you can use memory profiling tools such as RMMA for Windows (http://cpu.rightmark.org/products/rmma.shtml) and LMbench for Linux.

You can also write your own benchmarks based on the ideas used by these tools. See the answers posted on this StackOverflow question: measuring latencies of memory. Or Google for how the LMbench benchmark works.

If you want to find exact latencies for particular memory access patterns, you will need to use a simulator. This way you can trace a memory access as it flows through the memory system. However, simulators will not model all the effects that are present in a modern processor or memory system.

If you want to learn how multiple threads affect the average latency to L3, I think the best bet would be to write your own benchmark.

Does this profiling rely heavily on hardware performance counters? — naran, Mar 12 '13 at 06:34
