How can I measure the cycles spent accessing a shared remote cache, such as L3? I need this cache access information both system-wide and per thread. Are there any specific tool or hardware requirements? Or is there a formula I can use to get an approximate value of the cycles spent over a time interval?

Neha Karanjkar
naran
  • That's somewhat of the wrong question to ask. Accessing cache is often overlapped with other things. So the # of cycles spent accessing cache may or may not mean anything. – Mysticial Feb 25 '13 at 06:58
  • @Mysticial I need to know whether different threads' accesses to L3 are degrading performance. How else can it be calculated? – naran Feb 28 '13 at 06:18
  • You can only guess at it. Profilers will give you big-picture aggregate numbers. Then compare them with the results of other apps with different memory patterns. – Mysticial Feb 28 '13 at 06:40

1 Answer

To get the average latencies (when a single thread is running) of the various caches on your machine, you can use memory profiling tools such as RMMA for Windows (http://cpu.rightmark.org/products/rmma.shtml) and Lmbench for Linux.

You can also write your own benchmarks based on the ideas used by these tools. See the answers posted on the StackOverflow question "measuring latencies of memory", or search for how the Lmbench benchmark works.
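As a rough illustration of the pointer-chasing idea that such latency benchmarks are commonly built on, here is a minimal sketch in C. It is not how RMMA or Lmbench are actually implemented; the names `make_ring` and `ns_per_load`, the 64-byte line size, and the use of POSIX `clock_gettime` are all assumptions made for this example. Each load depends on the previous one, so the time per hop approximates the load-to-use latency of whichever cache level the working set fits in.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define LINE_BYTES 64   /* assumed cache-line size; check your CPU */

/* One node per cache line, so every hop touches a different line. */
typedef struct node {
    struct node *next;
    char pad[LINE_BYTES - sizeof(struct node *)];
} node;

/* Small xorshift64 generator; the seed must be nonzero. */
static uint64_t xorshift64(uint64_t *s)
{
    *s ^= *s << 13;
    *s ^= *s >> 7;
    *s ^= *s << 17;
    return *s;
}

/* Allocate n nodes and link them into one random cycle (Sattolo's
   algorithm), so a stride prefetcher cannot predict the next address.
   Returns NULL on allocation failure; free() the result when done. */
static node *make_ring(size_t n, uint64_t seed)
{
    assert(n >= 2 && seed != 0);
    node *nodes = malloc(n * sizeof *nodes);
    size_t *perm = malloc(n * sizeof *perm);
    if (!nodes || !perm) {
        free(nodes);
        free(perm);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)
        perm[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)(xorshift64(&seed) % i);  /* j < i, never i */
        size_t t = perm[i];
        perm[i] = perm[j];
        perm[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        nodes[i].next = &nodes[perm[i]];
    free(perm);
    return nodes;
}

/* Follow the chain for `loads` dependent loads and return the average
   wall-clock time per load in nanoseconds. */
static double ns_per_load(node *start, size_t loads)
{
    static node *volatile sink;   /* stops the compiler dropping the loop */
    struct timespec t0, t1;
    node *p = start;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < loads; i++)
        p = p->next;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    sink = p;

    return ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
            (double)(t1.tv_nsec - t0.tv_nsec)) / (double)loads;
}
```

To aim at L3, one would pick a ring larger than the private L2 but smaller than L3, for example `make_ring((8u << 20) / sizeof(node), 1)` for an 8 MiB working set, call `ns_per_load` once as a warm-up, and then time a second call. Multiplying the result by the core clock in GHz gives an approximate cycle count per access.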

If you want to find exact latencies for particular memory access patterns, you will need to use a simulator. This lets you trace an individual memory access as it flows through the memory system. However, simulators do not model all the effects present in a modern processor or memory system.

If you want to learn how multiple threads affect the average latency to L3, I think the best bet would be to write your own benchmark.
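One possible shape for such a benchmark, sketched in C with POSIX threads: every thread chases its own random pointer chain, all threads start together at a barrier, and the function reports the mean time per load. Running it with 1, 2, 4, ... threads and comparing the results gives a rough picture of how contention changes the average latency. The names (`contended_ns_per_load`, `worker`), the 64-byte line size, and the reliance on `pthread_barrier_t` (available on Linux, not on every platform) are assumptions of this sketch, and threads are not pinned to cores here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

typedef struct node {                 /* one node per assumed 64-byte line */
    struct node *next;
    char pad[64 - sizeof(struct node *)];
} node;

typedef struct {
    node *ring;                       /* this thread's private chain */
    node *last;                       /* keeps the chase result live */
    size_t loads;
    pthread_barrier_t *start;
    double ns_per_load;
} worker;

/* Link n nodes into a single random cycle (Sattolo's algorithm). */
static node *make_ring(size_t n, uint64_t seed)
{
    node *nodes = malloc(n * sizeof *nodes);
    size_t *perm = malloc(n * sizeof *perm);
    if (!nodes || !perm) {
        free(nodes);
        free(perm);
        return NULL;
    }
    for (size_t i = 0; i < n; i++)
        perm[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        seed ^= seed << 13;           /* xorshift64 step, seed nonzero */
        seed ^= seed >> 7;
        seed ^= seed << 17;
        size_t j = (size_t)(seed % i);
        size_t t = perm[i];
        perm[i] = perm[j];
        perm[j] = t;
    }
    for (size_t i = 0; i < n; i++)
        nodes[i].next = &nodes[perm[i]];
    free(perm);
    return nodes;
}

static void *worker_main(void *arg)
{
    worker *w = arg;
    node *p = w->ring;
    struct timespec t0, t1;

    pthread_barrier_wait(w->start);   /* all threads begin together */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < w->loads; i++)
        p = p->next;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    w->last = p;
    w->ns_per_load = ((double)(t1.tv_sec - t0.tv_sec) * 1e9 +
                      (double)(t1.tv_nsec - t0.tv_nsec)) / (double)w->loads;
    return NULL;
}

/* Mean ns per dependent load across `threads` concurrent chasers,
   or -1.0 on bad arguments or allocation failure. */
static double contended_ns_per_load(int threads, size_t nodes_per_thread,
                                    size_t loads)
{
    if (threads < 1 || nodes_per_thread < 2 || loads == 0)
        return -1.0;
    pthread_t *tid = malloc((size_t)threads * sizeof *tid);
    worker *w = calloc((size_t)threads, sizeof *w);
    pthread_barrier_t start;
    double sum = 0.0;
    int ok = tid && w;

    for (int i = 0; ok && i < threads; i++) {
        w[i].ring = make_ring(nodes_per_thread, (uint64_t)i + 1);
        w[i].loads = loads;
        w[i].start = &start;
        ok = w[i].ring != NULL;
    }
    if (ok && pthread_barrier_init(&start, NULL, (unsigned)threads) == 0) {
        for (int i = 0; i < threads; i++) {
            int rc = pthread_create(&tid[i], NULL, worker_main, &w[i]);
            assert(rc == 0);          /* sketch: no recovery path here */
            (void)rc;
        }
        for (int i = 0; i < threads; i++) {
            pthread_join(tid[i], NULL);
            sum += w[i].ns_per_load;
        }
        pthread_barrier_destroy(&start);
    } else {
        ok = 0;
    }
    for (int i = 0; w && i < threads; i++)
        free(w[i].ring);
    free(w);
    free(tid);
    return ok ? sum / threads : -1.0;
}
```

Compile with `-pthread`. For the comparison to say something about L3, each thread's ring should be larger than its private L2 while the combined working set still fits in (or deliberately overflows) the shared L3. Pinning threads to specific cores, for example with `pthread_setaffinity_np` on Linux, should make the numbers more repeatable.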

Community
Neha Karanjkar