
I'm trying to use the Linux perf tool to sample the memory accesses of my program. Specifically, I'm using perf to monitor the read/write accesses of every CPU on a NUMA system.

Right now I can monitor each CPU's memory reads and writes, but I also need to know whether each access is a local memory access or a remote memory access.

I have gone through the event list with perf list, but I only found some events for per-socket memory access.

Questions

  1. Is there any way to get each CPU's remote memory accesses when using perf?
  2. Is there a better option than perf ?
Aries_Liu

Answer


Yes, the PMU (performance monitoring unit) in your CPU can probably do what you want through its uncore and offcore-response counters; in particular, they can count the various offcore responses for non-local memory accesses. This blog post is a reasonable starting point.
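As a rough illustration of what "per-CPU local vs. remote" looks like once you have working events, here is a minimal Python sketch. It assumes an Intel CPU that exposes per-core retired-load events split by local and remote DRAM; the names used below are the Skylake-SP spellings and are an assumption (other microarchitectures name them differently, so check perf list). It also assumes the counts were collected in CSV mode with one row per CPU, e.g. perf stat -a -A -x, -e <local-event>,<remote-event> -o counts.csv -- sleep 10, and that each row starts with CPU<n>,<count>,<unit>,<event> (the exact field layout can vary between perf versions).

```python
import csv
import io
from collections import defaultdict

# ASSUMED event names (Intel Skylake-SP style); verify with `perf list`.
LOCAL_EVENT = "mem_load_l3_miss_retired.local_dram"
REMOTE_EVENT = "mem_load_l3_miss_retired.remote_dram"


def parse_per_cpu(csv_text):
    """Parse `perf stat -a -A -x,` output into {cpu: {event: count}}.

    Assumes rows look like: CPU<n>,<count>,<unit>,<event>,...
    Comment lines and rows without a numeric count are skipped.
    """
    counts = defaultdict(dict)
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 4 or not row[0].strip().startswith("CPU"):
            continue  # header/comment lines such as "# started on ..."
        cpu = int(row[0].strip()[3:])
        try:
            value = int(row[1])
        except ValueError:
            continue  # e.g. "<not counted>" or "<not supported>"
        counts[cpu][row[3].strip()] = value
    return dict(counts)


def remote_fraction(per_cpu):
    """Return {cpu: remote / (local + remote)} for CPUs that saw DRAM misses."""
    result = {}
    for cpu, events in per_cpu.items():
        local = events.get(LOCAL_EVENT, 0)
        remote = events.get(REMOTE_EVENT, 0)
        if local + remote:
            result[cpu] = remote / (local + remote)
    return result
```

Keep in mind that events like these only count retired demand loads that missed the L3, so stores and prefetches are not included in the ratio.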

The main problem is that the perf tool, which is tied to the specific kernel version, often lags behind in its support for modern processors[1], especially when it comes to uncore and NUMA-related events[2].

To work around that, you can use Andi Kleen's pmu-tools, which provides an ocperf wrapper script (ocperf.py) that uses whatever underlying perf you have on your system, but with up-to-date event IDs downloaded directly from Intel. That will usually give you access to the uncore events you need.
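Since ocperf passes its arguments through to the underlying perf, one way to script this is to prefer the wrapper when it is installed and fall back to plain perf otherwise. The sketch below only builds the command line; it assumes the wrapper is on PATH under the name ocperf.py (how it is installed is up to you), and the helper name build_stat_command is hypothetical.

```python
import shutil


def build_stat_command(events, duration_s, prefer_ocperf=True):
    """Build a system-wide, per-CPU `perf stat` command (hypothetical helper).

    If pmu-tools' ocperf.py is found on PATH (an assumption about the
    install), use it so event names are resolved from Intel's event lists;
    otherwise fall back to the system perf.
    """
    tool = "perf"
    if prefer_ocperf and shutil.which("ocperf.py"):
        tool = "ocperf.py"
    return [
        tool, "stat",
        "-a",       # count on all CPUs
        "-A",       # no aggregation: report one count per CPU
        "-x,",      # CSV output with ',' as the separator
        "-e", ",".join(events),
        "--", "sleep", str(duration_s),
    ]
```

You could then run it with subprocess.run(cmd, capture_output=True, text=True); note that perf stat prints its counts on stderr, not stdout.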

Of course, even once you get that working, these events are often very hard to interpret, because your mental model of demand memory requests is complicated by many factors such as prefetch behavior, request-for-ownership, accesses that "hit" in a line fill buffer for a line that is still being filled, and so on.


[1] Partly because adding new processors/events to perf involves some lag, but especially because the tool is tied to the kernel: you likely aren't on a bleeding-edge kernel, so even though mainline perf might have support, you are stuck with the perf version associated with your kernel.

[2] Probably because most kernel developers, like developers in general, aren't working on NUMA systems.

BeeOnRope