Questions tagged [intel-pmu]

Questions related to the use of the Intel Performance Monitoring Unit (PMU), which provides hardware counters for measuring the performance of currently executing code.

The Intel performance monitoring unit (PMU) provides hardware performance counters that track performance-related metrics for the currently executing code.

They are useful while profiling code, and are supported by Intel's VTune, Linux's perf command and the Windows Performance Toolkit.

The available counters and the details of how to program them vary by CPU microarchitecture. The details are in Chapters 18 and 19 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3.
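As a rough illustration of what "programming a counter" involves, the sketch below composes the value written to an IA32_PERFEVTSELx register, using the architectural bit layout (event select in bits 0-7, unit mask in bits 8-15, USR at bit 16, OS at bit 17, EN at bit 22, counter mask in bits 24-31) and the architectural event encodings. The function name `encode_perfevtsel`, its parameters, and the `ARCH_EVENTS` table are hypothetical names chosen for this example, and it only computes the value; it does not touch any MSR.

```python
# Sketch only: computes an IA32_PERFEVTSELx value, it does not program hardware.
# Bit positions follow the architectural performance monitoring layout.
EVENT_SELECT_SHIFT = 0   # bits 0-7: event select
UMASK_SHIFT = 8          # bits 8-15: unit mask
USR_BIT = 1 << 16        # count while running in user mode (CPL > 0)
OS_BIT = 1 << 17         # count while running in kernel mode (CPL = 0)
EN_BIT = 1 << 22         # enable the counter
CMASK_SHIFT = 24         # bits 24-31: counter mask

# Architectural events as (event select, umask); the dict itself is illustrative.
ARCH_EVENTS = {
    "unhalted_core_cycles": (0x3C, 0x00),
    "instructions_retired": (0xC0, 0x00),
    "llc_references": (0x2E, 0x4F),
    "llc_misses": (0x2E, 0x41),
}


def encode_perfevtsel(event, umask, user=True, kernel=True, enable=True, cmask=0):
    """Return the 32-bit event-select value for the given event and flags."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field:#x}")
    value = (event << EVENT_SELECT_SHIFT) | (umask << UMASK_SHIFT) | (cmask << CMASK_SHIFT)
    if user:
        value |= USR_BIT
    if kernel:
        value |= OS_BIT
    if enable:
        value |= EN_BIT
    return value


if __name__ == "__main__":
    event, umask = ARCH_EVENTS["llc_misses"]
    print(hex(encode_perfevtsel(event, umask)))  # prints 0x43412e
```

On CPUs with architectural performance monitoring version 2 or later, setting EN here is not sufficient on its own: the counter also has to be enabled in IA32_PERF_GLOBAL_CTRL before it counts.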

Other libraries / tools for using the PMU include:

  • LIKWID: Various performance-related tools, including a micro-benchmarking framework. Supports the Intel PMU, AMD performance counters, some ARM, POWER8/9, and some NVIDIA GPUs.

  • libpfc: A simple Linux kernel module and library that lets user-space code program the counters and then read them with rdpmc. Example usage in the author's answer to this SO question.

  • pmu-tools (https://github.com/andikleen/pmu-tools): Some wrappers around Linux perf. ocperf.py was more useful before perf itself gained symbolic event names for more CPU-specific events, but the repo contains other tools as well.
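Since rdpmc comes up in the list above, here is a minimal sketch of how the counter index passed in ECX is formed on Intel: the low bits select the counter, and setting bit 30 selects the fixed-function counters instead of the programmable ones. The helper name `rdpmc_ecx` and the `FIXED_COUNTERS` table are hypothetical, made up for this example; the code only builds the selector and does not execute rdpmc.

```python
# Sketch only: builds the ECX selector for rdpmc, it does not execute the instruction.
FIXED_COUNTER_FLAG = 1 << 30  # bit 30 of ECX selects the fixed-function counters

# The first three Intel fixed-function counters, by index (illustrative table).
FIXED_COUNTERS = {
    0: "INST_RETIRED.ANY",
    1: "CPU_CLK_UNHALTED.THREAD",
    2: "CPU_CLK_UNHALTED.REF_TSC",
}


def rdpmc_ecx(index, fixed=False):
    """Return the ECX value selecting programmable or fixed counter `index`."""
    if not 0 <= index < FIXED_COUNTER_FLAG:
        raise ValueError(f"counter index out of range: {index}")
    return (FIXED_COUNTER_FLAG | index) if fixed else index


if __name__ == "__main__":
    for index, name in FIXED_COUNTERS.items():
        print(f"{name}: ecx = {rdpmc_ecx(index, fixed=True):#x}")
```

Executing rdpmc outside ring 0 only works when CR4.PCE is set, which is one reason a kernel-side component is involved when reading the counters from user space.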

91 questions
3
votes
1 answer

Read PMU counters using wrmsrl and rdmsrl

I'm trying to read the LLC cache miss hardware counter in a Linux kernel module on an Intel Xeon gold (Skylake generation) processor. The result of the following code is always zero: #define PMC_ESEL_UMASK_SHIFT 8 #define PMC_ESEL_CMASK_SHIFT…
Mohammad Siavashi
  • 1,192
  • 2
  • 17
  • 48
3
votes
1 answer

Performance Counter for DRAM Per-Rank Memory Access

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. I need to retrieve the number of accesses to each DRAM rank, over time, to estimate its power consumption. Based on page 261 of the chipset documentation (i.e., Datasheet,…
TheAhmad
  • 810
  • 1
  • 9
  • 21
3
votes
1 answer

Difference Between mem_load_uops_retired.l3_miss and offcore_response.demand_data_rd.l3_miss.local_dram Events

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. AFAIK, mem_load_uops_retired.l3_miss, counts the number of DRAM demand (i.e., non-prefetch) data read accesses. offcore_response.demand_data_rd.l3_miss.local_dram, as its name…
TheAhmad
  • 810
  • 1
  • 9
  • 21
3
votes
3 answers

Profiling Cache hit rate of a function of C program

I want to get cache hit rate for a specific function of a C/C++ program (foo) running on a Linux machine. I am using gcc and no compiler optimization. With perf I can get hit rates for the entire program using the following command. perf stat -e…
Atanu Barai
  • 115
  • 7
3
votes
1 answer

How to use rdpmc instruction for counting L1d cache miss?

I am wondering is there any single event that can capture the L1D cache misses. I tried to capture L1d cache miss by measuring latency to access specific memory with rdtsc at the beginning. On my setting, if the L1d cache miss happens, it should hit…
ruach
  • 1,369
  • 11
  • 21
3
votes
1 answer

Using the perf events from perf list programmatically

When I run perf list on my Linux system I get a long list of available perf events. Is it possible to list and use these events programmatically from another process, using perf_event_open(2)? That is, how can I get this list from another process and…
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
3
votes
1 answer

Perf Imprecise Call-Graph Report

Recent Intel processors provide a hardware feature (a.k.a., Precise Event-Based Sampling (PEBS)) to access precise information about the CPU state on some sampled CPU events (e.g., e). Here is an extract from Intel 64 and IA-32 Architecture's…
TheAhmad
  • 810
  • 1
  • 9
  • 21
3
votes
1 answer

Inconsistent values of ARM PMU cycles counter

I'm trying to measure the performance of my code in the Linux kernel with the PMU. First of all I want to test the PMU, so I created a simple loop of a couple of operations in the kernel. I placed it under a spin lock with interrupts disabled so my test code can't be…
scopichmu
  • 135
  • 11
3
votes
1 answer

What is the meaning of IB read, IB write, OB read and OB write. They came as output of Intel® PCM while monitoring PCIe bandwidth

I am trying to measure the PCIe bandwidth of NIC devices using Intel® Performance Counter Monitor (PCM) tools. But, I am not able to understand the output of it. To measure the PCIe bandwidth, I executed the binary pcm-iio. This binary helps to…
3
votes
1 answer

Paradoxical VTune Amplifier microarchitecture exploration results

I am trying to optimize a sin/cos approximation function. At its core there is a simple Horner scheme consisting of a bunch of multiplies and adds. Compiler is MSVC from VS2017, processor is Intel Xeon E5-1650, hyperthreading is on (but observations…
Max Langhof
  • 23,383
  • 5
  • 39
  • 72
3
votes
0 answers

What causes the DTLB_LOAD_MISSES.WALK_* performance events to occur?

Consider the following loop: .loop: add rsi, STRIDE mov eax, dword [rsi] dec ebp jg .loop where STRIDE is some non-negative integer and rsi contains a pointer to a buffer defined in the bss section. This loop is the…
Hadi Brais
  • 22,259
  • 3
  • 54
  • 95
2
votes
0 answers

Can rdpmc be used to read the fixed-function counters on AMD?

On Intel the fixed-function performance counters can be read by setting bit 30 of ecx as well as the index of the counter to read (0-4) in the bottom bits of that same register. Is something similar possible on AMD CPUs?
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
2
votes
0 answers

Which instruction is recorded when an overflow of PMC occurs?

When profiling a program with "cpu-cycle" event using "perf record -p $pid && perf report" command, I think the underlying hardware PMC does the following things: increase the counter when a cycle comes; record the "current instruction" when the…
2
votes
0 answers

Why LLC related performance events share the same event id in perf?

I am using Intel spr architecture, with a kernel version of 5.14 and a perf version of 4.18. I tried to analyze the meaning of LLC related events based on the method in this answer, but found that all events have the same ID: [ C(LL ) ] = { […
2
votes
1 answer

Why "setne %al" used "a lot of cycles" in perf annotation?

I was very confused when I saw this perf report. I have tried it several times, and this setne instruction always takes the most cycles in the function. The function is a big function and below just shows a small piece of the function. The report is…