Questions tagged [intel-pmu]

Questions related to the use of the Intel Performance Monitoring Unit (PMU), which provides hardware counters for measuring the performance of currently executing code.

The Intel performance monitoring unit provides performance counters that track performance-related metrics for the currently executing code.

The counters are useful when profiling code and are supported by Intel's VTune, Linux's perf command, and the Windows Performance Toolkit.

The counters and the details of how to program them vary by CPU microarchitecture; the details are documented in Chapters 18 and 19 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3.
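
As a rough illustration of what "programming" a counter involves, the sketch below packs the fields of the architectural IA32_PERFEVTSELx register layout into a single value. The helper name encode_perfevtsel and its parameter names are made up for this example. Event code 0x2E with unit mask 0x41 is the architectural "LLC Misses" event; most other encodings are CPU-specific, so treat any value here as something to check against the manual.

```python
# Bit positions in the architectural IA32_PERFEVTSELx layout:
# bits 0-7 event select, bits 8-15 unit mask, bit 16 USR, bit 17 OS,
# bit 22 EN, bits 24-31 counter mask.
UMASK_SHIFT = 8
USR_BIT = 1 << 16   # count while running in user mode (ring 3)
OS_BIT = 1 << 17    # count while running in kernel mode (ring 0)
EN_BIT = 1 << 22    # enable the counter
CMASK_SHIFT = 24


def encode_perfevtsel(event, umask, count_user=True, count_kernel=True,
                      enable=True, cmask=0):
    """Hypothetical helper: build an event-select value from its fields."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field:#x}")
    value = event | (umask << UMASK_SHIFT) | (cmask << CMASK_SHIFT)
    if count_user:
        value |= USR_BIT
    if count_kernel:
        value |= OS_BIT
    if enable:
        value |= EN_BIT
    return value


# Architectural "LLC Misses" event, counted in both user and kernel mode.
llc_misses = encode_perfevtsel(0x2E, 0x41)
print(hex(llc_misses))  # prints 0x43412e
```

Writing such a value to the hardware still requires privileged access to the event-select MSR (for example from a kernel module); tools such as perf handle this step themselves.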

Other libraries / tools for using the PMU include:

Likwid: Various performance-related tools, including a micro-benchmarking framework. Supports Intel PMU, AMD performance counters, some ARM, POWER8/9, and some NVIDIA GPUs.
libpfc: A simple Linux kernel module and library that lets user-space programs configure the counters and then read them with rdpmc from user space. Example usage in the author's answer to this SO question.
pmu-tools (https://github.com/andikleen/pmu-tools): Wrappers around Linux perf. ocperf.py was more useful before perf itself gained symbolic event names for more CPU-specific events, but the repository contains other tools as well.

91 questions

votes

1 answer

Read PMU counters using wrmsrl and rdmsrl

I'm trying to read the LLC cache miss hardware counter in a Linux kernel module on an Intel Xeon Gold (Skylake generation) processor. The result of the following code is always zero: #define PMC_ESEL_UMASK_SHIFT 8 #define PMC_ESEL_CMASK_SHIFT…

asked May 09 '22 at 20:01

Mohammad Siavashi

votes

1 answer

Performance Counter for DRAM Per-Rank Memory Access

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. I need to retrieve the number of accesses to each DRAM rank, over time, to estimate its power consumption. Based on page 261 of the chipset documentation (i.e., Datasheet,…

performancecounter perf energy memory-access intel-pmu

asked Mar 11 '21 at 23:54

TheAhmad

votes

1 answer

Difference Between mem_load_uops_retired.l3_miss and offcore_response.demand_data_rd.l3_miss.local_dram Events

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. AFAIK, mem_load_uops_retired.l3_miss counts the number of DRAM demand (i.e., non-prefetch) data read accesses. offcore_response.demand_data_rd.l3_miss.local_dram, as its name…

intel performancecounter perf memory-access intel-pmu

asked Mar 02 '21 at 15:20

TheAhmad

votes

3 answers

Profiling Cache hit rate of a function of C program

I want to get cache hit rate for a specific function of a C/C++ program (foo) running on a Linux machine. I am using gcc and no compiler optimization. With perf I can get hit rates for the entire program using the following command. perf stat -e…

c linux perf papi intel-pmu

asked Oct 29 '20 at 06:36

Atanu Barai

votes

1 answer

How to use rdpmc instruction for counting L1d cache miss?

I am wondering whether there is any single event that can capture L1D cache misses. I tried to capture L1D cache misses by measuring the latency of accessing a specific memory location with rdtsc at the beginning. On my setting, if the L1d cache miss happens, it should hit…

assembly x86 perf intel-pmu

asked Oct 05 '20 at 14:28

ruach

votes

1 answer

Using the perf events from perf list programmatically

When I run perf list on my Linux system I get a long list of available perf events. Is it possible to list and use these events programmatically from another process, using perf_event_open(2)? That is, how can I get this list from another process and…

linux performance perf intel-pmu

asked Aug 30 '20 at 02:24

BeeOnRope

votes

1 answer

Perf Imprecise Call-Graph Report

Recent Intel processors provide a hardware feature (a.k.a., Precise Event-Based Sampling (PEBS)) to access precise information about the CPU state on some sampled CPU events (e.g., e). Here is an extract from Intel 64 and IA-32 Architecture's…

linux x86-64 callstack perf intel-pmu

asked Apr 18 '20 at 21:27

TheAhmad

votes

1 answer

Inconsistent values of ARM PMU cycles counter

I'm trying to measure the performance of my code in the Linux kernel with the PMU. First, I want to test the PMU, so I created a simple loop of a couple of operations in the kernel. I placed it under a spin lock with interrupts disabled so my test code can't be…

c linux-kernel arm arm64 intel-pmu

asked Oct 28 '19 at 05:57

scopichmu

votes

1 answer

What is the meaning of IB read, IB write, OB read and OB write? They appear in the output of Intel® PCM while monitoring PCIe bandwidth

I am trying to measure the PCIe bandwidth of NIC devices using Intel® Performance Counter Monitor (PCM) tools, but I am not able to understand its output. To measure the PCIe bandwidth, I executed the binary pcm-iio. This binary helps to…

x86 performance-testing intel intel-pmu mellanox

asked Jul 21 '19 at 13:58

Anubhav Choudhary

votes

1 answer

Paradoxical VTune Amplifier microarchitecture exploration results

I am trying to optimize a sin/cos approximation function. At its core there is a simple Horner scheme consisting of a bunch of multiplies and adds. Compiler is MSVC from VS2017, processor is Intel Xeon E5-1650, hyperthreading is on (but observations…

performance x86 micro-optimization intel-vtune intel-pmu

asked Nov 30 '18 at 14:04

Max Langhof

votes

0 answers

What causes the DTLB_LOAD_MISSES.WALK_* performance events to occur?

Consider the following loop: .loop: add rsi, STRIDE mov eax, dword [rsi] dec ebp jg .loop where STRIDE is some non-negative integer and rsi contains a pointer to a buffer defined in the bss section. This loop is the…

x86 cpu-architecture tlb intel-pmu

asked Sep 29 '18 at 22:05

Hadi Brais

votes

0 answers

Can rdpmc be used to read the fixed-function counters on AMD?

On Intel, the fixed-function performance counters can be read by setting bit 30 of ecx as well as the index of the counter to read (0-4) in the bottom bits of that same register. Is something similar possible on AMD CPUs?

x86 intel performancecounter amd-processor intel-pmu

asked Aug 03 '23 at 23:52

BeeOnRope

votes

0 answers

Which instruction is recorded when an overflow of PMC occurs?

When profiling a program with the "cpu-cycle" event using the "perf record -p $pid && perf report" command, I think the underlying hardware PMC does the following things: increments the counter when a cycle occurs; records the "current instruction" when the…

x86 cpu-architecture perf intel-pmu performance-monitor

asked Jun 05 '23 at 03:38

Frontier_Setter

votes

0 answers

Why do LLC-related performance events share the same event ID in perf?

I am using the Intel SPR architecture, with a kernel version of 5.14 and a perf version of 4.18. I tried to analyze the meaning of LLC-related events based on the method in this answer, but found that all events have the same ID: [ C(LL ) ] = { […

caching cpu-architecture intel perf intel-pmu

asked Apr 14 '23 at 14:47

Frontier_Setter

votes

1 answer

Why does "setne %al" use "a lot of cycles" in perf annotation?

I was very confused when I saw this perf report. I have tried it several times, and this setne instruction always accounts for the largest share in the function. The function is large, and the excerpt below shows only a small piece of it. The report is…

assembly x86 intel perf intel-pmu

asked Sep 08 '20 at 08:02

Steven Ding
