Questions tagged [intel-pmu]

Questions about the Intel Performance Monitoring Unit (PMU), which provides hardware performance counters that track performance-related events for the currently executing code.

The counters are useful for profiling code and are supported by Intel VTune, the Linux perf command, and the Windows Performance Toolkit.

The available counters and the details of how to program them vary by microarchitecture. They are documented in Chapters 18 and 19 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3 (chapter numbers may differ between revisions of the manual).
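As a rough illustration of what programming a counter involves, the sketch below builds a value for an IA32_PERFEVTSELx register from an event code and a unit mask, using the architectural bit layout described in the SDM. The helper name `make_evtsel` and the macro names are made up for this example. Writing the value to the MSR still requires kernel privileges (for instance WRMSR from a driver), which is not shown here.

```c
#include <stdint.h>

/* Bit positions in IA32_PERFEVTSELx (architectural performance monitoring). */
#define EVTSEL_USR (1ULL << 16) /* count while running in user mode (CPL > 0) */
#define EVTSEL_OS  (1ULL << 17) /* count while running in kernel mode (CPL 0) */
#define EVTSEL_EN  (1ULL << 22) /* enable the counter */

/* Hypothetical helper: combine event code, unit mask and mode flags
   into a single event-select value with the enable bit set. */
static inline uint64_t make_evtsel(uint8_t event, uint8_t umask,
                                   int user, int kernel)
{
    uint64_t value = (uint64_t)event | ((uint64_t)umask << 8) | EVTSEL_EN;
    if (user)
        value |= EVTSEL_USR;
    if (kernel)
        value |= EVTSEL_OS;
    return value;
}
```

For example, `make_evtsel(0xC0, 0x00, 1, 1)` gives `0x4300C0`, which selects the architectural Instructions Retired event in both user and kernel mode. Event codes beyond the small architectural set are model-specific and should be looked up for the exact CPU.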

Other libraries / tools for using the PMU include:

  • Likwid: Various performance-related tools, including a micro-benchmarking framework. Supports the Intel PMU, AMD performance counters, some ARM, POWER8/9, and some NVIDIA GPUs.

  • libpfc: A simple Linux kernel module and library that let user-space code program the counters and then read them with rdpmc. Example usage in the author's answer to this SO question.

  • pmu-tools (https://github.com/andikleen/pmu-tools): Wrappers around Linux perf. ocperf.py was more useful before perf itself gained symbolic names for more CPU-specific events, but the repository contains other tools as well.
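On Linux, the counters can also be reached without any of these tools through the perf_event_open system call. The following is a minimal sketch, not a definitive implementation: it counts user-space instructions retired by the calling thread while a workload runs. The names `count_instructions` and `spin_work` are hypothetical, and whether the call succeeds depends on the perf_event_paranoid setting and on the environment exposing a PMU (many VMs and containers do not), so the helper returns -1 in that case.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical workload: a simple loop the compiler cannot remove. */
static inline void spin_work(void *arg)
{
    volatile unsigned long sink = 0;
    unsigned long iters = *(unsigned long *)arg;
    for (unsigned long i = 0; i < iters; i++)
        sink += i;
}

/* Hypothetical helper: count user-space instructions retired while
   work(arg) runs on the calling thread. Returns -1 if the counter
   cannot be opened or read. */
static inline int64_t count_instructions(void (*work)(void *), void *arg)
{
    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;       /* start stopped, enable explicitly below */
    attr.exclude_kernel = 1; /* user-space only */
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: this thread on any CPU; glibc has no wrapper. */
    int fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work(arg);
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    uint64_t count = 0;
    ssize_t n = read(fd, &count, sizeof(count));
    close(fd);
    return n == (ssize_t)sizeof(count) ? (int64_t)count : -1;
}
```

A caller would pass its own function, for example `count_instructions(spin_work, &iters)` with `unsigned long iters = 100000;`, and treat a return value of -1 as "counters unavailable here" rather than as a measurement.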

91 questions
5 votes, 1 answer

PERF_TYPE_HARDWARE and PERF_TYPE_HW_CACHE concurrent monitoring

I'm working on a custom implementation on top of the perf_event_open syscall. The implementation aims to support various PERF_TYPE_HARDWARE, PERF_TYPE_SOFTWARE and PERF_TYPE_HW_CACHE events for specific threads on any core. In Intel® 64 and IA-32…
Orion Papadakis
  • 398
  • 1
  • 14
5 votes, 2 answers

Is it possible for the RESOURCE_STALLS.RS event to occur even when the RS is not completely full?

The description of the RESOURCE_STALLS.RS hardware performance event for Intel Broadwell is the following: This event counts stall cycles caused by absence of eligible entries in the reservation station (RS). This may result from RS overflow, or …
Hadi Brais
  • 22,259
  • 3
  • 54
  • 95
5 votes, 2 answers

How to read PMC (Performance Monitoring Counter) of Intel processor?

I'm trying to read a PMC (Performance Monitoring Counter) using the RDMSR and WRMSR instructions. On my Linux desktop, which has an Intel i7 6700 CPU (Skylake), I wrote a simple driver: static int my_init(void) { unsigned int msr; u64 low,…
nickeys
  • 137
  • 2
  • 10
5 votes, 4 answers

Hardware Performance counter on Intel Core Duo

I have read that there are AMD processors that allow you to measure the number of cache hits and misses. I am wondering whether such a feature is also available on Intel Core Duo machines, or whether they do not support this yet.
Alex12
  • 81
  • 3
4 votes, 1 answer

PMU x86-64 performance counters not showing in perf under AWS

I am running a C++ benchmark test for a specific application. In this test, I open the performance counter file (__NR_perf_event_open syscall) before the critical section, run the section, and then afterwards read the specified metric…
user8143588
4 votes, 1 answer

PMC to count if software prefetch hit L1 cache

I am trying to find a PMC (Performance Monitoring Counter) that will display the number of times that a prefetcht0 instruction hits (or misses) the L1 dcache. icelake-client: Intel(R) Core(TM) i7-1065G7 CPU @ 1.30GHz I am trying to make this fine grain…
Noah
  • 1,647
  • 1
  • 9
  • 18
4 votes, 1 answer

How to read PMC (Performance Monitoring Counter) of x86 Intel processor

My desktop has an Intel x86_64 processor and runs Ubuntu. I know there is the perf tool for getting a list of statistics for a program. But what I am trying to do is read the performance counters directly, without using the perf tool. First…
rhdhyekw93
  • 43
  • 1
  • 4
4 votes, 2 answers

Why do newer Intel CPUs not support performance counter for stalled-cycles-backend?

I'm fighting memory latency using memory prefetching. Some (older) Intel CPUs support performance counters for counting the cycles a CPU wasted waiting for memory (stalled-cycles-backend), e.g. Intel's E5-2690. On newer CPUs (Gold 6230 and…
jagemue
  • 363
  • 4
  • 16
4 votes, 1 answer

How can I read performance counters from the kernel?

I have been using the Linux perf tool in user space. I want to write code that reads performance counters for a thread every time it does a context switch. The steps required are: 1) Get a mechanism to read the performance counter registers. 2)…
4 votes, 1 answer

How does perf use the offcore events?

Some built-in perf events are mapped to offcore events. For example, LLC-loads and LLC-load-misses are mapped to OFFCORE_RESPONSE events. This can be easily determined as discussed here. However, these offcore events require writing certain…
Hadi Brais
  • 22,259
  • 3
  • 54
  • 95
4 votes, 0 answers

Get the performance monitoring interrupt on Qemu-Kvm

I have a problem catching the performance monitoring interrupt (PMI, specifically for the instruction counter) on qemu-kvm. The code below works fine on a real machine (Intel Core(TM) i5-4300U), but on qemu-kvm (qemu-system-x86_64 -cpu host), I do not see…
Mahouk
  • 902
  • 9
  • 28
4 votes, 2 answers

Perf tool stat output: multiplex and scaling of "cycles"

I am trying to understand the multiplexing and scaling of the "cycles" event in the perf output. The following is the output of the perf tool: 144094.487583 task-clock (msec) # 1.017 CPUs utilized 539912613776 instructions …
Kailash Akilesh
  • 173
  • 2
  • 2
  • 7
4 votes, 3 answers

How can we know the exact number of hardware performance counters built into the CPU?

After doing some reading on hardware performance counters, I can claim that all Intel processors support them. So, in order to access these additional hardware registers, i.e. hardware performance…
M.Mrd
  • 41
  • 2
4 votes, 1 answer

How to Configure and Sample Intel Performance Counters In-Process

In a nutshell, I'm trying to achieve the following inside a userland benchmark process (pseudo-code, assuming x86_64 and a UNIX system): results[] = ... for (iteration = 0; iteration < num_iterations; iteration++) { pctr_start = sample_pctr(); …
Edd Barrett
  • 3,425
  • 2
  • 29
  • 48
3 votes, 1 answer

Is it possible to sample LOAD and STORE instructions at the same time in Intel PEBS sampling?

I am trying to use the Intel PMU performance monitoring (PEBS) to sample all LOAD and STORE operations in a C/C++ application binary. The codebase I am using uses perf_event_open() to set up the monitoring for either LOAD or STORE in the…