Questions tagged [intel-pmu]

Questions related to the use of the Intel Performance Monitoring Unit (PMU), which provides hardware counters for measuring the performance of currently executing code.

The Intel Performance Monitoring Unit provides performance counters that track performance-related metrics for the currently executing code.

They are useful while profiling code, and are supported by Intel's VTune, Linux's perf command and the Windows Performance Toolkit.

The counters, and the details of how to program them, vary by CPU microarchitecture. They are documented in Chapters 18 and 19 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3.
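As a rough illustration of what "programming a counter" involves, the sketch below packs an event select and unit mask into the value written to an IA32_PERFEVTSELx register. The bit positions follow the architectural performance monitoring layout described in the manual. The function name `encode_perfevtsel` and its parameters are hypothetical, chosen only for this example, and the two event codes used are the architectural "LLC misses" and "unhalted core cycles" events. Treat it as a sketch under those assumptions, not a complete programming sequence.

```python
def encode_perfevtsel(event, umask, user=True, kernel=True,
                      edge=False, invert=False, cmask=0, enable=True):
    """Build an IA32_PERFEVTSELx value (hypothetical helper for illustration).

    Assumed bit layout (architectural performance monitoring):
      bits 0-7   event select
      bits 8-15  unit mask
      bit 16     USR   (count in user mode)
      bit 17     OS    (count in kernel mode)
      bit 18     E     (edge detect)
      bit 22     EN    (enable counter)
      bit 23     INV   (invert counter mask comparison)
      bits 24-31 CMASK (counter mask)
    """
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits, got {field:#x}")

    value = event | (umask << 8)
    value |= int(user) << 16
    value |= int(kernel) << 17
    value |= int(edge) << 18
    value |= int(enable) << 22
    value |= int(invert) << 23
    value |= cmask << 24
    return value


if __name__ == "__main__":
    # Architectural "LLC misses" event: event select 0x2E, unit mask 0x41.
    print(hex(encode_perfevtsel(0x2E, 0x41)))   # 0x43412e
    # Architectural "unhalted core cycles" event: event select 0x3C, unit mask 0x00.
    print(hex(encode_perfevtsel(0x3C, 0x00)))   # 0x43003c
```

The low 16 bits of this value (unit mask and event select) are the part that differs from event to event. Tools such as perf normally set the mode and enable bits themselves, so the full register value mostly matters when writing the MSRs directly.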

Other libraries / tools for using the PMU include:

  • Likwid: Various performance-related tools, including a micro-benchmarking framework. Supports Intel-PMU, AMD perf counters, some ARM, POWER8/9, and some NVidia GPUs.

  • libpfc: A simple Linux kernel module and library that lets user-space code program the counters and then read them with rdpmc. Example usage in the author's answer to this SO question.

  • pmu-tools (https://github.com/andikleen/pmu-tools): Some wrappers around Linux perf. ocperf.py used to be more useful before perf itself got symbolic event names for more CPU-specific events, but there are other tools in that repo.

91 questions
2
votes
1 answer

only 2 PERF_TYPE_HW_CACHE events in perf event group

Working on a custom implementation on top of perf_event_open, I need to monitor multiple PERF_TYPE_HW_CACHE events concurrently. The Intel manual states that there are 4 programmable counters per thread (or 8 if HyperThreading is disabled) for my CPU's…
Orion Papadakis
  • 398
  • 1
  • 14
2
votes
1 answer

Best event counter to use for measuring wall clock time using perf tools

Simple yet complicated question: What counter to use to get perf tools to measure wall clock time? As a baseline, I think the first thing I need to measure when profiling code is just wall clock time, to get a first idea where the code takes…
Peter
  • 785
  • 2
  • 7
  • 18
2
votes
0 answers

trouble with pmi handle on windows 7

I am trying to set up a performance monitoring interrupt on counter overflow to collect some information. For this I created a driver. I have skipped the parts of the code that are irrelevant. driver.c extern VOID EnableReadPmc(); extern VOID PmiHandle(); extern…
2
votes
0 answers

Determine L1 fill buffer occupancy related to stores on Intel

To determine the L1D fill buffer occupancy related to loads, one can use the L1D_PEND_MISS events, in particular L1D_PEND_MISS.PENDING, which is documented as follows: Counts duration of L1D miss outstanding, that is each cycle number of Fill…
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
2
votes
0 answers

How to narrow down intel PCM data to a single process?

I'm trying to use Intel Performance Counter Monitor (PCM) to understand L3 cache miss and some other performance criteria in my code. I'm not sure how to make sense out of the numbers I'm getting and would appreciate some insight. I expect ideally…
Amir
  • 421
  • 1
  • 4
  • 14
2
votes
0 answers

value of PMC (Performance Monitoring Counter) for L3 cache-misses is too high

I'm looking for a way to estimate the number of L3 cache misses by using the 'IA32_PERFEVTSELx' and 'IA32_PMCx' MSR pair on my Linux PC with an Intel CPU (Intel i7 6700, Skylake). To do that, I installed a timer in the kernel and it reported the value of a…
nickeys
  • 137
  • 2
  • 10
2
votes
0 answers

Intel PMU event for L1 cache hit event

I'm trying to count the number of cache hits at different levels (L1, L2 and L3) of cache for a program on an Intel Haswell processor. I wrote a program to count the number of L2 and L3 cache hits by monitoring the respective events. To achieve that, I…
Mike
  • 1,841
  • 2
  • 18
  • 34
2
votes
1 answer

How to measure late prefetches and killed prefetches on Haswell micro-architecture?

I am using an Intel Xeon 2660 v3 and issuing lots of software prefetches to exploit MLP as well as to reduce stall time. Now I want to profile the application to get the overall gain due to software prefetches. In the paper "Improving the…
A-B
  • 487
  • 2
  • 23
2
votes
1 answer

Reading performance counters for Intel Xeon in userspace

I want to read performance counters for an Intel Xeon using a shell script in userspace. OProfile will not work as it is too rigid to fulfill my requirements. I am using FC13. Thanks
ahmed
  • 21
  • 2
2
votes
0 answers

Determine fixed counter to event mapping with libpfm4

I'm using libpfm4 to determine Intel performance monitor counter encodings (e.g., to map between a human-readable name and the encoding). Intel PMUs have a number of "fixed counters" which can be enabled or disabled, but when enabled always count…
BeeOnRope
  • 60,350
  • 16
  • 207
  • 386
2
votes
3 answers

Intel Performance Monitor -- any way to monitor per-process?

How would I go about monitoring a particular process's execution (namely, its branches, from the Branch Trace Store) using the Intel Performance Counter Monitor, while filtering out other processes' information?
user541686
  • 205,094
  • 128
  • 528
  • 886
1
vote
0 answers

Measuring ITLB_FLUSH on icelake processors

According to the Intel website for performance counters at https://perfmon-events.intel.com/, there are counters specifically for ITLB.ITLB_FLUSH for processors based on the "skylake" microarchitecture (e.g. Skylake client, Cascade Lake-X…
CH_skar
  • 103
  • 5
1
vote
0 answers

What are the complete sources of L3 misses which aren't counted by the cache-miss event on Skylake?

When I was trying to understand the cache-miss event of perf on Intel machines, I noticed the following description: "PublicDescription": "Counts core-originated cacheable requests that miss the L3 cache (Longest Latency cache). Requests include…
1
vote
1 answer

Is there a counter in modern x86 CPUs which only counts the time (or cycles) spent in interrupt handlers?

This is not a duplicate question. It has been claimed that this question is a duplicate of this one. However, I didn't mention "Linux" or "Kernel" (neither in the tags nor in the text). Hence, claiming this being a duplicate of a question which…
Binarus
  • 4,005
  • 3
  • 25
  • 41
1
vote
0 answers

How to use rdpmc instruction on AMD (EPYC) processor?

This program displays the count of actual CPU core cycles executed by the current core (using the related PMC which I believe is UNHALTED_CORE_CYCLES) #include #include int main(int argc, char* argv[]){ unsigned long a, d,…
PierreJ
  • 23
  • 4