Questions tagged [intel-pmu]

Questions related to the use of the Intel Performance Monitoring Unit (PMU), which provides hardware counters for performance-related events in the currently executing code.

The Intel Performance Monitoring Unit (PMU) provides hardware counters that track performance-related events, such as retired instructions, core cycles, and cache misses, for the currently executing code.

The counters are useful for profiling code and are supported by Intel VTune, the Linux perf command, and the Windows Performance Toolkit.

The available counters and the details of how to program them vary by microarchitecture. They are documented in Chapters 18 and 19 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3.
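As a rough illustration of what "programming a counter" involves, here is a minimal Python sketch that packs an event select and unit mask into an IA32_PERFEVTSELx value using the architectural bit layout (event in bits 0-7, umask in bits 8-15, USR bit 16, OS bit 17, EN bit 22, CMASK in bits 24-31). The helper name `perfevtsel` and its parameters are hypothetical and only for illustration. The two events used are the architectural Instructions Retired (C0H/00H) and Unhalted Core Cycles (3CH/00H); model-specific events need the tables for your CPU.

```python
# Sketch only: builds the integer that would be written to an
# IA32_PERFEVTSELx MSR. It does not touch any hardware.
# The helper name and keyword arguments are hypothetical.

USR_BIT = 1 << 16  # count at CPL > 0 (user mode)
OS_BIT = 1 << 17   # count at CPL 0 (kernel mode)
EN_BIT = 1 << 22   # enable the counter


def perfevtsel(event, umask, usr=True, kernel=True, enable=True, cmask=0):
    """Pack event select, unit mask, and control bits into one value."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)
    if usr:
        value |= USR_BIT
    if kernel:
        value |= OS_BIT
    if enable:
        value |= EN_BIT
    return value


# Architectural events, counted in both user and kernel mode.
INST_RETIRED = perfevtsel(0xC0, 0x00)
CORE_CYCLES = perfevtsel(0x3C, 0x00)

if __name__ == "__main__":
    print(hex(INST_RETIRED))  # 0x4300c0
    print(hex(CORE_CYCLES))   # 0x43003c
```

With Linux perf you normally do not build the full register value yourself. The low 16 bits (umask and event) are what the raw event syntax takes, e.g. `perf stat -e r00c0`, and perf fills in the control bits.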

Other libraries / tools for using the PMU include:

  • LIKWID: Various performance-related tools, including a micro-benchmarking framework. Supports the Intel PMU, AMD performance counters, some ARM, POWER8/9, and some NVIDIA GPUs.

  • libpfc: A simple Linux kernel module and library that let user-space code program the counters and then read them with rdpmc. Example usage in the author's answer to this SO question.

  • pmu-tools (https://github.com/andikleen/pmu-tools): Some wrappers around Linux perf. ocperf.py was more useful before perf itself gained symbolic event names for more CPU-specific events, but the repo contains other tools as well.

91 questions
1
vote
0 answers

`SIGSEGV` when reading `HW_CPU_CYCLES` on Alder Lake efficiency cores

I want to read PERF_TYPE_HARDWARE + PERF_COUNT_HW_CPU_CYCLES on a 12th Gen Intel CPU. This is my test program (based on cpucycles/amd64rdpmc.c from SUPERCOP): #include #include #include #include…
Joel
1
vote
0 answers

Are the event ratios in Appendix B.8 of Intel's Optimization Reference Manual applicable to other microarchitectures?

Appendix B.8 of the Intel 64 and IA-32 Architectures Optimization Reference Manual (June 2021) contains useful event ratios for performance analysis, workload characterization, and performance tuning. However, the section title EVENT RATIOS FOR INTEL…
1
vote
0 answers

How is PMU shared in Linux on X86?

I am using Linux 5.8.18 for performance tuning and am confused about one point. The PMU on x86 is a limited resource, and perf is the tool that uses the PMU for profiling/sampling. IIRC, the perf documentation says the PMU resource is shared by different…
wangt13
1
vote
1 answer

Performance Counters and IMC Counter Not Matching

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. In a relatively idle situation, I ran the following perf commands; their outputs are shown below. The counters are offcore_response.all_data_rd.l3_miss.any_response and…
1
vote
1 answer

Let perf use certain performance counters properly with newer processors

I'm trying to use perf to measure certain events, including L1-dcache-stores, on my machine, which has a relatively new processor (i9-10900K) compared to the relatively old CentOS 7 with kernel 3.10.0-1127. The problem is that perf reports that…
Joshua Chia
1
vote
0 answers

Performance difference of two similar assembly instructions in Visual Studio CPU Usage

I have some inline assembly that I am trying to profile. Interestingly, two very similar operations, maxss and minss, right after each other have very different performance impacts. Does anybody have experience with this? Perhaps it is some caching? Or the…
1
vote
1 answer

Why do kill dependency instructions consume reservation station slots?

I always thought that instructions for killing dependencies, e.g. xor reg, reg, do not have to be executed and are ready for retirement as soon as the renamer moves them to the re-order buffer. I just measured the number of micro-operations getting into…
Some Name
1
vote
1 answer

How to measure the dtlb hits and dtlb misses with perf_event_open()?

I want to measure the cache miss rate and the dTLB miss rate. I have done the first part, but I can't find how to set the config to get dTLB misses and dTLB hits. When I measured cache misses, I did it like this: pe.type = PERF_TYPE_HARDWARE; …
zhujiaxin
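For readers landing on this question from the tag page, a hedged sketch of how the config value for generic cache events is composed. With `pe.type = PERF_TYPE_HW_CACHE`, the config is `cache_id | (op_id << 8) | (result_id << 16)`. The constants below mirror the enums in linux/perf_event.h. The helper name `hw_cache_config` is hypothetical, and the sketch only computes the integer; it does not call perf_event_open().

```python
# Sketch only: computes perf_event_attr.config for PERF_TYPE_HW_CACHE events.
# Constant values mirror linux/perf_event.h; the helper name is hypothetical.

PERF_TYPE_HW_CACHE = 3

PERF_COUNT_HW_CACHE_DTLB = 3

PERF_COUNT_HW_CACHE_OP_READ = 0
PERF_COUNT_HW_CACHE_OP_WRITE = 1

PERF_COUNT_HW_CACHE_RESULT_ACCESS = 0
PERF_COUNT_HW_CACHE_RESULT_MISS = 1


def hw_cache_config(cache_id, op_id, result_id):
    """Combine cache id, operation, and result into one config value."""
    return cache_id | (op_id << 8) | (result_id << 16)


DTLB_READ_ACCESS = hw_cache_config(
    PERF_COUNT_HW_CACHE_DTLB,
    PERF_COUNT_HW_CACHE_OP_READ,
    PERF_COUNT_HW_CACHE_RESULT_ACCESS,
)
DTLB_READ_MISS = hw_cache_config(
    PERF_COUNT_HW_CACHE_DTLB,
    PERF_COUNT_HW_CACHE_OP_READ,
    PERF_COUNT_HW_CACHE_RESULT_MISS,
)

if __name__ == "__main__":
    print(hex(DTLB_READ_ACCESS))  # 0x3
    print(hex(DTLB_READ_MISS))    # 0x10003
```

There is no separate "hit" result in this encoding, so hits would have to be derived as accesses minus misses. Whether a given combination is actually backed by a hardware event depends on the CPU and kernel version.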
1
vote
0 answers

Reading from mmap shared memory when using perf in sampling mode

I have a parent process that forks a child which is profiled using perf in sampling mode (sample every N events). Approximately 10000 samples are being generated. I know by using mmap() we can access the shared memory where samples are stored. But…
1
vote
0 answers

Variable event count based sampling using perf

I am trying to read the PMU event counters whenever a particular event counter overflows using perf. I know that perf works with a fixed sample period. What I am looking for is the possibility to read PMU counters each time with a different sample…
1
vote
0 answers

Why do mov reg,reg instructions reading the result of a load account for so many cycles with perf record?

I'm profiling my program on Linux using the perf tool. When checking the report, I found a place that really confuses me. I attach a few lines of the report below: 0.94 : 451ab5: mov (%r15),%r8 0.44 : 451ab8: mov …
2power10
1
vote
1 answer

Reading performance registers from the kernel

I want to read certain performance counters. I know that there are tools like perf that can do it for me in user space, but I want the code to be inside the Linux kernel. I want to write a mechanism to monitor performance counters on…
1
vote
0 answers

Using PEBS and Linux Perf to Count the number of CPU cycles passed to execute X number of instructions

I want to do something like this: After 100 million instructions have passed, query the Linux perf HW CPU cycles and record it in a file. I want to use this code to characterize the performance of applications/benchmark programs during different…
1
vote
1 answer

Performance Monitoring Counter (RDPMC) on a specific processor

I'm trying to use RDPMC Instruction for counting retired instructions and as Intel Software Developer's Manual Volume 3, Appendix A (In PERFORMANCE MONITORING section) mentioned: • Instructions Retired — Event select C0H, Umask 00H This event…
Embrace
1
vote
1 answer

Intel PEBS sample context

I am using the Linux perf tool to monitor system-wide (exclude_kernel == 0) PEBS samples. I was wondering whether a PEBS sample can occur in interrupt context (i.e., while an interrupt is being served by the interrupt handler). If it is possible, is…
Proy