Questions tagged [intel-pmu]

Questions about using the Intel Performance Monitoring Unit (PMU), which provides hardware counters for performance-related events in currently executing code.

The Intel Performance Monitoring Unit (PMU) provides hardware counters that track performance-related events, such as cycles, retired instructions, and cache misses, for the currently executing code.

The counters are useful for profiling code and are supported by Intel VTune, the Linux perf command, and the Windows Performance Toolkit.

The available counters and the details of programming them vary by CPU microarchitecture. They are documented in Chapters 18 and 19 of the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3 (chapter numbers may differ between manual revisions).
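
As a rough illustration of what "programming a counter" involves, the sketch below builds the value that would be written to an IA32_PERFEVTSELx MSR for an architectural event. The bit positions follow the architectural performance monitoring layout in the manual; the function name `encode_perfevtsel` and its parameters are made up for this example, and only a subset of the fields is covered. It computes the value only and does not write any MSR.

```python
# Selected bit fields of IA32_PERFEVTSELx (architectural performance monitoring).
USR_BIT = 1 << 16     # count while running at privilege levels 1-3
OS_BIT = 1 << 17      # count while running at privilege level 0
ENABLE_BIT = 1 << 22  # enable the counter


def encode_perfevtsel(event, umask=0, *, user=True, kernel=True,
                      enable=True, cmask=0):
    """Return the 32-bit event-select value (hypothetical helper).

    event  -> bits 0-7   (event select)
    umask  -> bits 8-15  (unit mask)
    cmask  -> bits 24-31 (counter mask)
    """
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)
    if user:
        value |= USR_BIT
    if kernel:
        value |= OS_BIT
    if enable:
        value |= ENABLE_BIT
    return value


if __name__ == "__main__":
    # Architectural "Instruction Retired" event: event 0xC0, umask 0x00.
    print(hex(encode_perfevtsel(0xC0)))                # 0x4300c0
    # Architectural "UnHalted Core Cycles", user mode only: event 0x3C.
    print(hex(encode_perfevtsel(0x3C, kernel=False)))  # 0x41003c
```

Non-architectural events use the same layout but model-specific event and umask codes, which is why the event tables for the exact CPU matter.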

Other libraries / tools for using the PMU include:

  • LIKWID: Various performance-related tools, including a micro-benchmarking framework. Supports the Intel PMU, AMD performance counters, some ARM, POWER8/9, and some NVIDIA GPUs.

  • libpfc: A simple Linux kernel module and library that let user-space code program the counters and then read them with rdpmc. Example usage is shown in the author's answer to this SO question.

  • pmu-tools (https://github.com/andikleen/pmu-tools): Wrappers around Linux perf. ocperf.py was more useful before perf itself gained symbolic event names for more CPU-specific events, but the repository contains other tools as well.
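
When counters are read directly, for example with rdpmc as libpfc allows, the raw values are only as wide as the hardware counter, so the difference between two readings has to be taken modulo that width. The sketch below assumes a 48-bit counter, which is common on recent Intel CPUs (the actual width is reported by CPUID leaf 0AH), and assumes the counter wraps at most once between the two readings. The helper name `counter_delta` is hypothetical.

```python
def counter_delta(start, end, width=48):
    """Events counted between two raw readings of a `width`-bit counter.

    Assumes at most one wrap-around happened between `start` and `end`.
    """
    mask = (1 << width) - 1
    return (end - start) & mask


if __name__ == "__main__":
    # No wrap-around: plain subtraction.
    print(counter_delta(100, 250))            # 150
    # Counter wrapped past 2**48 between the two readings.
    print(counter_delta((1 << 48) - 10, 5))   # 15
```

Treating the raw difference as a signed number instead is one way apparently negative counts can show up in a tool.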

91 questions
1
vote
0 answers

how to reset general purpose performance counter of intel

I know we can use the wrmsr and rdmsr instructions to program the performance counters and to read the general-purpose performance counter registers. However, my question is: Do we need to reset the general-purpose performance counter register before we issue…
Mike
  • 1,841
  • 2
  • 18
  • 34
1
vote
0 answers

Power counter on Intel processor or GPUs

Does anyone have experience with power counters on Intel processors (intel performance counter management library) or GPUs? Which types of CPUs and GPUs support such counters, and how accurate are they? Do such counters need a special motherboard?
Lu Li
  • 11
  • 1
0
votes
0 answers

How does intel advisor measure L1, L2 and L3 bandwidths for loops and functions? Are there PMU events which count the bytes transferred?

I am using the intel advisor cache aware roofline feature and wanted to know how intel advisor measures the Core to L1 data cache bandwidth of the application. The application is run twice, once for collecting timing information for loops and…
sham1810
  • 173
  • 1
  • 11
0
votes
0 answers

Workload Memory Bandwidth Comparison Inconsistency

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor. In a relatively idle situation, I ran the following Perf commands for around 5 seconds. The counters are offcore_response.all_data_rd.l3_miss.local_dram and…
TheAhmad
  • 810
  • 1
  • 9
  • 21
0
votes
1 answer

Don't all loads result in an L1 cache hit (after the data arrives if it initially missed)?

It is quite obvious that the cache miss rate can be determined by the following formula: miss_rate = n_misses / n_accesses. I have a doubt regarding how the number of misses is counted. I think a cache miss (e.g. L1) is treated like this: miss in L1…
rrpp1045
  • 1
  • 2
0
votes
0 answers

macOS Instruments PMU counters overflow

When profiling code in macOS Instruments 14.1 I'm frequently getting negative PMU counters. Specifically on counters: CPU_CLK_UNHALTED.THREAD and INST_RETIRED.ANY. It seems to me like overflow. What are possible reasons for this and is there any…
Denis Bazhenov
  • 9,680
  • 8
  • 43
  • 65
0
votes
0 answers

CPU performance monitor counters cannot be read directly

I tried to read directly from the PMCs instead of using Perf or something like that. The code is shown below. The full and compilable code is archived here. However, I failed. The 0x000000c0 should count the number of instructions retired. But I got…
moep0
  • 358
  • 1
  • 8
0
votes
1 answer

Perf Result Conflict During Multiplexing

I have an Intel(R) Core(TM) i7-4720HQ CPU @ 2.60GHz (Haswell) processor (Linux 4.15.0-20-generic kernel). In a relatively idle situation, I ran the following Perf commands, and their outputs are shown below. The counters are…
0
votes
0 answers

Measure load stalls using Intel performance monitoring counters for RESOURCE_STALLS

I am trying to understand the meaning of various Intel performance monitoring counters and also want to measure load stalls using the Intel performance monitoring counters available for RESOURCE_STALLS. The following are approx. per second values for all…
0
votes
1 answer

cpuid: reported micro-architecture seems ambiguous

Ubuntu 20.04 LTS. Note (unknown type) reported: $ cpuid | less CPU 0: vendor_id = "GenuineIntel" version information (1/eax): processor type = primary processor (0) family = 0x6 (6) model = 0xe (14) …
ecwdw 23e3e23e
  • 375
  • 4
  • 11
0
votes
1 answer

What does this sentence mean in the context of perf tool: "Supports address when precise (Precise event)"?

This line appears under memory events in perf tool. CPU: Intel Xeon Gold
0
votes
1 answer

Perf cannot use symbol from kernel module

I want to trace a kernel module I've written using Intel PT but I can not get perf to recognize symbols from my kernel modules. For the sake of simplicity, I tried tracing a module that periodically prints a string to the log, using perf record -e…
0
votes
1 answer

Vtune: Accuracy of Intel sampling drivers when vtune measurement run on a machine running other tasks

I have the latest Coffee Lake machine, which is primarily used as a storage server. The average workload on each core (4 cores) is around 5-10% when running a storage server alone. I want to run VTune measurements of a workload on this machine using…
0
votes
0 answers

How does perf collect kernel space performance events?

What HW feature does perf use to collect performance monitoring events for ring 0 on Intel CPUs? My picture of the world is this: Looking for a free IA32_PERFEVTSELn MSR by asking IA32_PERF_GLOBAL_INUSE When taking a free IA32_PERFEVTSELn it sets…
Some Name
  • 8,555
  • 5
  • 27
  • 77
0
votes
2 answers

Usage of PERF_EVENT_IOC_PERIOD to change sampling period during runtime

I am using Raspbian Linux 4.9.78-v7+ on a Pi 3B. I am using perf to do some performance experiments. I am trying to use PERF_EVENT_IOC_PERIOD of perf to change the period during runtime of the process. I set the initial sampling period in the…