
I tried to read the PMCs (performance monitoring counters) directly instead of using perf or a similar tool. The code is shown below. The full, compilable code is archived here

However, it doesn't work. Event 0x000000c0 should count the number of instructions retired, but the output is zero.

Is there an error in my test or in my code?

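```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // cpu_set_t and sched_setaffinity are GNU extensions (g++ defines this already)
#endif
#include <sched.h>    // sched_setaffinity, CPU_ZERO, CPU_SET
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>   // exit, rand, srand
#include "pmc.h"      // counter, setup_pmc, zero_pmc, start_pmc, stop_pmc, get_stats_single (from the archived code)
#include <fstream>
#include <iostream>
using namespace std;

// Event 0xC0: instructions retired
counter programmables[] =
{
  { 0x000000c0ull, "INSTRUCTION_RETIRED"},
};
int core = 0;

// Pin this process to one core so the workload runs on the core being monitored
static void pin_cpu(size_t core_ID)
{
  cpu_set_t set;
  CPU_ZERO(&set);
  CPU_SET(core_ID, &set);
  if (sched_setaffinity(0, sizeof(cpu_set_t), &set) < 0) {
    printf("Unable to Set Affinity\n");
    exit(EXIT_FAILURE);
  }
}
int main(){
    pin_cpu(core);
    int result = 0;
    int n = 10000;
    double cnt = 0;
    srand(10769);
    size_t n_programmables = sizeof(programmables) / sizeof(programmables[0]);
    setup_pmc(core, programmables, n_programmables);
    zero_pmc(n_programmables);
    start_pmc();
    // Workload being measured
    double a = 0;
    for(int i = 1;i <= n;i++){
        // Convert to double before adding so rand() + 300 + i cannot overflow int
        a = ((double)rand() + 300 + i)/(rand() - i);
        cnt += a;
    }
    stop_pmc();
    result = get_stats_single();
    printf("%lf %d\n",cnt,result);
}
```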
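For reference, the Intel SDM defines the layout of the IA32_PERFEVTSELx MSRs through which a programmable counter is configured: bits 7:0 hold the event select, bits 15:8 the unit mask, and separate bits enable counting in user mode (USR, bit 16), in kernel mode (OS, bit 17), and the counter itself (EN, bit 22). Since `pmc.h` isn't shown, it's not visible whether `setup_pmc` adds USR/OS/EN to the `0x000000c0ull` in `programmables[]` or writes that value as-is; a value with EN clear, or with both USR and OS clear, counts nothing. The following is a minimal sketch with a hypothetical helper (`perfevtsel` is not part of `pmc.h`) that composes the full value for event 0xC0, umask 0x00 (INST_RETIRED.ANY_P).

```cpp
#include <cstdint>

// Bit layout of IA32_PERFEVTSELx (MSR 0x186 + x), per the Intel SDM
constexpr std::uint64_t PERFEVTSEL_USR = 1ull << 16;  // count while CPL > 0 (user mode)
constexpr std::uint64_t PERFEVTSEL_OS  = 1ull << 17;  // count while CPL == 0 (kernel mode)
constexpr std::uint64_t PERFEVTSEL_EN  = 1ull << 22;  // enable this counter

// Hypothetical helper (not part of pmc.h): builds a raw PERFEVTSEL value
// from an event select code and a unit mask, with the counter enabled.
constexpr std::uint64_t perfevtsel(std::uint8_t event, std::uint8_t umask,
                                   bool user, bool kernel) {
  return static_cast<std::uint64_t>(event)
       | (static_cast<std::uint64_t>(umask) << 8)
       | (user ? PERFEVTSEL_USR : 0)
       | (kernel ? PERFEVTSEL_OS : 0)
       | PERFEVTSEL_EN;
}

// Event 0xC0, umask 0x00 (INST_RETIRED.ANY_P), counting user and kernel mode
static_assert(perfevtsel(0xC0, 0x00, true, true) == 0x4300C0, "user + kernel");
// The same event restricted to user mode
static_assert(perfevtsel(0xC0, 0x00, true, false) == 0x4100C0, "user only");
```

Comparing the value `pmc.h` actually writes against `0x4300C0` (or `0x4100C0` for user mode only) is one way to rule the encoding in or out as the cause.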
moep0
  • You didn't call `start_pmc`. – prl Mar 13 '23 at 09:22
  • What CPU are you using? I assume some recent Intel? – Peter Cordes Mar 13 '23 at 10:51
  • @prl Thanks for pointing that out; I had missed it. I've now added it and retested, but I get the same output. – moep0 Mar 14 '23 at 01:14
  • @PeterCordes Yes. I tested the code on an i5-8265U and an i7-10700, with the same results on both. – moep0 Mar 14 '23 at 01:14
  • I have tried some other PMCs from [the Intel guide](https://cdrdv2-public.intel.com/671378/335279-performance-monitoring-events-guide.pdf). They all work, but their counts are around 30% lower than the perf results. I don't know why that difference appears or why `c0` doesn't work. – moep0 Mar 14 '23 at 02:52
  • There's a "fixed" counter for `inst_retired.any`; it doesn't have to take one of the four (or eight without hyperthreading) programmable counters per physical core. IDK if that's significant. As for counting less, I wonder if counting kernel instructions via `perf` could account for the 30%, if there's that many more counts inside system-call and interrupt-handler code, if you aren't using `perf stat --all-user`. Or just from `perf stat` counting the CRT startup code of your whole program that runs before `main`. – Peter Cordes Mar 14 '23 at 15:21
  • @PeterCordes Thanks for the reply, now I get it. Also, I didn't enable `--all-user`, so I think the difference is due to the code that runs before `main`. – moep0 Mar 15 '23 at 07:21
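The fixed counter mentioned in the comments can count instructions without occupying a programmable counter. As a sketch, assuming the counters are programmed by writing MSRs directly (the `pmc.h` internals aren't shown), the Intel SDM specifies: IA32_FIXED_CTR0 (MSR 0x309) counts INST_RETIRED.ANY; IA32_FIXED_CTR_CTRL (MSR 0x38D) holds a 4-bit field per fixed counter whose bit 0 enables counting at CPL 0 and bit 1 at CPL > 0; and on these CPUs a counter only counts if its enable bit in IA32_PERF_GLOBAL_CTRL (MSR 0x38F) is also set (bits 0 and up for programmable counters, bits 32 and up for fixed ones). RDPMC selects a fixed counter when bit 30 of ECX is set. All helper names below are hypothetical, and the MSR writes themselves (which need ring 0, e.g. root access to the Linux msr driver) are not shown.

```cpp
#include <cstdint>

// Architectural MSR addresses, per the Intel SDM
constexpr std::uint32_t IA32_FIXED_CTR0       = 0x309;  // counts INST_RETIRED.ANY
constexpr std::uint32_t IA32_FIXED_CTR_CTRL   = 0x38D;
constexpr std::uint32_t IA32_PERF_GLOBAL_CTRL = 0x38F;

// Hypothetical helper: IA32_FIXED_CTR_CTRL field for fixed counter `idx`.
// Bit 0 of each 4-bit field = count at CPL 0, bit 1 = count at CPL > 0.
constexpr std::uint64_t fixed_ctr_ctrl(unsigned idx, bool user, bool kernel) {
  return ((kernel ? 1ull : 0ull) | (user ? 2ull : 0ull)) << (4 * idx);
}

// Hypothetical helpers: IA32_PERF_GLOBAL_CTRL enable bits
constexpr std::uint64_t global_ctrl_pmc(unsigned idx)   { return 1ull << idx; }
constexpr std::uint64_t global_ctrl_fixed(unsigned idx) { return 1ull << (32 + idx); }

// Hypothetical helper: RDPMC counter index (ECX) for fixed counter `idx`
constexpr std::uint32_t rdpmc_fixed_index(unsigned idx) { return (1u << 30) | idx; }

static_assert(fixed_ctr_ctrl(0, true, true) == 0x3, "fixed counter 0, all rings");
static_assert(global_ctrl_fixed(0) == 0x100000000ull, "enable bit for fixed counter 0");
static_assert(rdpmc_fixed_index(0) == 0x40000000u, "RDPMC index for fixed counter 0");

#if defined(__x86_64__) || defined(__i386__)
// Reads a performance counter with RDPMC. In user mode this raises #GP
// (SIGSEGV on Linux) unless the OS has set CR4.PCE.
inline std::uint64_t read_pmc(std::uint32_t ecx) {
  std::uint32_t lo, hi;
  __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(ecx));
  return (static_cast<std::uint64_t>(hi) << 32) | lo;
}
#endif
```

Using `fixed_ctr_ctrl(0, true, false)` restricts counting to user mode, which is closer to `perf stat --all-user`; perf still counts the whole process, including the startup code before `main`, so the totals are not expected to match exactly.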
