I am trying to use PerfSuite (which uses PAPI internally) to read some hardware performance counters around a function call. The function spawns one thread per core. The problem is that if I start the counters just before the call and stop them right after it, the values I get are incorrect. If the function does not create any threads, the values are correct.
I know that psrun can collect counters for all cores when it runs an executable, but I want the same thing for a single function call rather than for the whole executable. Is there a way to do this?
I am using PerfSuite 1.1.1 with PAPI 4.4.0, from C, on Debian.
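
To make the symptom concrete, here is a minimal sketch that reproduces the same pattern without PerfSuite or PAPI. It assumes the counters are bound to the thread that starts them, which is only a guess about what happens in the real setup. A thread-local variable stands in for a per-thread hardware counter, and `NUM_WORKERS`, `WORK_PER_THREAD`, `threaded_function`, `measure_from_caller` and `sum_counts` are hypothetical names made up for the illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_WORKERS 4           /* hypothetical stand-in for "one thread per core" */
#define WORK_PER_THREAD 1000L   /* hypothetical amount of work per thread */

/* Stand-in for a per-thread hardware counter: every thread has its own copy. */
static _Thread_local long tl_events = 0;

/* Each worker "generates events" on its own counter and publishes the total. */
static void *worker(void *arg)
{
    long *slot = arg;
    for (long i = 0; i < WORK_PER_THREAD; i++)
        tl_events++;
    *slot = tl_events;
    return NULL;
}

/* The function under measurement: spawns the workers and waits for them. */
static void threaded_function(long per_thread[NUM_WORKERS])
{
    pthread_t tids[NUM_WORKERS];
    for (int i = 0; i < NUM_WORKERS; i++) {
        int rc = pthread_create(&tids[i], NULL, worker, &per_thread[i]);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NUM_WORKERS; i++)
        pthread_join(tids[i], NULL);
}

/* "Start" before the call and "stop" after it, using only the caller's counter. */
long measure_from_caller(long per_thread[NUM_WORKERS])
{
    long before = tl_events;
    threaded_function(per_thread);
    return tl_events - before;
}

/* Sum the counts that each worker recorded for itself. */
long sum_counts(const long per_thread[NUM_WORKERS])
{
    long total = 0;
    for (int i = 0; i < NUM_WORKERS; i++)
        total += per_thread[i];
    return total;
}
```

In this sketch, the value returned by `measure_from_caller` covers only the calling thread, so the work done in the spawned threads is missing from it. The total only becomes visible when each thread records its own count and the results are added up with `sum_counts`.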