
I am profiling the performance of part of my program, and I tried to measure its execution time with the following four methods. Interestingly, they show different results, and I don't fully understand the differences. My CPU is an Intel(R) Core(TM) i7-4770 and the system is Ubuntu 14.04. Thanks in advance for any explanation.

Method 1: Use the gettimeofday() function; the result is in seconds.

Method 2: Use the rdtsc instruction similar to https://stackoverflow.com/a/14019158/3721062

Methods 3 and 4 use Intel's Performance Counter Monitor (PCM) API.

Method 3: Use PCM's

uint64 getCycles(const CounterStateType & before, const CounterStateType &after)

Its description (I don't quite understand):

Computes the number core clock cycles when signal on a specific core is running (not halted)

Returns number of used cycles (halted cycles are not counted). The counter does not advance in the following conditions:

  • an ACPI C-state is other than C0 for normal operation
  • HLT
  • STPCLK+ pin is asserted
  • being throttled by TM1
  • during the frequency switching phase of a performance state transition

The performance counter for this event counts across performance state transitions using different core clock frequencies

Method 4: Use PCM's

uint64 getInvariantTSC (const CounterStateType & before, const CounterStateType & after)

Its description:

Computes number of invariant time stamp counter ticks.

This counter counts irrespectively of C-, P- or T-states

Two sample runs produced the following results. (Method 1 is in seconds; Methods 2 to 4 are each divided by the same number to show a per-item cost.)

         Method 1    Method 2    Method 3    Method 4
Run 1    0.016489    0.533603    0.588103    4.15136
Run 2    0.020374    0.659265    0.730308    5.15672

Some observations:

  1. The ratio of Method 1 to Method 2 is very consistent across the two runs, while the ratios involving the other methods are not: 0.016489/0.533603 ≈ 0.020374/0.659265 ≈ 0.0309. Assuming gettimeofday() is sufficiently accurate, the rdtsc method exhibits the "invariant" property. (I have read that current-generation Intel CPUs provide this feature for rdtsc.)

  2. Method 3 reports a higher value than Method 2. I guess it is somehow different from the TSC, but what exactly is it?

  3. Method 4 is the most confusing one. It reports a number almost an order of magnitude larger than Methods 2 and 3. Shouldn't it also be some kind of cycle count, especially since it carries the "Invariant" name?
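The consistency claim in observation 1 can be checked numerically. The following is only a small sketch in C++ that divides each method's figure by the Method 1 time for both runs and compares the results; the helper names `ratio_to_method1` and `rel_diff` are made up for this illustration, and the numbers are the two sample runs listed above.

```cpp
#include <cassert>
#include <cmath>

// The two sample runs from the question: Methods 1..4, in that order.
static const double run1[4] = {0.016489, 0.533603, 0.588103, 4.15136};
static const double run2[4] = {0.020374, 0.659265, 0.730308, 5.15672};

// Hypothetical helper: value reported by `method` (1-based) divided by the
// gettimeofday() time of the same run.
static double ratio_to_method1(const double run[4], int method) {
    assert(method >= 1 && method <= 4);
    return run[method - 1] / run[0];
}

// Hypothetical helper: relative difference between two positive numbers.
static double rel_diff(double a, double b) {
    return std::fabs(a - b) / std::fmax(std::fabs(a), std::fabs(b));
}

// Relative run-to-run change of the ratio for one method.
static double ratio_drift(int method) {
    return rel_diff(ratio_to_method1(run1, method),
                    ratio_to_method1(run2, method));
}
```

With these inputs, `ratio_drift(2)` comes out at roughly 0.01%, while `ratio_drift(3)` and `ratio_drift(4)` are both around 0.5%, which is the difference in consistency described in observation 1.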

Neo1989

1 Answer


gettimeofday() is not designed for measuring time intervals. It reads the wall clock, which can jump whenever the system time is adjusted. Don't use it for that purpose.

If you need wall time intervals, use the POSIX monotonic clock. If you need CPU time spent by a particular process or thread, use the POSIX process time or thread time clocks. See man clock_gettime.
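As a rough illustration of that advice, here is a minimal sketch that times a piece of code with both `CLOCK_MONOTONIC` (wall time) and `CLOCK_PROCESS_CPUTIME_ID` (CPU time of the process). The names `now_ns`, `work`, `Timing` and `time_work` are invented for this example, and `work` is just a stand-in for whatever is being profiled. The sketch assumes a POSIX system where both clocks are available.

```cpp
#include <cassert>
#include <cstdint>
#include <time.h>  // clock_gettime, CLOCK_MONOTONIC, CLOCK_PROCESS_CPUTIME_ID (POSIX)

// Read the given POSIX clock and return its value in nanoseconds,
// or -1 if the clock cannot be read.
static int64_t now_ns(clockid_t clock_id) {
    timespec ts;
    if (clock_gettime(clock_id, &ts) != 0) return -1;
    return static_cast<int64_t>(ts.tv_sec) * 1000000000LL + ts.tv_nsec;
}

// Hypothetical stand-in for the code being profiled: sums 0..n-1.
// `volatile` keeps the compiler from optimizing the loop away.
static uint64_t work(uint64_t n) {
    volatile uint64_t sum = 0;
    for (uint64_t i = 0; i < n; ++i) sum = sum + i;
    return sum;
}

struct Timing {
    int64_t wall_ns;  // elapsed wall time (monotonic clock)
    int64_t cpu_ns;   // CPU time consumed by this process
};

// Measure one call of work(n) with both clocks.
static Timing time_work(uint64_t n) {
    int64_t wall0 = now_ns(CLOCK_MONOTONIC);
    int64_t cpu0 = now_ns(CLOCK_PROCESS_CPUTIME_ID);
    assert(wall0 >= 0 && cpu0 >= 0);  // assumption: both clocks are supported
    work(n);
    int64_t cpu1 = now_ns(CLOCK_PROCESS_CPUTIME_ID);
    int64_t wall1 = now_ns(CLOCK_MONOTONIC);
    return Timing{wall1 - wall0, cpu1 - cpu0};
}
```

Calling `time_work(1000000)` returns both intervals in nanoseconds. For a thread-level figure, `CLOCK_THREAD_CPUTIME_ID` could be passed to `now_ns` instead. On older glibc versions, linking with `-lrt` may be needed for `clock_gettime`.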

The PCM API is great for fine-tuned performance measurement when you know exactly what you are doing, which generally means obtaining a variety of separate memory, core, cache, low-power, ... performance figures. Don't start messing with it if you are not sure what exact services you need from it that you can't get from clock_gettime.

n. m. could be an AI