
I've made a few algorithm implementations with various micro-optimizations. I need to count the number of instructions executed by a call, or between two places (before and after the call).

The algorithm uses a few loops and conditional jumps, and it's data-dependent. So I can't just calculate the number of instructions per loop iteration and multiply it by the number of iterations.

Disclaimer: I know that the number of executed instructions isn't very relevant, because the performance of the same instructions varies between CPUs, but it's for demonstration purposes only.

kravemir
  • Have you looked into valgrind? There is an option to get all instruction counts. Maybe there is a way to limit the scope. http://valgrind.org/docs/manual/lk-manual.html – alnet Nov 19 '15 at 11:30
  • How do you count? Do prefixes count as individual instructions? What about string instructions like `rep movsb`? Do these count as one or once for every iteration? – fuz Nov 19 '15 at 11:38
  • Have you tried turning on the asm option, so that assembler source files are produced while building the project? This would at least give you the source you need for your comparison. – Neil Nov 19 '15 at 11:57
  • I am not sure if you can actually count the number of instructions, but if you don't know it, take a look at perf (http://sandsoftwaresound.net/perf/perf-tutorial-hot-spots/) – terence hill Nov 19 '15 at 12:01
  • That's not a very good measure of how much effective work was done on a modern processor core that supports speculative execution. They all do. ISA cores have a counter that reports the number of *retired* instructions, a much better measure. Don't reinvent this wheel, any decent profiler gives you access to this. – Hans Passant Nov 19 '15 at 13:03
  • I also found this post, which seems very related: http://stackoverflow.com/questions/16312270/how-to-measure-number-of-executed-assembler-instructions – terence hill Nov 19 '15 at 15:14
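
The retired-instruction counter mentioned in the comments can, on Linux, be read from inside the program through the perf_event_open system call. The following is only a minimal sketch under that assumption: count_instructions and sample_work are hypothetical names made up for illustration, and the counter may be unavailable (for example inside a virtual machine, or when the kernel's perf_event_paranoid setting forbids it), in which case the helper returns -1.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical data-dependent workload, used only for illustration. */
static void sample_work(void)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 100000; i++) {
        if (i % 3 == 0)
            sink += i;
    }
}

/* Hypothetical helper: number of user-space instructions retired while
   fn() runs, or -1 if the hardware counter cannot be opened or read. */
static long long count_instructions(void (*fn)(void))
{
    struct perf_event_attr attr;
    uint64_t count = 0;
    int fd;

    memset(&attr, 0, sizeof attr);
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof attr;
    attr.config = PERF_COUNT_HW_INSTRUCTIONS; /* retired instructions */
    attr.disabled = 1;                        /* enabled explicitly below */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    /* pid = 0, cpu = -1: the calling thread on any CPU; no group, no flags. */
    fd = (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd < 0)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    fn();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    if (read(fd, &count, sizeof count) != (ssize_t)sizeof count) {
        close(fd);
        return -1;
    }
    close(fd);
    return (long long)count;
}
```

Calling count_instructions(sample_work) gives the count for one run. The result includes the few instructions spent entering and leaving the ioctl calls, so it is best used to compare implementations rather than as an exact figure.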

1 Answer


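On x86 (both 32- and 64-bit) you are probably looking for the RDTSC instruction, which reads the CPU's time-stamp counter. Note that it counts clock ticks rather than executed instructions, so it tells you how long the code took, not how many instructions it ran. Given how complex modern CPUs are, any form of simulation or static analysis certainly isn't what you are looking for.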

Your compiler may or may not have an intrinsic for it. If not, do something like this (GCC syntax for the inline asm):

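#include <stdint.h>

/* Read the x86 time-stamp counter (GCC-style inline asm). */
uint64_t GetTSC(void)
{
  uint32_t lo, hi;

  /* RDTSC returns the low 32 bits in EAX and the high 32 bits in EDX.
     volatile keeps the compiler from caching or removing the read. */
  __asm__ volatile("rdtsc" : "=a"(lo), "=d"(hi));

  return ((uint64_t)hi << 32) | lo;
}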

The caveats described in https://en.wikipedia.org/wiki/Time_Stamp_Counter apply.
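
To measure between two places, read the counter before and after the code of interest and subtract. Below is a minimal sketch, assuming GCC or Clang on x86, where the `__rdtsc()` intrinsic is available from `<x86intrin.h>`. The names ticks_for and demo_loop are hypothetical, and the result is in time-stamp ticks, not instructions.

```c
#include <stdint.h>
#include <x86intrin.h> /* __rdtsc() on GCC and Clang */

/* Hypothetical workload, used only for illustration. */
static void demo_loop(void)
{
    volatile unsigned sink = 0;
    for (unsigned i = 0; i < 100000; i++)
        sink += i;
}

/* Hypothetical helper: time-stamp ticks elapsed while fn() runs. */
static uint64_t ticks_for(void (*fn)(void))
{
    uint64_t start = __rdtsc();
    fn();
    uint64_t end = __rdtsc();
    return end - start;
}
```

RDTSC is not a serializing instruction, so the CPU may reorder it relative to the surrounding code. For short sequences it probably makes sense to repeat the measurement several times and take the minimum.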