Two versions of the same algorithm yield different total instruction fetch counts (Ir) and cycle estimates (CEst) under valgrind/cachegrind. Version 2 is higher by roughly 27% in Ir and 29% in CEst. Process timing, however, is very similar (it is actually slightly shorter for the cachegrind-slow version):

  • version 1:

    Ir:     146,328,018,245
    CEst:   152,553,736,055
    timing: 17.93 s
    
  • version 2:

    Ir:     185,221,836,610
    CEst:   197,531,381,950
    timing: 17.53 s
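
As a quick check on the figures above, the relative differences can be computed directly. This is only a small sketch: the numbers are copied from the output above, and the names `v1`, `v2` and `relative_change` are illustrative.

```python
# Figures copied from the cachegrind and `time` output above.
v1 = {"Ir": 146_328_018_245, "CEst": 152_553_736_055, "timing_s": 17.93}
v2 = {"Ir": 185_221_836_610, "CEst": 197_531_381_950, "timing_s": 17.53}


def relative_change(a, b):
    """Return the change from a to b as a fraction of a."""
    return (b - a) / a


# Relative change of version 2 with respect to version 1, per metric.
changes = {key: relative_change(v1[key], v2[key]) for key in v1}

for key, change in changes.items():
    print(f"{key}: {change:+.1%}")
```

Version 2 comes out about 26.6% higher in Ir and 29.5% higher in CEst, while its wall-clock time is about 2.2% lower.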
    

Is this behaviour expected? How can I learn more about why version 1 is slower?

Arek' Fu
  • Are you measuring the time of the cachegrind run, or of a "real" (you know what I mean) run? –  Oct 13 '12 at 14:03
  • @delnan, I'm measuring the real execution time using the `time` `bash` builtin. – Arek' Fu Oct 13 '12 at 14:04

1 Answer


I discovered that the inconsistency is due to the different compiler options used for the cachegrind runs and for the timing runs. In particular, I had disabled function inlining for the cachegrind runs (so that I could get meaningful per-function counts).
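
One way to avoid this kind of mismatch is to build a single binary and use it for both measurements. The sketch below only assembles and prints the command lines. It assumes GCC (Clang accepts `-fno-inline` as well), `-O2` as the optimisation level, and a hypothetical source file `algo.cpp`. None of these come from my actual setup.

```python
import shlex

# Assumptions, not taken from the original runs: g++ as the compiler,
# -O2 as the optimisation level, and hypothetical file names.
FLAGS = ["-O2", "-g", "-fno-inline"]  # -fno-inline keeps per-function counts meaningful
SOURCE = "algo.cpp"  # hypothetical source file
BINARY = "./algo"    # hypothetical output binary


def build_cmd(flags=FLAGS):
    """Compile command; the resulting binary is used for both measurements."""
    return ["g++", *flags, SOURCE, "-o", BINARY]


def cachegrind_cmd():
    """Profile the binary under cachegrind."""
    return ["valgrind", "--tool=cachegrind", BINARY]


def timing_cmd():
    """Time the same binary; `time` is a bash keyword, so go through bash -c."""
    return ["bash", "-c", f"time {shlex.quote(BINARY)}"]


for cmd in (build_cmd(), cachegrind_cmd(), timing_cmd()):
    print(shlex.join(cmd))
```

If the timing runs should reflect an inlined build instead, the same idea applies the other way round: drop `-fno-inline` from `FLAGS` and accept coarser per-function counts from cachegrind.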

Arek' Fu