
I have an extremely simple program to measure how much time a function is taking.

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdint> // for uint64_t / uint32_t
struct Foo
{
    void addSample(uint64_t s)
    {
    }
};

void test(const std::vector<uint64_t>& samples)
{
    uint32_t onlyCallTime = 0;
    uint32_t loopOnlyTime = 0;
    Foo stats;
    std::chrono::high_resolution_clock::time_point callStart, callEnd;
    auto start = callStart = callEnd = std::chrono::high_resolution_clock::now();
    for(auto &s : samples)
    {
        callStart = std::chrono::high_resolution_clock::now();
        loopOnlyTime += std::chrono::duration_cast<std::chrono::microseconds>(callStart-callEnd).count();
        stats.addSample(s);
        callEnd = std::chrono::high_resolution_clock::now();
        onlyCallTime += std::chrono::duration_cast<std::chrono::microseconds>(callEnd-callStart).count();
    }
    auto end = std::chrono::high_resolution_clock::now();

    std::cout << "overall duration: " << std::chrono::duration_cast<std::chrono::microseconds>(end-start).count() << std::endl;
    std::cout << "only call duration: " << onlyCallTime << std::endl;
    std::cout << "only loop duration: " << loopOnlyTime << std::endl;
}

int main()
{
    std::vector<uint64_t> dataSetDecreasing;
    for(uint32_t i = 0; i < 1000000; ++i)
        dataSetDecreasing.push_back(1000000-i);
    test(dataSetDecreasing);
}

The output is really confusing. Here are some examples:

overall duration: 56047
only call duration: 195
only loop duration: 285

overall duration: 40984
only call duration: 177
only loop duration: 243

overall duration: 47328
only call duration: 187
only loop duration: 177

The way I see it, callEnd-callStart captures the call to addSample plus the duration_cast. callStart-callEnd captures everything else: the loop initialization, iteration, condition check, and the second duration_cast. What am I missing? Where are the other ~40000 microseconds going?

Compiled with g++ -Wall -Wextra -std=c++17 -O3
g++ (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1)
My OS is Fedora release 32 (Thirty Two)

    How are you compiling this (compiler version and options), and what OS and hardware? Could lazy dynamic linking account for the extra overhead in the outer timed region? Page faults if you're touching new memory? No, you're touching memory before the timed region. Or maybe some kind of sample granularity issue, I didn't look at the code very carefully. What's the overall time for the whole process, like with `perf stat` to report it about as precisely as the OS measures it. – Peter Cordes Sep 08 '20 at 17:51
    You forgot to measure the duration of the chrono calls. – Ulrich Eckhardt Sep 08 '20 at 17:54
  • @PeterCordes Added the compiler, options and OS. – A.Hristov Sep 08 '20 at 17:56
  • Note: Timing `addSample` is pointless. At `-O3` the compiler should be optimizing it out. You will have to add some observable effect for the function to make it into the executable. – user4581301 Sep 08 '20 at 17:56
  • @UlrichEckhardt Either way, the individual times should add up to the total. The lines measured are (almost) identical. – 463035818_is_not_an_ai Sep 08 '20 at 17:58

1 Answer


1 000 000 iterations took roughly 50 000 microseconds. That is well under 1μs per iteration on average, and std::chrono::duration_cast<std::chrono::microseconds> truncates any duration shorter than 1μs to 0. That means your counters only accumulate the iterations that took far longer than average for some reason (scheduling, page faults, cache misses probably).
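
To make the truncation concrete, here is a minimal standalone sketch (my illustration, not part of the original answer); the 800ns figure is just an assumed per-iteration cost:

#include <chrono>
#include <iostream>

int main()
{
    using namespace std::chrono;
    nanoseconds perIteration{800};  // assumed ~800ns per loop iteration
    // duration_cast truncates toward zero, so 800ns becomes 0us.
    microseconds truncated = duration_cast<microseconds>(perIteration);
    std::cout << truncated.count() << '\n';              // prints 0
    // Summing the truncated value a million times still gives 0,
    // even though the real total is 800 000us (0.8s).
    std::cout << (truncated * 1000000).count() << '\n';  // prints 0
}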

Also, since each measurement carries a fixed error that is independent of the duration being measured (plus other errors), taking many small measurements and adding them up is far less precise than measuring the whole duration once.
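
A sketch of one possible fix (my suggestion, reusing Foo and the includes from the question): accumulate in the clock's native tick type and convert to microseconds only once, at the end:

void test(const std::vector<uint64_t>& samples)
{
    Foo stats;
    // Accumulate in the clock's own tick type (nanoseconds on libstdc++),
    // so nothing gets truncated inside the loop.
    std::chrono::high_resolution_clock::duration onlyCallTime{};
    for (auto& s : samples)
    {
        auto callStart = std::chrono::high_resolution_clock::now();
        stats.addSample(s);
        auto callEnd = std::chrono::high_resolution_clock::now();
        onlyCallTime += callEnd - callStart;
    }
    // Convert once at the end; truncation now happens at most once.
    std::cout << "only call duration: "
              << std::chrono::duration_cast<std::chrono::microseconds>(onlyCallTime).count()
              << std::endl;
}

This matches the follow-up comment below: re-running the measurement with nanoseconds makes the numbers add up almost exactly.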

    Good catch. Running it with nanoseconds instead adds up almost exactly – A.Hristov Sep 08 '20 at 18:03
    @JeJo: 1μs isn't code, it's a real interval of time. Code-formatting is not necessary or even appropriate. We're not talking about those symbols appearing as part of a program. Physical quantities in general like 1 cm aren't code and shouldn't be code-formatted. Using C++ number formatting for the iteration count is optional but not a real improvement. (I commented in such detail for the benefit of your future edits, and to explain my rollback, not to call you out.) – Peter Cordes Sep 08 '20 at 20:35