
I have an extremely simple program to measure how much time a function is taking.

#include <iostream>
#include <vector>
#include <chrono>
#include <cstdint> // for uint64_t / uint32_t
struct Foo
{
    void addSample(uint64_t s)
    {
    }
};

void test(const std::vector<uint64_t>& samples)
{
    uint32_t onlyCallTime = 0;
    uint32_t loopOnlyTime = 0;
    Foo stats;
    std::chrono::high_resolution_clock::time_point callStart, callEnd;
    auto start = callStart = callEnd = std::chrono::high_resolution_clock::now();
    for(auto &s : samples)
    {
        callStart = std::chrono::high_resolution_clock::now();
        loopOnlyTime += std::chrono::duration_cast<std::chrono::microseconds>(callStart-callEnd).count();
        stats.addSample(s);
        callEnd = std::chrono::high_resolution_clock::now();
        onlyCallTime += std::chrono::duration_cast<std::chrono::microseconds>(callEnd-callStart).count();
    }
    auto end = std::chrono::high_resolution_clock::now();

    std::cout << "overall duration: " << std::chrono::duration_cast<std::chrono::microseconds>(end-start).count() << std::endl;
    std::cout << "only call duration: " << onlyCallTime << std::endl;
    std::cout << "only loop duration: " << loopOnlyTime << std::endl;
}

int main()
{
    std::vector<uint64_t> dataSetDecreasing;
    for(uint32_t i = 0; i < 1000000; ++i)
        dataSetDecreasing.push_back(1000000-i);
    test(dataSetDecreasing);
}

The output is really confusing. Here are some examples:

overall duration: 56047
only call duration: 195
only loop duration: 285

overall duration: 40984
only call duration: 177
only loop duration: 243

overall duration: 47328
only call duration: 187
only loop duration: 177

The way I see it, callEnd-callStart captures the call to addSample plus the duration_cast. callStart-callEnd captures everything else: the loop initialization, iteration, condition check, and the second duration_cast. What am I missing? Where are the other ~40000 microseconds going?

Compiled with g++ -Wall -Wextra -std=c++17 -O3
g++ (GCC) 10.2.1 20200723 (Red Hat 10.2.1-1)
My OS is Fedora release 32 (Thirty Two)

    How are you compiling this (compiler version and options), and what OS and hardware? Could lazy dynamic linking account for the extra overhead in the outer timed region? Page faults if you're touching new memory? No, you're touching memory before the timed region. Or maybe some kind of sample granularity issue, I didn't look at the code very carefully. What's the overall time for the whole process, like with `perf stat` to report it about as precisely as the OS measures it. – Peter Cordes Sep 08 '20 at 17:51
    You forgot to measure the duration of the chrono calls. – Ulrich Eckhardt Sep 08 '20 at 17:54
  • @PeterCordes Added the compiler, options and OS. – A.Hristov Sep 08 '20 at 17:56
  • Note: Timing `addSample` is pointless. At `-O3` the compiler should be optimizing it out. You will have to add some observable effect for the function to make it into the executable. – user4581301 Sep 08 '20 at 17:56
  • @UlrichEckhardt Either way, the individual times should add up to the total. The lines measured are (almost) identical. – 463035818_is_not_an_ai Sep 08 '20 at 17:58

1 Answer


1 000 000 iterations took roughly 50 000 microseconds. That is well under 1μs per iteration on average, and std::chrono::duration_cast<std::chrono::microseconds> truncates any duration shorter than 1μs to 0. That means your counters only accumulate the iterations that took far longer than average for some reason (scheduling, page faults, cache misses probably).
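
To make the truncation concrete, here is a minimal standalone sketch (my illustration, not part of the original answer); the 800ns figure is just an assumed per-iteration cost:

#include <chrono>
#include <iostream>

int main()
{
    using namespace std::chrono;
    nanoseconds perIteration{800};  // assumed ~800ns per loop iteration
    // duration_cast truncates toward zero, so 800ns becomes 0us.
    microseconds truncated = duration_cast<microseconds>(perIteration);
    std::cout << truncated.count() << '\n';              // prints 0
    // Summing the truncated value a million times still gives 0,
    // even though the real total is 800 000us (0.8s).
    std::cout << (truncated * 1000000).count() << '\n';  // prints 0
}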

Also, since each measurement carries a fixed error that is independent of the duration being measured (plus other errors), taking many small measurements and adding them up is far less precise than measuring the whole duration once.
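
A sketch of one possible fix (my suggestion, reusing Foo and the includes from the question): accumulate in the clock's native tick type and convert to microseconds only once, at the end:

void test(const std::vector<uint64_t>& samples)
{
    Foo stats;
    // Accumulate in the clock's own tick type (nanoseconds on libstdc++),
    // so nothing gets truncated inside the loop.
    std::chrono::high_resolution_clock::duration onlyCallTime{};
    for (auto& s : samples)
    {
        auto callStart = std::chrono::high_resolution_clock::now();
        stats.addSample(s);
        auto callEnd = std::chrono::high_resolution_clock::now();
        onlyCallTime += callEnd - callStart;
    }
    // Convert once at the end; truncation now happens at most once.
    std::cout << "only call duration: "
              << std::chrono::duration_cast<std::chrono::microseconds>(onlyCallTime).count()
              << std::endl;
}

This matches the follow-up comment below: re-running the measurement with nanoseconds makes the numbers add up almost exactly.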

    Good catch. Running it with nanoseconds instead adds up almost exactly – A.Hristov Sep 08 '20 at 18:03
    @JeJo: 1μs isn't code, it's a real interval of time. Code-formatting is not necessary or even appropriate. We're not talking about those symbols appearing as part of a program. Physical quantities in general like 1 cm aren't code and shouldn't be code-formatted. Using C++ number formatting for the iteration count is optional but not a real improvement. (I commented in such detail for the benefit of your future edits, and to explain my rollback, not to call you out.) – Peter Cordes Sep 08 '20 at 20:35