I used std::chrono::high_resolution_clock to measure std::lower_bound execution time. Here is my test code:
#include <iostream>
#include <algorithm>
#include <chrono>
#include <random>
#include <vector>
const long SIZE = 1000000;
using namespace std::chrono;
using namespace std;
int sum_stl(const std::vector<double>& array, const std::vector<double>& vals)
{
    long temp = 0;  // must be initialized: reading it otherwise is undefined behavior
    auto t0 = high_resolution_clock::now();
    for(const auto& val : vals) {
        temp += lower_bound(array.begin(), array.end(), val) - array.begin();
    }
    auto t1 = high_resolution_clock::now();
    cout << duration_cast<duration<double>>(t1 - t0).count() / vals.size()
         << endl;
    return temp;
}
int main() {
    const int N = 1000;
    vector<double> array(N);
    auto seed = high_resolution_clock::now().time_since_epoch().count();
    mt19937 rng(seed);
    uniform_real_distribution<double> r_dist(0.0, 1.0);
    generate(array.begin(), array.end(), [&](){ return r_dist(rng); });
    sort(array.begin(), array.end());
    vector<double> vals;
    for(long i = 0; i < SIZE; ++i) {
        vals.push_back(r_dist(rng));
    }
    long index = sum_stl(array, vals);
    (void)index;
    return 0;
}
array is a sorted vector of 1000 uniformly distributed random numbers, and vals holds 1 million values. At first I put the timer inside the loop to measure every single std::lower_bound execution; the result was around 1.4e-7 seconds. Then I timed other operations such as +, -, sqrt, and exp, but they all gave the same result as std::lower_bound.
In a former topic, resolution of std::chrono::high_resolution_clock doesn't correspond to measurements, it is suggested that chrono's resolution might not be enough to represent a duration shorter than 100 nanoseconds. So instead I timed the whole loop and computed an average by dividing by the number of iterations. Here is the output:
1.343e-14
Something must be wrong, since that is shorter than a single CPU cycle, but I just can't figure out what.
To make the question more general, how can I measure accurate execution time for a short function?