I was trying to familiarize myself with the Google Benchmark framework and decided to run a test with the famous pre/post increments. However, I found that functions containing literally the same code give different time measurements.
My test consists of three functions:
- incrementA, just a for-loop with nothing special
- incrementB, which is a copy of incrementA
- increment, which calls incrementA
With these three functions, I wrote a fixture and then registered the tests.
#include <assert.h>
#include <stdint.h>
#include <benchmark/benchmark.h>
//---------------------------------------------------------------------
void incrementA(int COUNT) {
    volatile int a[COUNT+1];
    int i = 0;
    for (int j = 0; j < 1000; j++) {
        i = 0;
        for (int k = 0; k < COUNT; k++) {
            a[i++] = k + j;
        }
    }
}

void incrementB(int COUNT) {
    volatile int a[COUNT+1];
    int i = 0;
    for (int j = 0; j < 1000; j++) {
        i = 0;
        for (int k = 0; k < COUNT; k++) {
            a[i++] = k + j;
        }
    }
}

void increment(int COUNT) {
    incrementA(COUNT);
}
//---------------------------------------------------------------------
class PrePostIncrement : public ::benchmark::Fixture
{
public:
    void SetUp(const ::benchmark::State& st)
    {
        size = st.range(0);
    }

    void TearDown(const ::benchmark::State&)
    {
    }

    static void CustomArguments(benchmark::internal::Benchmark* b)
    {
        size_t minSize = 8;
        for (int i = 0; (1 << (i + minSize)) < (1 << 20); ++i)
            b->Arg(1 << (i + minSize));
    }

    int size;
};
//---------------------------------------------------------------------
#define REGISTER_TEST(IncrementFunction) \
    using IncrementFunction##_Test = PrePostIncrement; \
    BENCHMARK_DEFINE_F(IncrementFunction##_Test, Obj)(benchmark::State& state) \
    { \
        while (state.KeepRunning()) \
        { \
            IncrementFunction(size); \
        } \
    } \
    BENCHMARK_REGISTER_F(IncrementFunction##_Test, Obj)->Apply(IncrementFunction##_Test::CustomArguments)->Unit(benchmark::kMillisecond);
REGISTER_TEST(incrementA);
REGISTER_TEST(incrementB);
REGISTER_TEST(increment);
BENCHMARK_MAIN();
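For reference, the loop in CustomArguments registers the powers of two from 2^8 up to 2^19. The following is a minimal sketch of the same arithmetic outside the framework; the function name benchmarkSizes and its two parameters are hypothetical and exist only for illustration.

```cpp
#include <cassert>
#include <vector>

// Mirrors the loop in CustomArguments, but collects the values in a vector
// instead of registering them with the benchmark.
// benchmarkSizes, minShift and limitShift are illustrative names only.
std::vector<int> benchmarkSizes(int minShift = 8, int limitShift = 20) {
    assert(minShift >= 0 && minShift <= limitShift && limitShift < 31);
    std::vector<int> sizes;
    for (int i = 0; (1 << (i + minShift)) < (1 << limitShift); ++i)
        sizes.push_back(1 << (i + minShift));
    return sizes;
}
```

With the default arguments this yields 12 sizes, from 256 to 524288, which matches the 12 rows per function in the output.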
Compiled with:
$ g++ increment_benchmark.cpp -std=gnu++14 -march=native -pthread -O3 -I/home/user/software/benchmark/include -L/home/user/software/benchmark/build/src -Wl,-rpath=/home/user/software/benchmark/build/src -lbenchmark
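The results are inconsistent: identical code gives different timings (incrementB takes roughly a third less time than incrementA in the run below), and swapping the order of the tests changes the results.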
---------------------------------------------------------------------
Benchmark                          Time             CPU   Iterations
---------------------------------------------------------------------
incrementA_Test/Obj/256        0.125 ms        0.125 ms         5499
incrementA_Test/Obj/512        0.244 ms        0.244 ms         2868
incrementA_Test/Obj/1024       0.482 ms        0.482 ms         1439
incrementA_Test/Obj/2048       0.971 ms        0.971 ms          715
incrementA_Test/Obj/4096        1.91 ms         1.91 ms          361
incrementA_Test/Obj/8192        3.82 ms         3.82 ms          180
incrementA_Test/Obj/16384       7.77 ms         7.77 ms           90
incrementA_Test/Obj/32768       15.6 ms         15.6 ms           45
incrementA_Test/Obj/65536       30.5 ms         30.5 ms           23
incrementA_Test/Obj/131072      61.7 ms         61.7 ms           11
incrementA_Test/Obj/262144       122 ms          122 ms            6
incrementA_Test/Obj/524288       245 ms          245 ms            3
incrementB_Test/Obj/256        0.084 ms        0.084 ms         8246
incrementB_Test/Obj/512        0.166 ms        0.166 ms         4212
incrementB_Test/Obj/1024       0.321 ms        0.321 ms         2175
incrementB_Test/Obj/2048       0.629 ms        0.629 ms         1109
incrementB_Test/Obj/4096        1.23 ms         1.23 ms          564
incrementB_Test/Obj/8192        2.42 ms         2.42 ms          288
incrementB_Test/Obj/16384       4.84 ms         4.84 ms          142
incrementB_Test/Obj/32768       9.63 ms         9.63 ms           72
incrementB_Test/Obj/65536       20.3 ms         20.3 ms           34
incrementB_Test/Obj/131072      40.8 ms         40.8 ms           17
incrementB_Test/Obj/262144      81.7 ms         81.7 ms            8
incrementB_Test/Obj/524288       164 ms          164 ms            4
increment_Test/Obj/256         0.126 ms        0.126 ms         5551
increment_Test/Obj/512         0.244 ms        0.244 ms         2861
increment_Test/Obj/1024        0.482 ms        0.482 ms         1453
increment_Test/Obj/2048        0.958 ms        0.958 ms          721
increment_Test/Obj/4096         1.91 ms         1.91 ms          364
increment_Test/Obj/8192         3.82 ms         3.82 ms          183
increment_Test/Obj/16384        7.63 ms         7.63 ms           91
increment_Test/Obj/32768        15.2 ms         15.2 ms           46
increment_Test/Obj/65536        30.5 ms         30.5 ms           23
increment_Test/Obj/131072       61.0 ms         61.0 ms           11
increment_Test/Obj/262144        122 ms          122 ms            6
increment_Test/Obj/524288        244 ms          244 ms            3
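One way to cross-check whether the gap depends on the order of execution, independently of the framework, is a small std::chrono harness that alternates which copy runs first. This is only a sketch under stated assumptions: fillA, fillB, timeNs, BestTimes and compareAlternating are hypothetical names, and the buffer is a std::vector written through a volatile pointer rather than the variable-length array used in the code above, so the generated machine code may differ from the benchmarked functions.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Same loop as incrementA/incrementB, but writing into a caller-provided
// buffer through a volatile pointer instead of a variable-length array.
void fillA(std::vector<int>& buf, int count) {
    assert(count >= 0 && static_cast<std::size_t>(count) <= buf.size());
    volatile int* a = buf.data();
    for (int j = 0; j < 1000; j++) {
        int i = 0;
        for (int k = 0; k < count; k++) {
            a[i++] = k + j;
        }
    }
}

// Deliberate copy of fillA, mirroring the incrementA/incrementB pair.
void fillB(std::vector<int>& buf, int count) {
    assert(count >= 0 && static_cast<std::size_t>(count) <= buf.size());
    volatile int* a = buf.data();
    for (int j = 0; j < 1000; j++) {
        int i = 0;
        for (int k = 0; k < count; k++) {
            a[i++] = k + j;
        }
    }
}

// Wall-clock duration of a single call to f, in nanoseconds.
template <typename F>
std::int64_t timeNs(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

// Best (minimum) time observed for each copy.
struct BestTimes {
    std::int64_t a;
    std::int64_t b;
};

// Runs both copies `rounds` times, alternating which one goes first,
// so that neither copy always benefits from (or pays for) running first.
BestTimes compareAlternating(int count, int rounds) {
    assert(count >= 0);
    std::vector<int> buf(static_cast<std::size_t>(count) + 1);
    const std::int64_t inf = std::numeric_limits<std::int64_t>::max();
    BestTimes best{inf, inf};
    for (int n = 0; n < rounds; n++) {
        const bool aFirst = (n % 2 == 0);
        if (aFirst) best.a = std::min(best.a, timeNs([&] { fillA(buf, count); }));
        best.b = std::min(best.b, timeNs([&] { fillB(buf, count); }));
        if (!aFirst) best.a = std::min(best.a, timeNs([&] { fillA(buf, count); }));
    }
    return best;
}
```

Calling, for example, compareAlternating(4096, 20) and printing both fields would show whether a difference between the two copies survives when the execution order is alternated.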
Initially I thought that the CPU frequency scaling governor (powersave) might be influencing the results, but after changing it to performance, the results were the same.
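To confirm which governor is actually active while the benchmark runs, it can be read from sysfs. The sketch below assumes a Linux system that exposes cpufreq for cpu0 at the usual path; readGovernor is a hypothetical helper name, and other cores may use a different governor.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Returns the first line of the given file, or an empty string if the file
// cannot be read. The default path is the usual Linux sysfs location for
// cpu0; it is an assumption and may not exist on every system.
std::string readGovernor(const std::string& path =
        "/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor") {
    assert(!path.empty());
    std::ifstream in(path);
    std::string governor;
    if (in) std::getline(in, governor);
    return governor;
}
```

Printing readGovernor() at the start of the benchmark run would show whether the governor is really the one that was set.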
Just for reference, I compiled Google Benchmark myself (bf585a2 [v1.5.2]), and my glibc and g++ versions are:
$ ldd --version
ldd (Ubuntu GLIBC 2.27-3ubuntu1.2) 2.27
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Written by Roland McGrath and Ulrich Drepper.
$ g++ --version
g++ (Ubuntu 9.2.1-17ubuntu1~18.04.1) 9.2.1 20191102
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
I am sure there are different ways of writing this test, and I welcome any suggestions, but my main interest is to know what is wrong with my code and why I get different results.