
I'm new to Google Benchmark and I get dramatically different results from the same benchmark (below), which retrieves the local time in C++, when running it locally versus on quick-bench.com. Both runs used GCC 8.2 and -O3.

Why do the results vary dramatically between running locally vs on quick-bench.com? Which is correct?

#include <benchmark/benchmark.h>
#include <ctime>      
#include <sys/time.h> 
#include <chrono>     


static void BM_ctime(benchmark::State& state) {
  unsigned long long count = 0;

  for (auto _ : state) {
    std::time_t sec = std::time(0);  

    benchmark::DoNotOptimize(count += sec);
  }
}

BENCHMARK(BM_ctime);


static void BM_sysTime(benchmark::State& state) {
  unsigned long long count = 0;

  for (auto _ : state) {
    unsigned long sec = time(NULL);

    benchmark::DoNotOptimize(count += sec);
  }
}

BENCHMARK(BM_sysTime);


static void BM_chronoMilliseconds(benchmark::State& state) {
  unsigned long long count = 0;

  for (auto _ : state) {
    unsigned long long ms = std::chrono::duration_cast<std::chrono::milliseconds>(
      std::chrono::system_clock::now().time_since_epoch()
    ).count();

    benchmark::DoNotOptimize(count += ms);
  }
}

BENCHMARK(BM_chronoMilliseconds);

static void BM_chronoSececonds(benchmark::State& state) {
  unsigned long long count = 0;

  for (auto _ : state) {
    unsigned long long sec = std::chrono::duration_cast<std::chrono::seconds>(
      std::chrono::system_clock::now().time_since_epoch()
    ).count();

    benchmark::DoNotOptimize(count += sec);
  }
}

BENCHMARK(BM_chronoSececonds);

Locally the following results are produced:

-------------------------------------------------------------
Benchmark                      Time           CPU Iterations
-------------------------------------------------------------
BM_ctime                     183 ns        175 ns    4082013
BM_sysTime                   197 ns        179 ns    4004829
BM_chronoMilliseconds         37 ns         36 ns   19092506
BM_chronoSececonds            37 ns         36 ns   19057991

QuickBench results:

[QuickBench results chart not reproduced here; per the comment below, the chrono versions ran roughly 1,000x slower on quick-bench.com than the ctime versions.]
2 Answers


Benchmark results are platform-, architecture-, and machine-dependent. It isn't even safe to assume your benchmarks will always produce the same numbers on the same machine: things like temperature, frequency-scaling settings, background load, and hardware wear can all affect performance.

Karl
  • True, but it's not just a little variance. Locally the chrono versions are 5x faster, but on QuickBench they are 1,000x *slower*, which suggests something may be wrong with the test. But I looked at the assembly on QuickBench and it does seem to be calling the functions, so I don't think they're being optimized out. – Orangeberry Aug 29 '18 at 14:08
  • You're basically timing system calls on an AWS virtual machine. Slightly related: https://blog.packagecloud.io/eng/2017/03/08/system-calls-are-much-slower-on-ec2/ – Lack Oct 28 '18 at 17:45

I just ran your example on my machine and I see the results below:

----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
BM_ctime                    3.26 ns         3.25 ns    215110555
BM_sysTime                  3.26 ns         3.25 ns    215154791
BM_chronoMilliseconds       2502 ns         2502 ns       279856
BM_chronoSececonds          2502 ns         2501 ns       279854

Assuming a NOP instruction takes one clock cycle, which is about 0.5 ns on my system, the ratio of CPU time to NOP time for the chrono benchmarks is around 5000.

However, I'm not really concerned, because that is not what benchmarking is meant for, at least for me. It doesn't make sense to compare values from my system with values from Quick Bench. Rather, I use benchmark numbers to compare different implementations or algorithms on the same machine, which eliminates such doubts.

talekeDskobeDa