Imagine that I would like to benchmark serialization/deserialization routines against three different datasets. This results in 2x3=6 benchmarks.

Ideally, I would like to achieve the following:

  • avoid code duplication
  • call each dataset generator function only once per executable invocation, and only when it is not excluded by --benchmark_filter=... (generator functions are expensive)
  • meaningful benchmark names (e.g. "Serialize/DatasetAlpha")

None of the features mentioned in the guide seems to fit the purpose exactly. The closest solution I have found so far is to use vararg-parameterized Serialize()/Deserialize() functions along with generator functions that return the generated data as singletons.

Is there a better way?

This is what I would like to avoid:

#include <benchmark/benchmark.h>
#include <string>

/* library */
std::string serialize(const std::string& data) {
  return data;
}
std::string deserialize(const std::string& data) {
  return data;
}

/* helpers */
void SerializeHelper(benchmark::State& state, const std::string& data) {
  for (auto _ : state) {
    std::string bytes = serialize(data);
    benchmark::DoNotOptimize(bytes);
  }
}

void DeserializeHelper(benchmark::State& state, const std::string& data) {
  std::string bytes = serialize(data);
  for (auto _ : state) {
    std::string data_out = deserialize(bytes);  // deserialize the serialized bytes, not the raw input
    benchmark::DoNotOptimize(data_out);
  }
}

std::string GenerateDatasetAlpha() {
  return "";
}
std::string GenerateDatasetBeta() {
  return "";
}
std::string GenerateDatasetGamma() {
  return "";
}


/* oh, my... */
void SerializeAlpha(benchmark::State& state) {
  SerializeHelper(state, GenerateDatasetAlpha());
}
void DeserializeAlpha(benchmark::State& state) {
  DeserializeHelper(state, GenerateDatasetAlpha());
}
void SerializeBeta(benchmark::State& state) {
  SerializeHelper(state, GenerateDatasetBeta());
}
void DeserializeBeta(benchmark::State& state) {
  DeserializeHelper(state, GenerateDatasetBeta());
}
void SerializeGamma(benchmark::State& state) {
  SerializeHelper(state, GenerateDatasetGamma());
}
void DeserializeGamma(benchmark::State& state) {
  DeserializeHelper(state, GenerateDatasetGamma());
}

BENCHMARK(SerializeAlpha);
BENCHMARK(DeserializeAlpha);
BENCHMARK(SerializeBeta);
BENCHMARK(DeserializeBeta);
BENCHMARK(SerializeGamma);
BENCHMARK(DeserializeGamma);

BENCHMARK_MAIN();

//g++ wtf.cc -o wtf -I benchmark/include/ -lbenchmark -L benchmark/build/src -lpthread -O3
gudok

1 Answer

The closest solution I found so far is to use template benchmarks with per-dataset generator classes:

#include <benchmark/benchmark.h>
#include <string>

/* library */
std::string serialize(const std::string& data) {
  return data;
}
std::string deserialize(const std::string& data) {
  return data;
}

/* benchmark routines */
template<typename Dataset>
void SerializeBenchmark(benchmark::State& state) {
  std::string data = Dataset()();
  for (auto _ : state) {
    std::string bytes = serialize(data);
    benchmark::DoNotOptimize(bytes);
  }
}

template<typename Dataset>
void DeserializeBenchmark(benchmark::State& state) {
  std::string data = Dataset()();
  std::string bytes = serialize(data);
  for (auto _ : state) {
    std::string data_out = deserialize(bytes);  // deserialize the serialized bytes, not the raw input
    benchmark::DoNotOptimize(data_out);
  }
}

/* dataset generators and benchmark registration */

struct Dataset1 {
  std::string operator()() {
    return ""; // load from file, generate random data, etc
  }
};
BENCHMARK_TEMPLATE(SerializeBenchmark, Dataset1);
BENCHMARK_TEMPLATE(DeserializeBenchmark, Dataset1);

struct Dataset2 {
  std::string operator()() { return ""; }
};
BENCHMARK_TEMPLATE(SerializeBenchmark, Dataset2);
BENCHMARK_TEMPLATE(DeserializeBenchmark, Dataset2);

struct Dataset3 {
  std::string operator()() { return ""; }
};
BENCHMARK_TEMPLATE(SerializeBenchmark, Dataset3);
BENCHMARK_TEMPLATE(DeserializeBenchmark, Dataset3);

BENCHMARK_MAIN();

This keeps the amount of code bloat reasonably low. Benchmark names are also good, e.g. SerializeBenchmark<Dataset2>. The dataset generators are still called multiple times, because every run of a benchmark function constructs its dataset again. If you want to avoid that, you will have to store the generated data in singletons with lazy loading.
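The lazy singleton can be sketched with a function-local static, which C++11 and later initialize exactly once, in a thread-safe way, the first time control reaches it. This is only a minimal sketch under a few assumptions: GenerateDataset1() and GenerationCount() are hypothetical names that are not part of the code above, and the counter exists purely to make the "generated once" property visible.

```cpp
#include <string>

// Hypothetical counter, present only to observe how often the generator runs.
inline int& GenerationCount() {
  static int count = 0;
  return count;
}

// Hypothetical stand-in for an expensive generator
// (load from file, generate random data, etc).
inline std::string GenerateDataset1() {
  ++GenerationCount();
  return std::string(1024, 'x');
}

struct Dataset1 {
  // Returns a reference to data that is built on the first call only.
  // Later calls, from any benchmark, reuse the same string.
  const std::string& operator()() const {
    static const std::string data = GenerateDataset1();
    return data;
  }
};
```

With this change the benchmark templates can bind the result as `const std::string& data = Dataset()();` to skip the copy as well. Since the generator is only reached from inside a benchmark function, a dataset whose benchmarks are all excluded by --benchmark_filter should never be generated.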

gudok