
I'm trying to exclude the conversion of a string into an object from a benchmark. This is the function involved:

std::vector<std::pair<value_type, size_t>> read_file(const std::string path, benchmark::State& state) {
  
  std::string kmer;
  std::vector<std::pair<value_type, size_t>> data;
  
  std::ifstream file(path);
  while (std::getline(file, kmer)) {
    state.PauseTiming();
    kmer_t tmp(kmer);
    state.ResumeTiming();
    data.push_back(std::make_pair(tmp.value, tmp.index));
  }

  return data;
}

The function reads a file and converts it line by line into an object; the resulting object is inserted into a vector of pairs. I included the Google Benchmark library in my project to measure how much time and memory is used, and I would like to exclude the conversion from the total count. I implemented the function just as the documentation says, but the resulting time is much higher than a normal run without the timer management.

I also found this old but related open issue, but it didn't resolve my problem. How can I fix this, or is there any workaround for the issue?

th3g3ntl3man

1 Answer


There is nothing you can do here; this is expected. Starting and stopping a timer requires some form of synchronization with the OS, and in your case that overhead appears to be much higher than the cost of creating the temporary object.

However, this shouldn't be an issue. If you're trying to compare several methods of filling the vector and want to exclude the creation of the objects in all of them, then the overhead of stopping and restarting the timer will be the same for each method. So if one method is faster than another, it will still be faster with the added timer-management overhead; only the relative difference will be smaller.

I'd even argue that in your case, where you're measuring the entire parsing of the file rather than microbenchmarking e.g. push_back vs. emplace_back, it is better to include the object creation in the measurements. That gives a more accurate sense of how significant the performance differences are, e.g. when comparing this version against one that reuses the kmer_t object between iterations (and thus possibly reuses the memory already allocated for its data members).

Corristo
  • *the overhead of stopping and restarting the timer will be the same for all of these different methods* - Not really. If it actually took a system call to stop and start, it would serialize out-of-order exec, and thus reduce the relative cost of branch misprediction among other things, and also ability to exploit ILP across iterations. Also, any system call tends to disturb some L1d cache lines, making code run slower for a while upon return to user-space before things settle down again. It's certainly possible that could flatten differences between two different ways of doing things. – Peter Cordes Mar 03 '21 at 03:30
  • Agreed with your last sentence, though: include construction in the benchmark, and simply try to create a whole benchmark loop that does what you want to measure. Breaking code up on smaller boundaries than the out-of-order execution window (224 uops (about that many instructions) in Skylake for example) is highly problematic. – Peter Cordes Mar 03 '21 at 03:32