
I am trying to benchmark the function std::isdigit from the header cctype (the one inherited from C, just to be clear).

The code snippet is the following:

void BM_IsDigit_C(::benchmark::State& state) {
  const char c = GenerateRandomChar();
  for (auto _ : state) {
    ::benchmark::DoNotOptimize(std::isdigit(static_cast<unsigned char>(c)));
  }
}
BENCHMARK(BM_IsDigit_C);

It's easy to deduce that GenerateRandomChar is a simple function that generates a random char. Since it is called only once, outside the timed loop, it should add no overhead to the benchmark itself.


"""Unfortunately""", the compiler is able to completely optimize the code. It correctly generates the expected code for std::isdigit, but, in the assembly code of the benchmark the basic block is ignored.

The following is the profiled code generated by gcc 10.1.0:

[Image: perf report of the generated code]

As you can see, the code for std::isdigit is generated (see the Compiler Explorer example):

movzbl %r13b,%eax 
sub    $0x30,%eax
cmp    $0x9,%eax
setbe  %al     
movzbl %al,%eax

but it is effectively ignored, because the timed loop is empty:

68: sub    $0x1,%rbx   <---|
    jne    68        ------|
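For reference, the five instructions of the std::isdigit block appear to correspond to the branchless ASCII range check below: zero-extend, subtract '0', unsigned compare against 9, then `setbe`. This is a sketch under that reading of the assembly; `is_ascii_digit` is a hypothetical name, not a library function.

```cpp
// Hypothetical helper: a plain C++ rendering of the inlined isdigit code
// shown above (movzbl / sub $0x30 / cmp $0x9 / setbe / movzbl).
inline bool is_ascii_digit(unsigned char c) {
  // c - '0' is evaluated in unsigned arithmetic, so anything below '0'
  // wraps around to a huge value and fails the <= 9 test, just like
  // anything above '9' does. One comparison checks both bounds.
  return static_cast<unsigned>(c) - '0' <= 9u;
}
```

The expression depends only on `c`, which never changes inside the benchmark loop, so it is a natural candidate for being computed once before the loop.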

Little Note

The "C++ version" (with locale) generates the expected code: the loop testing the function code.

[Image: perf report for the C++ version]
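For comparison, the "C++ version" is the two-argument overload std::isdigit(charT, const std::locale&) from <locale>. The standard defines it as a query on the locale's std::ctype facet, and for char that facet's `is` member is specified in terms of a lookup table, which matches Peter Cordes's remark about a table lookup. The sketch below spells out both forms; the wrapper names are hypothetical. Presumably the extra locale machinery is what keeps the compiler from hoisting the call, but that is a guess rather than something established here.

```cpp
#include <locale>

// Hypothetical wrapper around the locale-aware overload from <locale>.
bool is_digit_in(char c, const std::locale& loc) {
  return std::isdigit(c, loc);
}

// The same query written the way the standard defines it: ask the
// locale's ctype<char> facet whether c carries the "digit" mask.
bool is_digit_via_facet(char c, const std::locale& loc) {
  return std::use_facet<std::ctype<char>>(loc).is(std::ctype_base::digit, c);
}
```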


My questions are:

  • Why does benchmark::DoNotOptimize not work with this particular function?
  • How can I change the benchmark code to properly measure the performance of that function?
Final Notes
  • I got the same "problem" with clang compiler.
  • I tried to "move" the tested function into another translation unit (forcing the not-inline attribute), but I got the same "problem".
Peter Cordes
BiagioF
  • You could [force the code to be executed](http://quick-bench.com/DMAf8ri9TLwJwRND97RvDwKXpAA), but now the measurement includes (unfairly, IMO) loads and stores that are not inherent to `isdigit` itself. Also shown: carefully decide whether you want to know latency or throughput (or both); for small pieces of code they tend to be *very* different. – harold May 30 '20 at 18:03
  • @JesperJuhl But obviously that *isn’t* the case, otherwise what would be the point of the `isdigit` call? – Konrad Rudolph May 30 '20 at 18:44
  • It's not *ignoring* `isdigit()`, it's just hoisting it out of the loop because of the way you used `DoNotOptimize`. You need to get the compiler to forget that `c` is a loop invariant, *and* to materialize the `isdigit(c)` result in a register every iteration. That might take a tmp var and two DoNotOptimize calls, I'm not sure. As harold said, the timing is probably not as meaningful as you'd like because your real use-case will be sensitive to either front-end throughput or back-end latency, and it matters which. (Probably not back-end ALU port pressure.) – Peter Cordes May 31 '20 at 02:12
  • Also, if you branch on `isdigit`, it's just `sub` or `lea` / macro-fused `cmp + jcc` = 2 total uops. No need to materialize a boolean in a register if you're branching. (That difference matters more for the simpler ASCII version, not the locale-aware version that does a table lookup in case of higher codepoints also being digit characters.) – Peter Cordes May 31 '20 at 02:13
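Following Peter Cordes's suggestion (a temporary plus two "escape" points), here is a self-contained sketch that does not depend on Google Benchmark. `launder`, `consume`, `IsDigitResult` and `bench_isdigit` are hypothetical names; the two helpers are modeled on the GNU inline-asm idiom that Google Benchmark's DoNotOptimize uses on GCC and Clang, so the sketch only compiles with those compilers, and the `"r"`/`"+r"` constraints assume scalar arguments. The read-write `"+r"` operand makes the compiler assume the copy of `c` may have changed, so the isdigit computation cannot be hoisted; the input-only operand forces the result into a register on every iteration. With Google Benchmark itself, the analogous change would presumably be to copy `c` into a non-const local, pass it to `benchmark::DoNotOptimize` before the call, and pass the result to a second `DoNotOptimize`; whether the library treats a non-const argument as read-write depends on its version, so check the generated assembly.

```cpp
#include <cctype>
#include <chrono>

// Hypothetical helper: input-only asm operand. The value must exist in a
// register here, but the compiler may still compute it once and reuse it.
template <class T>
inline void consume(const T& value) {
  asm volatile("" : : "r"(value) : "memory");
}

// Hypothetical helper: read-write asm operand. The compiler must assume
// the asm changed `value`, so nothing derived from it can be hoisted
// above this point.
template <class T>
inline void launder(T& value) {
  asm volatile("" : "+r"(value) : : "memory");
}

struct IsDigitResult {
  long hits;           // how many calls reported a digit (sanity check)
  double ns_per_call;  // rough average time per loop iteration
};

IsDigitResult bench_isdigit(char c, long iterations) {
  long hits = 0;
  auto start = std::chrono::steady_clock::now();
  for (long i = 0; i < iterations; ++i) {
    char ch = c;
    launder(ch);  // ch is no longer a loop invariant
    int r = std::isdigit(static_cast<unsigned char>(ch));
    consume(r);        // the result is materialized on every iteration
    hits += (r != 0);  // extra add, kept only so the result is checkable
  }
  auto stop = std::chrono::steady_clock::now();
  double ns = std::chrono::duration<double, std::nano>(stop - start).count();
  return {hits, iterations > 0 ? ns / iterations : 0.0};
}
```

As harold points out, a loop like this still measures the call together with some surrounding bookkeeping (here including the `hits` counter), and because the iterations do not depend on each other it reports something closer to throughput than latency.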
