Questions tagged [microbenchmark]

A microbenchmark attempts to measure the performance of a "small" bit of code. These tests are typically in the sub-millisecond range. The code being tested usually performs no I/O, or else is a test of some single, specific I/O task.

Microbenchmarking is very different from profiling! When profiling, you work with an entire application, either in production or in an environment very painstakingly contrived to resemble production. Because of this, you get performance data that is, for lack of a better term, real. When you microbenchmark, you get a result that is essentially fictional, and you must be very careful about what conclusions you draw from it.

Still, whether you are profiling or microbenchmarking, keep the old adage in mind:
Premature optimization is the root of all evil.
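
As a rough illustration of why microbenchmark numbers need careful handling, here is a minimal sketch of a hand-rolled timing harness in Python. The names microbenchmark, summarize, and sum_of_squares, and the default values for warmup, repeats, and inner_loops, are assumptions made for this example only; they do not come from any of the questions listed here. The sketch warms up first, times many calls per sample so that timer overhead is amortized, and reports a spread rather than a single number.

```python
import statistics
import time


def microbenchmark(func, *, warmup=100, repeats=30, inner_loops=1000):
    """Return one per-call duration (in nanoseconds) for each repeat."""
    # Warm-up calls let caches and lazy initialisation settle before timing.
    for _ in range(warmup):
        func()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        # Call the function many times per sample so that the cost of
        # reading the clock is small relative to the work being measured.
        for _ in range(inner_loops):
            func()
        elapsed = time.perf_counter_ns() - start
        samples.append(elapsed / inner_loops)
    return samples


def summarize(samples):
    """Report the spread of the samples instead of a single figure."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def sum_of_squares():
    # Hypothetical "small" piece of code under test; it performs no I/O.
    return sum(i * i for i in range(100))


if __name__ == "__main__":
    stats = summarize(microbenchmark(sum_of_squares))
    print(f"min {stats['min']:.0f} ns, "
          f"median {stats['median']:.0f} ns, "
          f"max {stats['max']:.0f} ns per call")
```

The gap between the minimum and the maximum gives a quick sense of how noisy the measurement is. Even a tight spread only describes this function in isolation, on this machine, with warm caches, which is the sense in which the result is "fictional" compared with profiling a whole application.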

485 questions
5
votes
1 answer

Is there any difference in between (rdtsc + lfence + rdtsc) and (rdtsc + rdtscp) in measuring execution time?

As far as I know, the main difference in runtime ordering between the rdtsc and rdtscp instructions is whether execution waits until all previous instructions have executed locally. In other words, it means lfence + rdtsc…
ruach
5
votes
2 answers

R microbenchmark: How to pass same argument to evaluated functions?

I'd like to evaluate the time to extract data from a raster time series using different file types (geotiff, binary) or objects (RasterBrick, RasterStack). I created a function that will extract the time series from a random point of the raster…
Daniel
5
votes
2 answers

gcc flags to disable arithmetic optimisations

Does gcc/g++ have flags to enable or disable arithmetic optimisations, e.g. where a+a+...+a is replaced by n*a when a is an integer? In particular, can this be disabled when using -O2 or -O3? In the example below even with -O0 the add operations are…
user1059432
5
votes
1 answer

Google Benchmark Custom Setup And Teardown Method

I am using the benchmark library to benchmark some code. I want to call a setup method once before the actual benchmark code runs, rather than repeating it every time, across multiple benchmark method calls. For example: static void…
Kishan Kumar
5
votes
1 answer

How Stable Should CPU Frequency Be for Benchmarking?

Note: exhaustive system details are given at the end of the question. I am trying to get my development machine to have a very stable CPU frequency so that I can get precise benchmarks of some linear algebra codes - however, it still displays…
Sam Manzer
5
votes
1 answer

Previous code seems to affect time for later function call

I'm trying to benchmark relatively small portions of a set of larger algorithms that are implemented in C++. To simplify, one could say that each algorithm is implemented via two functions (let's call them foo() and bar()) that can be called…
Qundercut
5
votes
2 answers

Why jnz requires 2 cycles to complete in an inner loop

I'm on an IvyBridge. I found the performance behavior of jnz to be inconsistent between the inner loop and the outer loop. The following simple program has an inner loop with fixed size 16: global _start _start: mov rcx, 100000000 .loop_outer: mov rax, …
5
votes
2 answers

Why is CPUID + RDTSC unreliable?

I am trying to profile some code for execution time on an x86-64 processor. I am referring to this Intel white paper and have also gone through other SO threads discussing the topic of using RDTSCP vs CPUID+RDTSC here and here. In the above-mentioned…
talekeDskobeDa
5
votes
0 answers

C# micro-benchmark: why reseting aggregation value make for-loops faster?

Consider the following two different functions ComputeA and ComnputeB: using System; using System.Diagnostics; namespace BenchmarkLoop { class Program { private static double[] _dataRow; private static double[] _dataCol; …
Thomas W.
5
votes
1 answer

Explanation for why allocating a second time changes performance

I was testing some micro benchmarks on dense matrix multiplication (as a curiosity), and I noticed some very strange performance results. Here is a minimal working example: #include #include constexpr long long n =…
helloworld922
5
votes
2 answers

Why sort is slower than order function in R?

It's all in the title. I would expect that order uses sort to find the order of the values in a vector. Thus sort should be quicker than order to sort a vector, but this is not the…
Simon C.
5
votes
1 answer

Evaluate multiline codeblock with microbenchmark

Is it possible to evaluate a codeblock consisting of multiple lines of code with microbenchmark? If so, how? Example: We have some numeric data in character columns: testdata <- tibble::tibble(col1 = runif(1000), col2 = as.character(runif(1000)),…
Marijn Stevering
5
votes
1 answer

What does `static_cast` mean for the optimizer?

When people are trying to perform rigorous benchmarks in various libraries, I sometimes see code like this: auto std_start = std::chrono::steady_clock::now(); for (int i = 0; i < 10000; ++i) for (int j = 0; j < 10000; ++j) volatile const auto…
Chris Beck
5
votes
4 answers

Why is standard R median function so much slower than a simple C++ alternative?

I made the following implementation of the median in C++ and used it in R via Rcpp: // [[Rcpp::export]] double median2(std::vector x){ double median; size_t size = x.size(); sort(x.begin(), x.end()); if (size % 2 == 0){ …
Ruben
5
votes
2 answers

How to use list-argument in microbenchmark

How does one use the list argument in the microbenchmark function? I want to microbenchmark the same function with different inputs, as in microbenchmark(j1 = {sample(1e5)}, j2 = {sample(2e5)}, j3 = {sample(3e5)}) The…
Tobias Madsen