Questions tagged [microbenchmark]

A microbenchmark attempts to measure the performance of a "small" bit of code. These tests are typically in the sub-millisecond range. The code being tested usually performs no I/O, or else is a test of some single, specific I/O task.

Microbenchmarking is very different from profiling! When profiling, you work with an entire application, either in production or in an environment very painstakingly contrived to resemble production. Because of this, you get performance data that is, for lack of a better term, real. When you microbenchmark, you get a result that is essentially fictional, and you must be very careful about what conclusions you draw from it.

Still, whether you are profiling or microbenchmarking, always apply the old adage:
Premature optimization is the root of all evil.
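
As a rough illustration of what a microbenchmark of a "small" bit of code can look like, here is a minimal sketch using Python's standard `timeit` module. The function `join_concat` and the helper `run_microbenchmark` are hypothetical names made up for this example, and the repeat counts are arbitrary assumptions rather than recommended settings.

```python
import statistics
import timeit


def join_concat(n=1000):
    """Hypothetical code under test: build a string with str.join."""
    return "".join(str(i) for i in range(n))


def run_microbenchmark(func, repeats=5, number=200):
    """Time func and return per-call statistics in seconds."""
    # timeit.repeat returns one total time per repeat; each total
    # covers `number` back-to-back calls of func.
    totals = timeit.repeat(func, repeat=repeats, number=number)
    per_call = [t / number for t in totals]
    return {
        "min_s": min(per_call),
        "median_s": statistics.median(per_call),
        "max_s": max(per_call),
    }


if __name__ == "__main__":
    # The numbers printed here depend on the machine and its current load.
    print(run_microbenchmark(join_concat))
```

Reporting several repeats rather than a single run makes the spread visible, which is a reminder that the figures describe this isolated loop on this machine and not the behaviour of a whole application.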

485 questions
10 votes, 2 answers

Why Document.querySelector is more efficient than Element.querySelector

I ran a test with a few iterations to compare the efficiency of Document.querySelector and Element.querySelector. Markup:
Script: Querying with Document.querySelector begin = performance.now(); var i = 0, …
Alexandre Thebaldi
10 votes, 2 answers

unexpected results: microbenchmark

I've always been bugged a bit by the lack of accuracy I see when benchmarking with system.time and rbenchmark (in that the precision of the timing may be lacking), and I saw Hadley reference the microbenchmark package recently. So I decided to give it a…
Tyler Rinker
9 votes, 1 answer

Why does adding an xorps instruction make this function using cvtsi2ss and addss ~5x faster?

I was messing around with optimizing a function using Google Benchmark, and ran into a situation where my code was unexpectedly slowing down in certain situations. I started experimenting with it, looking at the compiled assembly, and eventually…
LRFLEW
9 votes, 3 answers

what does STREAM memory bandwidth benchmark really measure?

I have a few questions about the STREAM benchmark (http://www.cs.virginia.edu/stream/ref.html#runrules). Below is the comment from stream.c. What is the rationale behind the requirement that arrays should be 4 times the size of cache? * (a) Each…
9 votes, 1 answer

Compiled R code is actually slower than pure R with JIT enabled

From the byte compiler section of Efficient R Programming and the R documentation on the byte compiler, I learnt that cmpfun can be used to compile a pure R function into byte code to speed it up, and that enableJIT can speed things up by enabling just-in-time compilation. So, I decided to do…
JiaHao Xu
9 votes, 1 answer

Unexpected performance results when comparing dictionary lookup vs multiple is operators in .NET 4.7

I have a problem where I need to do dynamic dispatch based on an object's type. The types I need to dispatch on are known at compile time; in my example there are 17 of them. My initial guess was to use a Dictionary> for the…
Ivan Zlatanov
9 votes, 1 answer

What can explain the huge performance penalty of writing a reference to a heap location?

While investigating the subtler consequences of generational garbage collectors on application performance, I have hit a quite staggering discrepancy in the performance of a very basic operation – a simple write to a heap location – with respect to…
Marko Topolnik
9 votes, 2 answers

Benchmarking - CPU time bigger than wall time?

I measure the CPU time and wall time of sorting algorithms on Linux. I'm using getrusage to measure CPU time and clock_gettime with CLOCK_MONOTONIC to get wall time. However, I noticed that the CPU time is bigger than the wall time. Is that correct? I always…
mazix
9 votes, 1 answer

Estimating actual (not theoretic) runtime complexity of an implementation

Anyone in computer science will know that HeapSort is O(n log n) worst case in theory, while QuickSort is O(n^2) worst case. However, in practice, a well-implemented QuickSort (with good heuristics) will outperform HeapSort on every single data set.…
Erich Schubert
8 votes, 1 answer

Google Benchmark Frameworks DoNotOptimize

I am a bit confused about the implementation of the function void DoNotOptimize of the Google Benchmark Framework (definition from here): template <class Tp> inline BENCHMARK_ALWAYS_INLINE void DoNotOptimize(Tp const& value) { asm volatile("" : :…
Hymir
8 votes, 1 answer

Drain the instruction pipeline of Intel Core 2 Duo?

I'm writing some micro-benchmarking code for some very short operations in C. For example, one thing I'm measuring is how many cycles are needed to call an empty function depending on the number of arguments passed. Currently, I'm timing using an…
Jay Conrod
8 votes, 0 answers

What's up with the "half fence" behavior of rdtscp?

For many years x86 CPUs supported the rdtsc instruction, which reads the "time stamp counter" of the current CPU. The exact definition of this counter has changed over time, but on recent CPUs it is a counter that increments at a fixed frequency…
BeeOnRope
8 votes, 5 answers

Simple for() loop benchmark takes the same time with any loop bound

I want to write code that makes my CPU execute some operations and see how much time it takes to complete them. I wanted to make a loop going from i=0 to i<5000, multiply i by a constant number, and time that. I've ended up with…
NaW
8 votes, 4 answers

Fastest Linux system call

On an x86-64 Intel system that supports syscall and sysret, what's the "fastest" system call from 64-bit user code on a vanilla kernel? In particular, it must be a system call that exercises the syscall/sysret user <-> kernel transition, but does…
BeeOnRope
8 votes, 1 answer

JMH - why do I need Blackhole.consumeCPU()

I'm trying to understand why it is wise to use Blackhole.consumeCPU(). Here is something I found about Blackhole.consumeCPU() on Google: Sometimes when we run a benchmark across multiple threads we also want to burn some CPU cycles to simulate CPU…
DRK