Questions tagged [microbenchmark]

A microbenchmark attempts to measure the performance of a "small" bit of code. These tests are typically in the sub-millisecond range. The code being tested usually performs no I/O, or else is a test of some single, specific I/O task.
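
A single call to a "small" piece of code may finish faster than the clock can reliably resolve. For that reason, microbenchmarks usually time many iterations together and divide by the iteration count. The following is a minimal Python sketch of that idea under stated assumptions. It is not the method of any particular benchmarking tool. The helper name `time_per_call` and the default iteration count are hypothetical choices for illustration, and the estimate includes the loop's own overhead.

```python
import time


def time_per_call(fn, iterations=100_000):
    """Estimate the average time of one call to fn(), in nanoseconds.

    Hypothetical helper for illustration: a single call may be shorter
    than the clock can reliably resolve, so many calls are timed
    together and the total is divided by the iteration count.
    The Python loop overhead is included in the estimate.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    start = time.perf_counter_ns()
    for _ in range(iterations):
        fn()
    elapsed_ns = time.perf_counter_ns() - start
    return elapsed_ns / iterations


if __name__ == "__main__":
    # Example "small" piece of code: summing 100 integers.
    estimate = time_per_call(lambda: sum(range(100)))
    print(f"about {estimate:.1f} ns per call (loop overhead included)")
```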

Microbenchmarking is very different from profiling! When profiling, you work with an entire application, either in production or in an environment very painstakingly contrived to resemble production. Because of this, you get performance data that is, for lack of a better term, real. When you microbenchmark, you get a result that is essentially fictional, and you must be very careful about what conclusions you draw from it.

Still, for either type, always keep the old adage in mind:
"Premature optimization is the root of all evil."
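
Because a microbenchmark result is so easy to misread, one common precaution is to discard a warm-up run and then repeat the measurement several times. You can then look at the spread of the results rather than at a single number. A minimal sketch using Python's standard `timeit` module follows. The helper name `summarize` and the default `number`/`repeat` values are assumptions chosen for illustration, not recommended settings.

```python
import statistics
import timeit


def summarize(stmt, number=10_000, repeat=7):
    """Time stmt (a callable) in `repeat` batches of `number` calls each.

    Hypothetical helper for illustration. Returns (best, median, worst)
    as seconds per call.
    """
    timeit.timeit(stmt, number=number)  # warm-up batch; result discarded
    batch_totals = timeit.repeat(stmt, number=number, repeat=repeat)
    per_call = sorted(total / number for total in batch_totals)
    return per_call[0], statistics.median(per_call), per_call[-1]


if __name__ == "__main__":
    best, median, worst = summarize(lambda: sorted(range(1000, 0, -1)), number=1_000)
    print(f"best {best * 1e6:.2f} us, median {median * 1e6:.2f} us, "
          f"worst {worst * 1e6:.2f} us per call")
```

A large gap between the best and worst figures suggests that something in the environment is influencing the measurement, such as other processes or CPU frequency changes. Note also that `timeit` temporarily disables garbage collection while timing. This is one more way a microbenchmark can differ from the real application.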

485 questions

2 answers

Why Document.querySelector is more efficient than Element.querySelector

I did a test with few iterations to test efficiency of Document.querySelector and Element.querySelector. Markup: Script: Querying with Document.querySelector begin = performance.now(); var i = 0, …

asked Sep 07 '15 at 04:03

Alexandre Thebaldi

2 answers

unexpected results: microbenchmark

I've always been bugged a bit by the lack of accuracy I see benchmarking with system.time and rbenchmark (in that the precision of the timing may be lacking) and saw Hadley reference the microbenchmark package recently. So I decided to give it a…

r microbenchmark

asked Apr 28 '12 at 15:50

Tyler Rinker

1 answer

Why does adding an xorps instruction make this function using cvtsi2ss and addss ~5x faster?

I was messing around with optimizing a function using Google Benchmark, and ran into a situation where my code was unexpectedly slowing down in certain situations. I started experimenting with it, looking at the compiled assembly, and eventually…

clang x86-64 cpu-architecture sse microbenchmark

asked Mar 14 '20 at 23:35

LRFLEW

3 answers

what does STREAM memory bandwidth benchmark really measure?

I have a few questions on STREAM (http://www.cs.virginia.edu/stream/ref.html#runrules) benchmark. Below is the comment from stream.c. What is the rationale about the requirement that arrays should be 4 times the size of cache? * (a) Each…

benchmarking cpu-architecture microbenchmark memory-bandwidth

asked May 11 '19 at 03:44

yeeha

1 answer

Compiled R code is actually slower than pure R with JIT enabled

From Efficient R Programming (the byte compiler) and the R documentation on the byte compiler, I learnt that cmpfun can be used to compile a pure R function into byte code to speed it up, and that enableJIT can speed things up by enabling just-in-time compilation. So, I decided to do…

r bytecode jit microbenchmark

asked Mar 01 '19 at 11:20

JiaHao Xu

1 answer

Unexpected performance results when comparing dictionary lookup vs multiple is operators in .NET 4.7

I have the problem where I need to do dynamic dispatch based on an object type. The types based on which I need to dispatch are known at compile time - in my example they are 17. My initial guess was to use a Dictionary> for the…

c# microbenchmark .net-4.7

asked Dec 28 '17 at 12:39

Ivan Zlatanov

1 answer

What can explain the huge performance penalty of writing a reference to a heap location?

While investigating the subtler consequences of generational garbage collectors on application performance, I have hit a quite staggering discrepancy in the performance of a very basic operation – a simple write to a heap location – with respect to…

java garbage-collection microbenchmark jmh

asked Feb 03 '14 at 09:44

Marko Topolnik

2 answers

Benchmarking - CPU time bigger than wall time?

I measure CPU time and wall time of sorting algorithms on Linux. I'm using getrusage to measure CPU time and clock_gettime with CLOCK_MONOTONIC to get wall time. But I noticed that CPU time is bigger than wall time - is that correct? I always…

c benchmarking microbenchmark

asked Jul 24 '13 at 19:53

mazix

1 answer

Estimating actual (not theoretic) runtime complexity of an implementation

Anyone in computer science will know that HeapSort is O(n log n) worst case in theory, while QuickSort is O(n^2) worst case. However, in practice, a well-implemented QuickSort (with good heuristics) will outperform HeapSort on every single data set.…

java complexity-theory benchmarking caliper microbenchmark

asked Jul 05 '13 at 16:52

Erich Schubert

1 answer

Google Benchmark Frameworks DoNotOptimize

I am a bit confused about the implementation of the function void DoNotOptimize of the Google Benchmark Framework (definition from here): template <class Tp> inline BENCHMARK_ALWAYS_INLINE void DoNotOptimize(Tp const& value) { asm volatile("" : :…

c++ assembly inline-assembly microbenchmark google-benchmark

asked Mar 25 '21 at 08:04

Hymir

1 answer

Drain the instruction pipeline of Intel Core 2 Duo?

I'm writing some micro-benchmarking code for some very short operations in C. For example, one thing I'm measuring is how many cycles are needed to call an empty function depending on the number of arguments passed. Currently, I'm timing using an…

c assembly x86 benchmarking microbenchmark

asked Feb 22 '09 at 17:52

Jay Conrod

0 answers

What's up with the "half fence" behavior of rdtscp?

For many years x86 CPUs supported the rdtsc instruction, which reads the "time stamp counter" of the current CPU. The exact definition of this counter has changed over time, but on recent CPUs it is a counter that increments at a fixed frequency…

performance assembly x86 microbenchmark rdtsc

asked Sep 04 '18 at 03:53

BeeOnRope

5 answers

Simple for() loop benchmark takes the same time with any loop bound

I want to write code that makes my CPU execute some operations and see how much time it takes to complete them. I wanted to make a loop going from i=0 to i<5000, multiply i by a constant number, and time that. I've ended up with…

c++ performance benchmarking microbenchmark

asked Jun 19 '18 at 09:27

NaW

4 answers

Fastest Linux system call

On an x86-64 Intel system that supports syscall and sysret, what's the "fastest" system call from 64-bit user code on a vanilla kernel? In particular, it must be a system call that exercises the syscall/sysret user <-> kernel transition¹, but does…

linux performance x86-64 microbenchmark

asked Feb 21 '18 at 18:34

BeeOnRope

1 answer

JMH - why do I need Blackhole.consumeCPU()

I'm trying to understand why it is wise to use Blackhole.consumeCPU(). Something I found about Blackhole.consumeCPU() on Google: Sometimes when we run a benchmark across multiple threads we also want to burn some CPU cycles to simulate CPU…

java benchmarking microbenchmark jmh blackhole

asked Mar 29 '16 at 15:23

DRK
