
I was investigating the calculation performance of int, float, double, and decimal, and I am puzzled by the results. I expected that for addition the winner would be int, but the actual results are in the screenshot below.

[Benchmark results screenshot]

Below is the code I am benchmarking:


using BenchmarkDotNet.Attributes;

public class PerformanceTest
{
    [Benchmark]
    public void CalcDouble()
    {
        double firstDigit = 135.543d;
        double secondDigit = 145.1234;

        double result = firstDigit + secondDigit;

    }

    [Benchmark]
    public void CalcDecimal()
    {
        decimal firstDigit = 135.543m;
        decimal secondDigit = 145.1234m;

        decimal result = firstDigit + secondDigit;
    }

    [Benchmark]
    public void Calcfloat()
    {
        float firstDigit = 135.543f;
        float secondDigit = 145.1234f;

        float result = firstDigit + secondDigit;
    }

    [Benchmark]
    public void Calcint()
    {
        int firstDigit = 135;
        int secondDigit = 145;

        int result = firstDigit + secondDigit;
    }
}
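
For reference, a minimal entry point for running these benchmarks (the same runner call that appears in the first answer below):

using BenchmarkDotNet.Running;

BenchmarkRunner.Run<PerformanceTest>();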


Can anyone explain to me what is going on? Thank you.

I expected int to be the winner, but the winner is float.

  • Look at the error -- it's larger than the mean! The median is also 0 (over half of all measured values were 0), which means that the mean is skewed by a few large values (which fits with the large error). You're benchmarking things which are very, very cheap, so there's going to be a lot of noise: the throughput is going to depend much more on other factors here. I think the only things you can take away from this benchmark are 1) addition is really **really** cheap, and 2) decimal addition is slightly slower. – canton7 Apr 25 '23 at 09:50
  • Additionally, the benchmark results are all text - so please include them *as text* rather than as a screenshot. – Jon Skeet Apr 25 '23 at 09:55
  • You've also got the problem that the JIT is smarter than you. You don't return anything, so the JIT [removes the method body](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAZgAJiAmcgBRigDNp9sA7MGAFRlwwFgAUAG8h5cZQrEU5AMLYANmAAiEAK7AFMABQBKMRNGCJJ8gBN1mmOUYBLKH2W2A5rYzkAvOQCMpAKwAdH4opGYA3Aam5pZa5LgwkGxmTq7uXt4ogd5UpCgRgpGmFhqxsLhqCmk29o4ubuQA1HEJEEkpbvmFAL5CXUA=). – canton7 Apr 25 '23 at 11:01
  • Even if you fix this, the JIT does constant folding and [removes the addition](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAZgAJiAmcgBRigDNp9sA7MGAFRlwwFgAUAG8h5cZQoATCAFdgAGxjkAwtgVgAInMUwAFAEoxE0YInnyM+UvKMAllD6a7AczsZyAXnIBGUgFYAOn8UUikAbmMLSx0bXBhINilnNw9vHxQgnypSFEjBKIsrXXJYXFkFNNsHJ1d3cgBqcnjE5LqMfOjxYgB2Ut4KjsKAXyFhoA=), just returning a constant 280.6664. – canton7 Apr 25 '23 at 11:02

2 Answers


This shows the problem of benchmarking things that are really, really fast; you can slow the benchmark down by doing more work, optionally using OperationsPerInvoke to scale the results accordingly:

Better results:

|      Method |      Mean |     Error |    StdDev |
|------------ |----------:|----------:|----------:|
|  CalcDouble | 0.8501 ns | 0.0007 ns | 0.0006 ns |
| CalcDecimal | 3.6070 ns | 0.0371 ns | 0.0329 ns |
|  CalcSingle | 0.8512 ns | 0.0027 ns | 0.0024 ns |
|   CalcInt32 | 0.2301 ns | 0.0019 ns | 0.0017 ns |

In particular, the error is now just a rounding error relative to the mean.

Code:

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<PerformanceTest>();
public class PerformanceTest
{
    const int OperationsPerInvoke = 4096;

    [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
    public double CalcDouble()
    {
        double firstDigit = 135.543d;
        double secondDigit = 145.1234;

        for (int i = 0; i < OperationsPerInvoke; i++)
        {
            firstDigit += secondDigit;
        }
        return firstDigit;
    }

    [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
    public decimal CalcDecimal()
    {
        decimal firstDigit = 135.543m;
        decimal secondDigit = 145.1234m;

        for (int i = 0; i < OperationsPerInvoke; i++)
        {
            firstDigit += secondDigit;
        }
        return firstDigit;
    }

    [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
    public float CalcSingle()
    {
        float firstDigit = 135.543f;
        float secondDigit = 145.1234f;

        for (int i = 0; i < OperationsPerInvoke; i++)
        {
            firstDigit += secondDigit;
        }
        return firstDigit;
    }

    [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
    public int CalcInt32()
    {
        int firstDigit = 135;
        int secondDigit = 145;

        for (int i = 0; i < OperationsPerInvoke; i++)
        {
            firstDigit += secondDigit;
        }
        return firstDigit;
    }
}
Marc Gravell
  • Each of these functions is also performing OperationsPerInvoke integer additions via `i++`, in addition to whatever comparisons and branching survive loop unrolling. Would it also be helpful to include a baseline benchmark, where you just have the `for` loop with no body, and return `i`? – StriplingWarrior Apr 26 '23 at 04:55
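
For what it's worth, here is a sketch of the baseline StriplingWarrior suggests, to be added to the class above (the method name is illustrative, and a sufficiently clever JIT might still collapse the empty loop to a constant):

[Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
public int LoopBaseline()
{
    // Empty loop body: measures only the loop overhead itself.
    int i = 0;
    for (; i < OperationsPerInvoke; i++)
    {
    }
    // Returning the counter asks BenchmarkDotNet to consume the value.
    return i;
}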

Part 1. The problem with the benchmarks

Both the C# compiler and the Just-In-Time (JIT) compiler are allowed to perform various optimizations on your code. The exact set of optimizations depends on the specific versions of these compilers, but there are some basic code transformations that you should expect by default.

One of the optimizations in your example is known as constant folding; it is capable of condensing

double firstDigit = 135.543d;
double secondDigit = 145.1234;
double result = firstDigit + secondDigit;

to

double result = 280.6664d;

Another optimization is known as dead code elimination. Since you do not use the results of your calculations in the benchmarks, the C#/JIT compilers are able to eliminate this code completely. Therefore, effectively, you benchmark an empty method like this:

[Benchmark]
public void CalcDouble()
{
}

The only exception is CalcDecimal: since decimal is a struct in C# (not a primitive type), the C#/JIT compilers are not smart enough to completely eliminate the calculations (for now; this may be improved in the future).

Both of these optimizations are discussed in detail in the context of .NET benchmarking in the book "Pro .NET Benchmarking" (page 65: Dead Code Elimination, page 69: Constant Folding). Both topics belong to the Chapter "Common Benchmarking Pitfalls" which contains more pitfalls that can distort the results of your benchmarks.

Part 2. BenchmarkDotNet Results

When you pasted your summary table, you cut off the warning section below it. I reran your benchmarks; here is the extended version of the results (which BenchmarkDotNet presents by default):

|      Method |      Mean |     Error |    StdDev |    Median |
|------------ |----------:|----------:|----------:|----------:|
|  CalcDouble | 0.0006 ns | 0.0023 ns | 0.0022 ns | 0.0000 ns |
| CalcDecimal | 3.0367 ns | 0.0527 ns | 0.0493 ns | 3.0135 ns |
|   Calcfloat | 0.0026 ns | 0.0023 ns | 0.0021 ns | 0.0000 ns |
|     Calcint | 0.0004 ns | 0.0010 ns | 0.0009 ns | 0.0000 ns |

// * Warnings *
ZeroMeasurement
  PerformanceTest.CalcDouble: Default -> The method duration is indistinguishable from the empty method duration
  PerformanceTest.Calcfloat: Default  -> The method duration is indistinguishable from the empty method duration
  PerformanceTest.Calcint: Default    -> The method duration is indistinguishable from the empty method duration

These warnings provide valuable insight into the results. Effectively, CalcDouble, Calcfloat, and Calcint take the same amount of time as an empty method like

public void Empty() { }

The numbers you see in the Mean column are just random CPU noise below the duration of one CPU cycle. Let's say the frequency of your CPU is 5 GHz; this implies that the duration of a single CPU cycle is about 0.2 ns. Nothing can be performed faster than one CPU cycle (if we talk about the latency of an operation, which is what BenchmarkDotNet measures by default; if we switch to throughput measurements, we can get "faster" calculations thanks to various effects like instruction-level parallelism, see "Pro .NET Benchmarking", page 440). The Mean value for CalcDouble, Calcfloat, and Calcint is significantly less than the duration of a single CPU cycle, so it doesn't make sense to compare these values at all.

BenchmarkDotNet understands that something is wrong with the Mean column. So, in addition to the warnings below the summary table, it adds a bonus Median column (hidden by default) to highlight the zero duration, i.e., the effective emptiness of the discussed benchmarks.

Part 3. Possible benchmark design improvements

The best way to design such a benchmark is to make it similar to the actual real-life workload you care about. The true performance of arithmetic operations is an extremely tricky thing to measure; it depends on dozens of external factors (like the instruction-level parallelism I mentioned earlier). For details, see Chapter 7, "CPU-bound benchmarks," of "Pro .NET Benchmarking"; it has 24 case studies that provide various examples. Evaluating the "pure" duration of an arithmetic operation is an interesting technical challenge, but it may not be applicable to real-life code.

Here are also a few recommended BenchmarkDotNet tricks for designing better benchmarks:

  1. Move all the "constant" variables to public fields/properties. In this case, the C#/JIT compilers will not be able to apply constant folding (because they don't know in advance that nobody is going to actually change the values of these public fields/properties).
  2. Return the result of your calculations from the [Benchmark] method. This is the way to ask BenchmarkDotNet to prevent dead code elimination. (A minimal sketch applying both tricks follows this list.)
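
For illustration, a minimal sketch applying both tricks; the class and field names here are illustrative, not from the original benchmarks:

using BenchmarkDotNet.Attributes;

public class PerformanceTestNoFolding
{
    // Public, non-constant fields: the JIT cannot assume these values
    // never change, so it cannot constant-fold the additions away.
    public double FirstDouble = 135.543d;
    public double SecondDouble = 145.1234d;

    public int FirstInt32 = 135;
    public int SecondInt32 = 145;

    // Returning the result makes BenchmarkDotNet consume the value,
    // which prevents dead code elimination.
    [Benchmark]
    public double CalcDouble() => FirstDouble + SecondDouble;

    [Benchmark]
    public int CalcInt32() => FirstInt32 + SecondInt32;
}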

The approach suggested by Marc Gravell will work to some extent. However, it may have some other issues:

  1. Using a constant number of iterations in the for loop is not recommended, since different JIT compilers may apply loop unrolling differently (another benchmarking pitfall, see "Pro .NET Benchmarking", page 61), which may distort the results in some environments.
  2. Be aware of the fact that adding an artificial loop adds some performance costs to your benchmarks. So, it's OK to use such a set of benchmarks for getting relative results, but the absolute numbers will also include the loop overhead ("Pro .NET Benchmarking", page 54).
  3. To the best of my knowledge, modern C#/JIT compilers are not smart enough to completely eliminate such code. However, we have no guarantee that it will not be eliminated, since it effectively returns the same constant every time. Future versions of the compilers may be smart enough to perform such optimizations (I believe some Java runtimes are capable of eliminating similar benchmarks). So, it's better to move all the constants to public non-constant fields/properties in order to prevent such situations (see the sketch after this list).
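
For illustration, here is a sketch of Marc Gravell's loop-based benchmark with the constants moved to public fields, per point 3 above (the class and field names are illustrative):

using BenchmarkDotNet.Attributes;

public class PerformanceTestLoop
{
    const int OperationsPerInvoke = 4096;

    // Public fields instead of local constants, so the JIT cannot fold
    // the whole loop into a single precomputed result.
    public double Start = 135.543d;
    public double Step = 145.1234d;

    [Benchmark(OperationsPerInvoke = OperationsPerInvoke)]
    public double CalcDouble()
    {
        double result = Start;
        for (int i = 0; i < OperationsPerInvoke; i++)
        {
            result += Step;
        }
        return result;
    }
}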
AndreyAkinshin
  • Well if you're going to keep coming out with salient, well-sourced points backed up by extensive experience, with reference to a quality book, I'll... I'll... (gets out credit card) – Marc Gravell Apr 26 '23 at 08:32