
I am trying to improve the performance of my program. I used JMH to compare my two versions, but I don't know whether there is a real difference.

Example of my results:

                  Version 1 (op/s)                Version 2 (op/s)
                  score           error           score           error
    Benchmark 1   12382150,338    1277638,481     18855038,903    50835,395
    Benchmark 2   11708047,2      4061755,193     18843828,659    41966,689
    Benchmark 3   7814465,4       9483927,071     18821356,961    72364,651
    Benchmark 4   10481146,451    464691,58       13936537,089    40726,506
    Benchmark 5   6863734,072     175974,219      9709381,687     21774,816

Do these results show a real difference between version 1 and version 2?

lguerin
  • I don't understand your question. There are clear differences in the benchmark numbers; version 2 is clearly faster in all the benchmarks. What are you asking about, what kind of answer do you expect? – Petr Janeček Jul 02 '15 at 13:36
  • Yes, but the scores come with very large confidence intervals. For Benchmark 3, the first interval (for version 1) is [-1669461,671, 17298392,471] and the second (for version 2) is [18748992,310, 18893721,612]. Those are close values, aren't they? – lguerin Jul 02 '15 at 14:53

2 Answers


IIRC, the benchmark score (ops/s) is an arithmetic mean of a 90% distribution (that is, extreme outliers are filtered out). Thus, no matter how you slice it, version 2 scores higher on all benchmarks.
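A quick way to sanity-check that is to treat each score ± error as an interval and test whether the two versions' intervals overlap at all. A minimal sketch in plain Java (a hypothetical helper, not part of JMH):

    // Hypothetical helper: treats score +- error as an interval and
    // checks whether the two versions' intervals overlap.
    static boolean intervalsOverlap(double score1, double err1,
                                    double score2, double err2) {
        double lo1 = score1 - err1, hi1 = score1 + err1;
        double lo2 = score2 - err2, hi2 = score2 + err2;
        return lo1 <= hi2 && lo2 <= hi1;
    }

    // Benchmark 3 from the question, the noisiest one:
    // intervalsOverlap(7814465.4, 9483927.071, 18821356.961, 72364.651)
    // returns false -- even version 1's huge interval stays below version 2's.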

llogiq

In my experience you need to consider the scale of the operation to reason about it. You are benchmarking methods that are fairly trivial in length, which can make the results hard to read. For example, deriving from your results for benchmark 1:

                  Version 1                  Version 2
    Benchmark 1   12382150 ± 1277638 op/s    18855038 ± 50835 op/s
    same as       80 ± 7 ns/op               53 ± 1 ns/op
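The conversion is just the reciprocal of the throughput; a sketch of the arithmetic, using the question's numbers:

    // Sketch: converting a throughput score to an average time per operation.
    // 1 second = 1e9 ns, so ns/op = 1e9 / (op/s).
    double opsPerSec = 12382150.338;     // version 1, benchmark 1
    double nsPerOp   = 1e9 / opsPerSec;  // ~80.8 ns/op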

The reporting is easy to fix using "-bm avgt -tu ns", which sets the benchmark mode to average time and the time unit to nanoseconds. The scale of the benchmarks is also useful in deciding how important improvements are and how sceptical one should be about the benchmarks in question.

If you are concerned about the variance in your benchmarks, you should also make sure you run them with sufficient iterations (-i) and forks (-f). You should also run the benchmarks on a quiet machine, and make sure you can fix the CPU frequency for the duration of the benchmark to avoid variance caused by turbo boost, overheating and power-management governors.
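For reference, the equivalent configuration can also be baked into the benchmark class with JMH annotations rather than command-line flags. A minimal sketch (the class name, iteration/fork counts and the benchmark body are placeholder assumptions):

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.*;

    @BenchmarkMode(Mode.AverageTime)        // -bm avgt: report average time per op
    @OutputTimeUnit(TimeUnit.NANOSECONDS)   // -tu ns: report in nanoseconds
    @Warmup(iterations = 10)                // warmup iterations before measuring
    @Measurement(iterations = 10)           // -i: measurement iterations
    @Fork(5)                                // -f: independent JVM forks
    @State(Scope.Thread)
    public class MyBenchmark {

        private long value = 42;

        @Benchmark
        public long measureWork() {
            // placeholder for the method under test
            return value * 31;
        }
    }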

Nitsan Wakart