JMH - weird benchmarking results

Question

I'm using JMH to benchmark DOM parser. I got really weird results as the first iteration actually run faster than later iterations

Can anyone explain why this happens? Also, what do percentiles and all the figures mean and why it starts getting stable after the third iteration? Does one iteration mean one iteration of the entire benchmarking method? Below is the method I'm running

@Benchmark 
@BenchmarkMode(Mode.SingleShotTime) 
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 13, time = 1, timeUnit = TimeUnit.MILLISECONDS)
public void testMethod_no_attr() {
    try {
        File fXmlFile = new File("500000-6.xml");
        DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();
        DocumentBuilder dBuilder = dbFactory.newDocumentBuilder();
        Document doc = dBuilder.parse(fXmlFile);
    } catch (Exception  e) {
        e.printStackTrace();  
    }
}

these are not iterations but [percentiles](https://en.wikipedia.org/wiki/Percentile) - p(50.0000) - 2608.183 ms/op means that 50 percent of the traced operations were done in 2608.183 ms or earlier, p(99.9999) - 2978.871 ms/op means that 99.9999 percent of operations finished within 2978.871 ms (this includes the ones of all smaller percentiles, off course) — Ole, Feb 22 '18 at 21:29

score 2 · Answer 1 · answered Feb 22 '18 at 22:50

These percentiles are horribly misleading. Don't use them.

If you have N = 10, it is no wonder all the bottom values are the same. Because you have way too few samples to talk about anything like a 99.999% quantile. The only value I would actually consider here is the median (50% quantile), and eventually the IQR (inter-quartile-ratio).

If you are optimistic, you'd assume your slowest sample is at 95%. If you are less optimistic, treat the slowest sample of N=10 as 90% quantile. And if you are more serious about these estimates, you may treat the spacing as being indicative of ca. 1/sqrt(N-1), i.e. assume that up to 33% of runs will be even slower than the slowest sample of N=10. You need many more samples to narrow down these estimates! Anything beyond 95% is just speculation. 99% - you cannot answer what is happening there. You just have not enough data.

As presented these values are just outright nonsense. Based on the data, you cannot estimate that P(99,9999)=something. This number corresponds to one out of a million runs being worse than this. But you only did 10, don't use this little N to predict what happens if you'd have a million. JMH should not print these extreme quantiles for small N.

score 1 · Accepted Answer · edited Jun 20 '20 at 09:12

Your screenshot is missing histogram output. What you see is just a percentile of the histogram.

Check this example. We have 100 iterations and the result for method BoolArrayVsBitSetBenchmark#primitive is:

Result "de.jawb.jmh.benchmark.example.bool.BoolArrayVsBitSetBenchmark.primitive":
  N = 100
  mean =      0,493 ±(99.9%) 0,003 s/op

  Histogram, s/op:
    [0,490, 0,495) = 93
    [0,495, 0,500) = 3
    [0,500, 0,505) = 1
    [0,505, 0,510) = 1
    [0,510, 0,515) = 0
    [0,515, 0,520) = 0
    [0,520, 0,525) = 1
    [0,525, 0,530) = 0
    [0,530, 0,535) = 0
    [0,535, 0,540) = 0
    [0,540, 0,545) = 0
    [0,545, 0,550) = 0
    [0,550, 0,555) = 0
    [0,555, 0,560) = 1

  Percentiles, s/op:
      p(0,0000) =      0,490 s/op 
     p(50,0000) =      0,491 s/op
     p(90,0000) =      0,494 s/op
     p(95,0000) =      0,496 s/op
     p(99,0000) =      0,558 s/op
     p(99,9000) =      0,559 s/op
     p(99,9900) =      0,559 s/op
     p(99,9990) =      0,559 s/op
     p(99,9999) =      0,559 s/op
    p(100,0000) =      0,559 s/op

The first histogram line [0,490, 0,495) = 93 says that 93 calls belong to the group max 0,490s and min 0,494s. Note the bracket ')'

In summary, this means that no call was faster as 0,491 s/op, because of

p(0,0000) = 0,490 s/op

And max execution time is 0,559 s/op, because of

p(100,0000) = 0,559 s/op

Thanks. Updated the histogram. Anyway, what does those values mean `[0,005, 0,005) = 14`? — Tri Nguyen, Feb 22 '18 at 21:47
I have updated my example. The execution takes now a bit slower. As you can see [0.005, 0,005) = 14 means. 14 calls took max 0,005 s. — alex, Feb 22 '18 at 22:12
Thanks. What does the warm-up iteration mean? Why are they different? — Tri Nguyen, Feb 22 '18 at 23:50
@TriNguyen Warmup is required to trigger the JIT Optimization, loading / initializing cache, creating threads etc.. — alex, Feb 23 '18 at 08:50

JMH - weird benchmarking results

2 Answers2