
For ELKI I need (and have) more flexible sorting implementations than what is provided with the standard Java JDK and the Collections API. (Sorting is not my ultimate goal. I use partial sorting for bulk loading index structures such as the k-d-tree and R*-tree, and I want to make a rather generic implementation of these available, more generic than what is currently in ELKI - but either way, optimizing the sort means optimizing the index construction time).

However, sorting algorithms scale very differently depending on your data size. For tiny arrays, it is well known that insertion sort performs well (and in fact, most quicksort implementations fall back to insertion sort below a certain threshold); not because of asymptotic theory, but because of CPU pipelining and code-size effects that sorting theory does not consider.
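For illustration, here is a minimal sketch of that pattern; the threshold of 16 is made up, and is exactly the kind of constant that needs benchmarking per platform:

final class HybridSort {
    // Illustrative cutoff; the optimal value depends on JVM and CPU,
    // which is exactly what needs benchmarking.
    static final int INSERTION_THRESHOLD = 16;

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Quicksort on the inclusive range [lo, hi] that hands small
    // subranges to insertion sort.
    private static void sort(int[] a, int lo, int hi) {
        if (hi - lo < INSERTION_THRESHOLD) {
            insertionSort(a, lo, hi);
            return;
        }
        int j = partition(a, lo, hi);
        sort(a, lo, j);
        sort(a, j + 1, hi);
    }

    // Hoare partitioning with the middle element as pivot.
    private static int partition(int[] a, int lo, int hi) {
        int pivot = a[lo + (hi - lo) / 2];
        int i = lo - 1, j = hi + 1;
        while (true) {
            do { i++; } while (a[i] < pivot);
            do { j--; } while (a[j] > pivot);
            if (i >= j) {
                return j;
            }
            int t = a[i]; a[i] = a[j]; a[j] = t;
        }
    }

    private static void insertionSort(int[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++) {
            int v = a[i];
            int j = i - 1;
            while (j >= lo && a[j] > v) {
                a[j + 1] = a[j];
                j--;
            }
            a[j + 1] = v;
        }
    }
}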

So I'm currently benchmarking a number of sorting implementations to find the best combination for my particular needs; I want my more flexible implementations to be somewhat on par with the JDK default implementations (which are already fine-tuned, but maybe for a different JDK version).

In the long run, I need these things to be easy to reproduce and re-run. At some point, we'll see JDK8. And on the Dalvik VM, the results may be different from those on Java 7. Heck, they might even differ between AMD, Core i7 and Atom CPUs, too. So maybe Cervidae will include different sorting strategies, and choose the most appropriate one at class-loading time.
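A hypothetical sketch of what such class-loading-time selection could look like; all names are made up, and the naive one-shot timing below would of course need exactly the proper warmup and repetition this question is about:

import java.util.Arrays;
import java.util.Random;

final class SortStrategySelector {
    interface IntSorter {
        void sort(int[] a);
    }

    // Chosen once, when this class is initialized.
    static final IntSorter BEST = calibrate();

    private static IntSorter calibrate() {
        IntSorter jdk = new IntSorter() {
            public void sort(int[] a) { Arrays.sort(a); }
        };
        IntSorter hybrid = new IntSorter() {
            public void sort(int[] a) { HybridSort.sort(a); } // sketch from above
        };
        IntSorter[] candidates = { jdk, hybrid };

        int[] data = new int[10000];
        Random rnd = new Random(42);
        for (int i = 0; i < data.length; i++) {
            data[i] = rnd.nextInt();
        }

        // Naive one-shot timing, for illustration only: a real version
        // would need warmup and repeated measurement.
        IntSorter best = candidates[0];
        long bestTime = Long.MAX_VALUE;
        for (IntSorter s : candidates) {
            int[] copy = Arrays.copyOf(data, data.length);
            long start = System.nanoTime();
            s.sort(copy);
            long elapsed = System.nanoTime() - start;
            if (elapsed < bestTime) {
                bestTime = elapsed;
                best = s;
            }
        }
        return best;
    }
}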

My current efforts are on GitHub: https://github.com/kno10/cervidae

So now to the actual question. The latest Caliper commit added some experimental code for macrobenchmarks. However, I'm facing the problem that I need both. Caliper macrobenchmarks fail when the timer granularity exceeds 0.1% of the measured runtime (i.e., when a run is shorter than 1000 times the timer resolution); with 10000 objects, some algorithms hit this threshold. At the same time, microbenchmarks complain that you should be doing a macrobenchmark when your runs take too long...

So for benchmarking different sort sizes, I'd actually need an approach that dynamically switches from microbenchmarking to macrobenchmarking depending on the runtime. In fact, I'd even prefer it if Caliper automagically realized that the runtime is large enough for a macrobenchmark, and then just did a single iteration.

Right now, I'm trying to emulate this by using:

@Macrobenchmark
public int macroBenchmark() { ... }

public int timeMicroBenchmark(int reps) {
    int ret = 0;
    for (int i = 0; i < reps; i++) {
        ret += macroBenchmark();
    }
    return ret; // return the accumulated value so the JIT cannot eliminate the loop
}

to share the benchmarking code across both scenarios. An alternative would be to use

@Macrobenchmark
public int macroBenchmark() {
    return timeMicroBenchmark(1);
}

public int timeMicroBenchmark(int reps) { ... }

Which of the two "adapters" is preferable? Any other hints for getting consistent benchmarking from micro all the way to macro?

Given that the Caliper web UI is currently not functional, what do you use for analyzing the results? I'm currently using a tiny Python script to process the JSON results and report weighted means. And in fact, I liked the old text reporting better than the web UI.
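For reference, the weighted mean itself is just the standard formula; here is a sketch in Java for illustration (my script is Python), assuming the values and weights have already been extracted from the JSON, since I'm not reproducing Caliper's JSON schema here:

// Standard weighted mean: sum(w_i * x_i) / sum(w_i).
// Values and weights are assumed to be extracted from Caliper's
// JSON output beforehand.
static double weightedMean(double[] values, double[] weights) {
    double sum = 0.0, weightSum = 0.0;
    for (int i = 0; i < values.length; i++) {
        sum += weights[i] * values[i];
        weightSum += weights[i];
    }
    return sum / weightSum;
}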

Oh, and is there a way to have Caliper just re-run a benchmark when HotSpot compilation occurred in the benchmarking loop? Right now it logs an error, but maybe it could just restart that part of the benchmark?

Erich Schubert
  • Possibly too late for you, but there is a new benchmarking tool (pushed a few weeks ago) that is part of OpenJDK: http://openjdk.java.net/projects/code-tools/jmh/ which is supposed to address some of the issues encountered with Caliper. It was created by some performance engineers at Oracle. Disclaimer: I have not tried it yet, but it seems promising. – assylias Apr 05 '13 at 12:47
  • Thank you. JMH looks promising, also because it seems to be less web-UI-oriented than Caliper. Plus, it doesn't pull in as many dependencies. One thing I really hate about many current Java projects is the massive chains of dependencies you get everywhere. Caliper is no exception here, depending on Hibernate for example, which pulls in at least another 20 dependencies. Maybe I will give it a try - the sooner the better. I haven't started updating my heap benchmarks to Caliper 1.0 yet, for example. These will also need much more complicated workloads than sorting. – Erich Schubert Apr 05 '13 at 13:37
  • RE: web UI. A version compatible with HEAD is being pushed today. :-) – gk5885 Apr 09 '13 at 15:21
  • RE: Hotspot compilation. I would like to introduce that sort of behavior, but there were some concerns with the migration path for people using Caliper 0.5 and the difference in behavior. The interim solution was just to let people know. A bug is now being tracked. https://code.google.com/p/caliper/issues/detail?id=238 – gk5885 Apr 09 '13 at 15:26
  • @gk5885 I'll have a look at the web UI push when I'm back from PAKDD2013. As for rerunning on HotSpot compilation: is there actually an upgrade path from 0.5? I had to change quite a lot from the December 1.0 pre-release to the current version. Either way, "restart run on HotSpot compilation" could be made a command line flag, I guess. – Erich Schubert Apr 12 '13 at 13:43
  • @assylias Which issues does JMH handle better than Caliper? To me it actually seems that Caliper is quite a bit more advanced (although a bit too fat for my liking). – Erich Schubert Apr 12 '13 at 13:46
  • @ErichSchubert I don't know it well enough to judge, but I came across a few comments in the sample code, see for example: http://hg.openjdk.java.net/code-tools/jmh/file/7bb7fe12eb14/jmh-samples/src/main/java/org/openjdk/jmh/samples/JMHSample_11_Loops.java, in particular the comment at the bottom. – assylias Apr 12 '13 at 15:14
  • @ErichSchubert I don't think there should be too much that changed for the benchmark, but running the command is a bit different. https://code.google.com/p/caliper/wiki/CommandLineOptions was recently updated to reflect the new options, so hopefully that should help. – gk5885 Apr 17 '13 at 18:07
  • @assylias FWIW, we've seen that benchmark and its comment and are a bit suspicious for a variety of reasons. The methodology that Caliper uses is the synthesis of expertise and advice from many of the long-time Java library and platform developers. That said, there are interesting aspects to jmh that are worth investigating and some comparative analysis to perform, but sadly I wasn't able to get it to build. – gk5885 Apr 17 '13 at 18:16
  • @gk5885 That would be interesting indeed - you can also post on their mailing list to engage a discussion. FYI, I managed to build it fairly painlessly on Netbeans/Windows 7 as a Maven project. – assylias Apr 17 '13 at 18:21
  • So is the text UI gone for good in Caliper 1.0? – BeeOnRope Apr 22 '13 at 02:35

1 Answer


I think the issue is that the output from the microbenchmark instrument is being misinterpreted as a "complaint". It says:

"INFO: This experiment does not require a microbenchmark. The granularity of the timer (%s) is less than 0.1%% of the measured runtime. If all experiments for this benchmark have runtimes greater than %s, consider the macrobenchmark instrument."

The message is specifically worded to convey that an individual experiment was lengthy, but since other experiments for that benchmark method may not be, it's certainly not an error. There is a bit more overhead to the microbenchmark instrument, but while your experiment may not require a microbenchmark, the results are still perfectly valid.

gk5885