What statistical distribution is used to benchmark an algorithm?

Question

I have benchmarked my algorithm by running it 1000 times. Now I have all the timing values, and at this point it would be interesting to know the mean, standard deviation, and median. The problem is that I don't know which statistics are correct to use to estimate these parameters. I'm not sure about assuming a normal distribution.

score 1 · Answer 1 · edited May 23 '17 at 12:32

Learn about statistics. There are lots of books, guides, papers and introductions out there (1, 2, 3, 4).
There are also lots of libraries which implement standard statistical methods:

Java Commons Math,
C++ Libs,
and there are certainly lots of others for the language you use...

And one last hint: for a quick (initial) result I often use Excel and its charting functions. It supports some statistical methods that you can play around with to see in which direction to continue.
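As a rough illustration of what such a library gives you, here is a minimal sketch using Python's standard `statistics` module (swapped in here for brevity; it is not one of the libraries linked above). The `summarize` helper and the sample timings are hypothetical. Note that the mean, median and sample standard deviation can be computed without assuming any particular distribution.

```python
import statistics

def summarize(times):
    """Return basic summary statistics for a list of timings (hypothetical helper)."""
    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stdev": statistics.stdev(times),  # sample standard deviation (divides by n - 1)
        "min": min(times),
        "max": max(times),
    }

# Made-up timings in milliseconds, with one slow outlier.
sample = [10.0, 11.0, 10.0, 12.0, 57.0]
summary = summarize(sample)
print(summary)
```

With skewed data like this, the single outlier pulls the mean well above the median, which is one reason to look at both before assuming the timings are normally distributed.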

score 0 · Answer 2 · answered Aug 11 '14 at 12:16

That really depends on the distribution your workload follows, so there is no generic answer to this.

But there is a trick: go one step further and run several iterations, each consisting of N calls, and compute, say, the average time or throughput for each entire iteration. Then, for a large N and consistent workload behavior across the calls, the iteration scores may be subject to the Central Limit Theorem, which makes them approximately normally distributed.
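The idea can be sketched in Python as follows. This is only an illustration under stated assumptions: the per-call times are simulated from an exponential distribution (deliberately skewed, not real measurements), and `batch_means` is a hypothetical helper name.

```python
import random
import statistics

def batch_means(samples, batch_size):
    """Split samples into consecutive batches of batch_size and average each batch."""
    n_batches = len(samples) // batch_size
    return [
        statistics.fmean(samples[i * batch_size:(i + 1) * batch_size])
        for i in range(n_batches)
    ]

# Simulated per-call times: exponential with mean 1.0 (an assumption, not real data).
rng = random.Random(42)
call_times = [rng.expovariate(1.0) for _ in range(100_000)]

# 100 iterations, each averaging N = 1000 calls.
iteration_scores = batch_means(call_times, batch_size=1000)

print("per-call stdev:  ", statistics.stdev(call_times))
print("iteration mean:  ", statistics.fmean(iteration_scores))
print("iteration stdev: ", statistics.stdev(iteration_scores))
```

The individual call times are far from normal, but the iteration scores cluster tightly around the overall mean, so normal-based tools such as confidence intervals are more defensible on them than on the raw samples. This only holds if the calls behave consistently; warm-up effects or drift across calls would break the assumption.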
