Standard (simple?) benchmark code/test?

Question

Is there a standard benchmarking system or outline? I am looking at Go, LLVM, D and other languages, and I want to know how they fare in execution time, memory usage, etc.

I found https://benchmarksgame-team.pages.debian.net/benchmarksgame/ but the code is NOT THE SAME. In one example the C++ source is < 100 lines while the C source is > 650. I would hardly call that fair. Another test has the stupid mistake of putting a lock inside the loop in one source, while the other languages put it outside.

So I wanted to know of some test I might consider looking at or running that uses no nonstandard or complex libs, ideally implemented completely inside a single source file. Something fair.

Do you think the appropriate way to write something in Go, would use exactly the same code as the appropriate way to write something in D? You want to compare different languages - the code *will* be different. — igouy, Jun 12 '11 at 00:51
Related: [Resources containing cross-language benchmarks?](https://stackoverflow.com/q/6091572) — Peter Cordes, Jun 05 '18 at 23:57

igouy · Answer 1 · 2018-06-05T20:21:57.300

For several years the benchmarks game website featured this on the Help page -

What does "not fair" mean? (A fable)

They raced up, and down, and around and around and around, and forwards and backwards and sideways and upside-down.

Cheetah's friends said "it's not fair" - everyone knows Cheetah is the fastest creature but the races are too long and Cheetah gets tired!

Falcon's friends said "it's not fair" - everyone knows Falcon is the fastest creature but Falcon doesn't walk very well, he soars across the sky!

Horse's friends said "it's not fair" - everyone knows Horse is the fastest creature but this is only a yearling, you must stop the races until a stallion takes part!

Man's friends said "it's not fair" - everyone knows that in the "real world" Man would use a motorbike, you must wait until Man has fueled and warmed up the engine!

Snail's friends said "it's not fair" - everyone knows that a creature should leave a slime trail, all those other creatures are cheating!

Dalmatian's tail was banging on the ground. Dalmatian panted and between breaths said "Look at that beautiful mountain, let's race to the top!"

At that time "it's not fair" comments were mostly special pleading intended to gain an advantage for programming language X to the disadvantage of programming language Y.

But the issues your question raises are a little different.

Firstly, look at the n-body programs on the benchmarks game website. Even though the programs are written in different languages there's very little difference in the way the programs are coded.

So far no one has found an effective way to make use of quad-core for this small n-body problem, so there are no special multi-core programs. The programs do not use non-standard or complex libraries. The programs are completely implemented inside a single source file.

I said there's very little difference in the way the n-body programs are coded, but does that really mean the programs are the same? Soon after the project had been revived, 6 or 7 years ago, I remember an Ada programmer half-joking about comparing apples to oranges because the assembly language from the Ada programs wasn't the same as the assembly language from the C programs - so obviously like wasn't being compared to like :-)
- otoh the Ada source code would have to be written in a different way than the C source code was written, to make the Ada compiler produce the same assembly language as the C compiler produced.
- otoh if the assembly language produced by both compilers really was line-by-line the same, why would there be a performance difference?

When there's very little difference in the way the programs are coded then at first glance the comparison appears to be fair, but forcing different languages to be coded like language X may favour language X.

As Yannick Versley noted, the point of using a different language is for the different approaches that language provides. In other words, there's more than one way to do the same thing.

Look at the mandelbrot programs on the benchmarks game website - the simplest C program is half the size of the fastest C program; the simplest C program is sequential and uses doubles, the fastest C program uses all 4 cores through OMP and GCC intrinsics.
- Other languages take different approaches to use all 4 cores - does that mean we should only compare sequential programs and ignore the reality of multi-core computing?
- Other language implementations may not provide an equivalent to GCC intrinsics - does that mean we should only compare programs that use doubles? But other language implementations take different approaches in the way they represent doubles - does that mean we should ignore all floating point programs?

The problem is that programming languages (and programming language implementations) are more different than apples to oranges, but we still ask - Will my program be faster if I write it in language X? - and still wish for a simpler answer than - It depends how you write it!

The different tasks and different programs on the benchmarks game website show that some of the performance comparison answers are confusing and complicated - the details matter, a lot.

score 5 · Accepted Answer · answered Jun 11 '11 at 23:45

Benchmarking is not entirely about being fair - it's about choosing something for your own workload, within your constraints.
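
To make "your own workload" concrete, here is a minimal sketch of a single-file harness that records both wall-clock time and peak memory for one call, using only the Python standard library. The `measure` helper and the `sum_of_squares` workload are hypothetical names invented for illustration; you would substitute a function that resembles what you actually run. Note that `tracemalloc` only sees allocations made through Python's allocator and slows the code down while it is active, so treat the numbers as rough.

```python
import time
import tracemalloc


def measure(workload, *args):
    """Run workload(*args) once; return (result, elapsed_seconds, peak_bytes)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = workload(*args)
    elapsed = time.perf_counter() - start
    # get_traced_memory() returns (current, peak) in bytes since start()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, elapsed, peak


def sum_of_squares(n):
    # Hypothetical stand-in workload: builds a temporary list on purpose
    # so that the peak-memory figure is non-trivial.
    return sum([i * i for i in range(n)])


result, elapsed, peak = measure(sum_of_squares, 100_000)
print(f"result={result} time={elapsed:.4f}s peak={peak} bytes")
```

A single run like this is noisy; repeating the measurement and looking at the spread is usually more informative than one number.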

If you want to use the alioth shootout site, you can still get interesting information if you exclude solutions that are too verbose or too slow (the exact balance depends on what you want to do: do you write code that runs for five seconds, or code that will occupy a dozen computers for five months?). Look at the most concise examples for one particular problem to see the general problem structure - then see what typical optimizations people applied to make the code run faster.

Having a benchmark with THE SAME code misses the point, because you need different things to help in different languages. Java has GC, which means that it will do well on the binary-trees test, whereas you need custom memory allocation in C/C++ to compete with that (and that particular benchmark is structured so that standard malloc does really poorly). For the spectral-norm one, you need non-boxed double arrays...
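
As a rough illustration of the "custom memory allocation" idea, the sketch below builds the same perfect binary tree two ways: one object per node, and all nodes stored in a preallocated pool that is released in one step. It is written in Python only to show the structure; the names `NodePool`, `build_linked` and `build_pooled` are made up for this example, it is not one of the programs from the site, and Python timings would say nothing about how malloc, a pool, or a GC behave in C/C++ or Java.

```python
def build_linked(depth):
    """One separately allocated object (a tuple) per node."""
    if depth == 0:
        return (None, None)
    return (build_linked(depth - 1), build_linked(depth - 1))


def count_linked(node):
    left, right = node
    if left is None:
        return 1
    return 1 + count_linked(left) + count_linked(right)


class NodePool:
    """All nodes live in two preallocated index arrays; -1 means 'no child'."""

    def __init__(self, capacity):
        self.left = [-1] * capacity
        self.right = [-1] * capacity
        self.used = 0

    def alloc(self, left, right):
        i = self.used
        self.left[i] = left
        self.right[i] = right
        self.used += 1
        return i

    def reset(self):
        # Releases every node at once. This is only valid because we know
        # all nodes of the tree die at the same time.
        self.used = 0


def build_pooled(pool, depth):
    if depth == 0:
        return pool.alloc(-1, -1)
    left = build_pooled(pool, depth - 1)
    right = build_pooled(pool, depth - 1)
    return pool.alloc(left, right)


def count_pooled(pool, i):
    if pool.left[i] == -1:
        return 1
    return 1 + count_pooled(pool, pool.left[i]) + count_pooled(pool, pool.right[i])


depth = 10
nodes = 2 ** (depth + 1) - 1  # node count of a perfect tree of this depth
pool = NodePool(nodes)
root = build_pooled(pool, depth)
print(count_linked(build_linked(depth)), count_pooled(pool, root))
pool.reset()  # whole tree discarded in one step
```

The pooled version is not "the same code" as the linked one, yet both answer the same question, which is exactly why demanding identical source across languages is hard to justify.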

If you want to come up with your own solutions, have a go at Project Euler - there are a lot of problems that do not depend on complex libraries, yet are challenging to optimize. Otherwise, try to come up with scoring criteria that you consider adequate to filter or rank the existing contributions in the shootout (or outside it - for example, there are ShedSkin and Cython solutions to some of the problems, which are "unofficial" because these languages are not included).
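
For instance, Project Euler problem 1 (sum of the multiples of 3 or 5 below 1000) fits in a few lines with no libraries, and already shows that the timing depends on how you write it. The sketch below is one possible pair of solutions, with function names chosen only for this example; it times them with the standard `timeit` module.

```python
import timeit


def euler1_loop(limit):
    """Straightforward version: test every number below limit."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)


def euler1_formula(limit):
    """Closed-form version using arithmetic series and inclusion-exclusion."""

    def multiples_sum(k):
        m = (limit - 1) // k          # how many multiples of k are below limit
        return k * m * (m + 1) // 2   # k * (1 + 2 + ... + m)

    return multiples_sum(3) + multiples_sum(5) - multiples_sum(15)


# Both approaches must agree before any timing is worth looking at.
assert euler1_loop(1000) == euler1_formula(1000) == 233168

for fn in (euler1_loop, euler1_formula):
    seconds = timeit.timeit(lambda: fn(1000), number=1000)
    print(f"{fn.__name__}: {seconds:.4f}s for 1000 calls")
```

Porting both versions to each language you care about gives you a small benchmark of your own, where you decide which style of solution counts as representative.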

>>custom memory allocation in C/C++ to compete with that<< The Apache Portable Runtime memory pool and the Boost object pool seem quite effective ;-) http://shootout.alioth.debian.org/u64q/performance.php?test=binarytrees — igouy, Jun 12 '11 at 01:01
>>that particular benchmark is structured so that standard malloc does really poorly<< Not deliberately, but perhaps one day someone will contribute a program that uses TCMalloc. — igouy, Jun 12 '11 at 01:03
I'd count glibc's obstacks, the APR pools and Boost's object pool as "custom memory allocation" because you have to know something about the lifetimes of objects. Because of this, they are faster than either a GC (which still needs to mark objects) or a faster malloc (which needs to free objects one by one, even if TCMalloc can avoid some of the work that glibc malloc does) — Yannick Versley, Jun 12 '11 at 01:35
And with GC we'll usually need to custom-tune GC parameters, and sometimes make explicit when GC should take place, to prevent the costs from being apparent. — igouy, Jun 12 '11 at 20:06
