I want to micro-benchmark predicate int_cntA/2 ...
int_cntA(I,W) :- I >= 0, int_cntA0_cntA(I,0,W).
int_cntA0_cntA(0,W0,W) :- !, W0 = W.
int_cntA0_cntA(I,W0,W) :- I0 is I/\(I-1), W1 is W0+1, int_cntA0_cntA(I0,W1,W).
... against predicate int_cntB/2:
int_cntB(I,W) :- I >= 0, int_cntB0_cntB(I,0,W).
int_cntB0_cntB(0,W0,W) :- !, W0 = W.
int_cntB0_cntB(I,W0,W) :- I0 is I>>1, W1 is W0+(I/\1), int_cntB0_cntB(I0,W1,W).
I'm not 100% sure about what I need to consider to get good results... What are even interesting dimensions?
So far, I came up with: Should I include meta-call performance into the benchmark or should it be about raw number crunching? Should the loop be failure driven or not? Should I care about the garbage generated during execution or not?
The following code snippet is a simple benchmark implementation that goes for raw performance, is failure driven and does (thus) not care about garbage:
:- use_module(library(between)).
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_10.
rep_100M :- rep_10, rep_10, rep_10, rep_10, rep_10, rep_10, rep_10, rep_10.
Code for int_cntA/2:
benchA_1(I,W,Rt) :- statistics(runtime,_),
( repeat(100000000), int_cntA(I,W), false ; true ),
statistics(runtime,[_,Rt]),
int_cntA(I,W).
benchA_2(I,W,Rt) :- statistics(runtime,_),
( between(1,100000000,_), int_cntA(I,W), false ; true ),
statistics(runtime,[_,Rt]),
int_cntA(I,W).
benchA_3(I,W,Rt) :- statistics(runtime,_),
( rep_100M, int_cntA(I,W), false ; true ),
statistics(runtime,[_,Rt]),
int_cntA(I,W).
Code for int_cntB/2:
benchB_1(I,W,Rt) :- statistics(runtime,_),
( repeat(100000000), int_cntB(I,W), false ; true ),
statistics(runtime,[_,Rt]),
int_cntB(I,W).
benchB_2(I,W,Rt) :- statistics(runtime,_),
( between(1,100000000,_), int_cntB(I,W), false ; true ),
statistics(runtime,[_,Rt]),
int_cntB(I,W).
benchB_3(I,W,Rt) :- statistics(runtime,_),
( rep_100M, int_cntB(I,W), false ; true ),
statistics(runtime,[_,Rt]),
int_cntB(I,W).
On an Intel Core i7 Haswell machine running SICStus Prolog 4.3.1 worst-case performance differences due to the different benchmarking methods (A,B,C) exceed 100%:
| ?- benchA_1(0,W,Rt).
W = 0,
Rt = 3140 ?
yes
| ?- benchA_2(0,W,Rt).
W = 0,
Rt = 4130 ?
yes
| ?- benchA_3(0,W,Rt).
W = 0,
Rt = 1960 ?
yes
Do you have ideas if/how I could further reduce the overhead of the micro-benchmark? Thank you!