I've been benchmarking various Fibonacci algorithms recently for my own amusement and, more or less by accident, came up with an alternative to the classic O(n)-time, O(1)-space dynamic programming implementation.
Consider the following two functions:
BigInt fib_dp_classic(int n) {
    if (n == 0) {
        return 0;
    }
    BigInt x = 0, y = 1, z;
    for (int i = 2; i <= n; ++i) {
        z = x + y;
        x = y;
        y = z;
    }
    return y;
}
and
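BigInt fib_dp_mod(int n) {
    BigInt x = 0, y = 1, z = 1;
    for (int i = 0; i < n; ++i) {
        // Rotate which variable receives the next sum instead of shifting values.
        switch (i % 3) {
            case 0:
                y = x + z;
                break;
            case 1:
                z = x + y;
                break;
            case 2:
                x = y + z;
                break;
        }
    }
    // The variable holding F(n) depends on n % 3 (n is assumed non-negative).
    switch (n % 3) {
        case 0:
            return x;
        case 1:
            return y;
        default:  // n % 3 == 2
            return z;
    }
}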
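As a sanity check that the rotating version really computes the same values as the classic one, here is a minimal sketch with std::uint64_t standing in for BigInt. That substitution is an assumption made only for illustration: it is exact only up to n = 93, and the names fib_classic_u64, fib_mod_u64 and versions_agree are hypothetical helpers, not part of the benchmark.

```cpp
#include <cassert>
#include <cstdint>

// std::uint64_t stands in for BigInt (assumption); exact only for n <= 93.
using Num = std::uint64_t;

Num fib_classic_u64(int n) {
    if (n == 0) return 0;
    Num x = 0, y = 1, z;
    for (int i = 2; i <= n; ++i) {
        z = x + y;
        x = y;
        y = z;
    }
    return y;
}

Num fib_mod_u64(int n) {
    // x starts as F(0); z = 1 plays the role of F(-1), so the first
    // iteration computes y = F(0) + F(-1) = F(1).
    Num x = 0, y = 1, z = 1;
    for (int i = 0; i < n; ++i) {
        // After iteration i, the variable in slot (i + 1) % 3 holds F(i + 1).
        switch (i % 3) {
            case 0: y = x + z; break;
            case 1: z = x + y; break;
            case 2: x = y + z; break;
        }
    }
    switch (n % 3) {
        case 0: return x;
        case 1: return y;
        default: return z;  // n % 3 == 2 for non-negative n
    }
}

// Returns true if both versions agree for every n in [0, max_n].
bool versions_agree(int max_n) {
    for (int n = 0; n <= max_n; ++n) {
        if (fib_classic_u64(n) != fib_mod_u64(n)) return false;
    }
    return true;
}
```

Calling versions_agree(93) covers every n that fits in 64 bits, so any difference between the two functions is in speed, not in the result.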
On my machine, calculating the millionth Fibonacci number takes 6.55 seconds with fib_dp_classic and 2.83 seconds with fib_dp_mod, and turning on -O3 doesn't change this much. I don't have a good explanation for why the mod version is faster. Is it because the extra store instructions in the classic version are more expensive than the mod in the second? My understanding is that the compiler should keep all three variables in registers in both versions and that computing the mod is fairly expensive; is this not the case?
In fact, I just put both of these through Compiler Explorer, and both use only registers once optimizations are turned on. Granted, that was with plain ints, not the GMP-based bigints I actually used for my benchmark. Is there some weird GMP implementation detail that might be the cause here?
Update: I even strace'd both to see if malloc() might be the culprit and fib_dp_classic uses 130 syscalls (for n=1000000) while fib_dp_mod uses 133. So still no real clues...
Update 2: Yes, the buffer copies are the culprit (as geza pointed out) and I was dumb for not realizing. Here are two alternate versions and their benchmark results:
BigInt fib_dp_move(int n) {
    if (n == 0) {
        return 0;
    }
    BigInt x = 0, y = 1, z;
    for (int i = 2; i <= n; ++i) {
        z = std::move(x) + y;
        x = std::move(y);
        y = std::move(z);
    }
    return y;
}
This runs in 2.84 seconds, so just about equivalent to the mod version since it eliminates the unnecessary copies.
BigInt fib_dp_swap(int n) {
    if (n == 0) {
        return 0;
    }
    BigInt x = 0, y = 1, z;
    for (int i = 2; i <= n; ++i) {
        z = x + y;
        swap(x, y);
        swap(y, z);
    }
    return y;
}
This (from geza) also runs in 2.84 seconds, so again about equivalent to the mod version since it eliminates the copies in basically the same way, just calling swap() instead of using move semantics.
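To make the copy difference visible without GMP, here is a rough sketch that counts deep copies using a hypothetical stand-in type. Tracked, g_copies, copies_classic and copies_swap are names invented for this illustration; the type does no real bigint arithmetic and only assumes that the real BigInt owns a heap buffer and is cheap to move.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Counts deep copies of the stand-in type below (illustration only).
std::size_t g_copies = 0;

// Hypothetical stand-in for a heap-backed bigint. It is not GMP.
struct Tracked {
    std::vector<unsigned> limbs;

    Tracked() : limbs(1, 0u) {}
    Tracked(const Tracked& o) : limbs(o.limbs) { ++g_copies; }
    Tracked(Tracked&&) noexcept = default;
    Tracked& operator=(const Tracked& o) {
        limbs = o.limbs;  // duplicates the whole buffer
        ++g_copies;
        return *this;
    }
    Tracked& operator=(Tracked&&) noexcept = default;
};

// Placeholder "addition": builds a fresh result, copying neither operand.
Tracked operator+(const Tracked& a, const Tracked& b) {
    Tracked r;
    r.limbs[0] = a.limbs[0] + b.limbs[0];  // unsigned, wraps on overflow
    return r;
}

// Same loop shape as fib_dp_classic; returns the number of deep copies.
std::size_t copies_classic(int n) {
    g_copies = 0;
    Tracked x, y, z;
    for (int i = 2; i <= n; ++i) {
        z = x + y;  // move-assigned from the temporary: no deep copy
        x = y;      // deep copy
        y = z;      // deep copy
    }
    return g_copies;
}

// Same loop shape as fib_dp_swap; returns the number of deep copies.
std::size_t copies_swap(int n) {
    g_copies = 0;
    Tracked x, y, z;
    for (int i = 2; i <= n; ++i) {
        z = x + y;
        std::swap(x, y);  // implemented with moves: no deep copy
        std::swap(y, z);
    }
    return g_copies;
}
```

With this stand-in, copies_classic(10) returns 18 (two copies in each of the nine iterations) and copies_swap(10) returns 0. For n = 1000000 the classic loop would perform roughly two million buffer copies, each over a number that grows with n, which is consistent with the timing gap above.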