I took a bit of a long break from playing with Haskell, and I'm starting to get back in to it. I'm definitely still learning my way around the language. I've realized that one of the things that has always made me nervous/uncomfortable when writing Haskell is that I don't have a strong grasp on how to craft algorithms that are both idiomatic and performant. I realize that "premature optimization is the root of all evil", but similarly slow code will have to be dealt with eventually and the I just can't get rid of my preconceived notions about languages that are so high-level being super slow.
So, in that vein, I started playing with test cases. One of them that I was working on was a naïve, straight-forward implementation of the classical 4th Order Runge-Kutta method, applied to the fairly trivial IVP dy/dt = -y; y(0) = 1
, which gives y = e^-t
. I wrote a completely straight forward implementation in both Haskell and C (which I'll post in a moment). The Haskell version was incredibly succinct and gave me warm fuzzies on the inside when I looked at it, but the C version (which actually wasn't horrible to parse at all) was over twice as fast.
I realize that it isn't 100% fair to compare the performance of 2 different languages; and that until the day we all die C will most likely always hold the crown as the king of performance, especially hand-optimized C code. I'm not trying to get my Haskell implementation to run just as fast as my C implementation. But I'm pretty certain that if I was more cognizant of what I was doing then I could eek a bit more speed out of this particular Haskell implementation.
The Haskell version was compiled with -02
under GHC 7.6.3 on OS X 10.8.4, the C version was compiled with Clang and I gave it no flags. The Haskell version averaged around 0.016 seconds when tracked with time
, and the C version around 0.006 seconds.
These timings take in to account the entire running time of the binary, including output to stdout, which obviously accounts for some of the overhead, but I did do some profiling on the GHC binary by recompiling with -prof -auto-all
and running with +RTS -p
and also looking at the GC stats with +RTS -s
. I didn't really understand all that much of what I saw, but it seemed to be that my GC wasn't out of control though could probably get reined in a little bit (5%, Productivity at ~93% User, ~85% total elapsed) and that most of the productive time was spent in the function iterateRK
, which I knew would be slow when I wrote it but it wasn't immediately obvious to me how to go about cleaning it up. I realize that I'm probably incurring a penalty in my usage of a List, both in the constant cons
ing and the laziness in dumping the results to stdout.
What exactly am I doing wrong? What library functions or Monadic wizardry am I tragically unaware of that I could be using to clean up iterateRK
? What are some good resources for learning how to be a GHC profiling rockstar?
RK.hs
rk4 :: (Double -> Double -> Double) -> Double -> Double -> Double -> Double
rk4 y' h t y = y + (h/6) * (k1 + 2*k2 + 2*k3 + k4)
where k1 = y' t y
k2 = y' (t + h/2) (y + ((h/2) * k1))
k3 = y' (t + h/2) (y + ((h/2) * k2))
k4 = y' (t + h) (y + (h * k3))
iterateRK y' h t0 y0 = y0:(iterateRK y' h t1 y1)
where t1 = t0 + h
y1 = rk4 y' h t0 y0
main = do
let y' t y = -y
let h = 1e-3
let y0 = 1.0
let t0 = 0
let results = iterateRK y' h t0 y0
(putStrLn . show) (take 1000 results)
RK.c
#include<stdio.h>
#define ITERATIONS 1000
double rk4(double f(double t, double x), double h, double tn, double yn)
{
double k1, k2, k3, k4;
k1 = f(tn, yn);
k2 = f((tn + h/2), yn + (h/2 * k1));
k3 = f((tn + h/2), yn + (h/2 * k2));
k4 = f(tn + h, yn + h * k3);
return yn + (h/6) * (k1 + 2*k2 + 2*k3 + k4);
}
double expDot(double t, double x)
{
return -x;
}
int main()
{
double t0, y0, tn, yn, h, results[ITERATIONS];
int i;
h = 1e-3;
y0 = 1.0;
t0 = 0.0;
yn = y0;
for(i = 0; i < ITERATIONS; i++)
{
results[i] = yn;
yn = rk4(expDot, h, tn, yn);
tn += h;
}
for(i = 0; i < ITERATIONS; i++)
{
printf("%.10lf", results[i]);
if(i != ITERATIONS - 1)
printf(", ");
else
printf("\n");
}
return 0;
}