This one is actually for my university project. In my essay, I need to include evidence that Haskell threads are faster to create than plain kernel threads. I know that it would be better to refer to a research paper, but the point is that I have to do the benchmarking myself.
Here is what I've come up with. I've written two programs, in C (using pthreads) and Haskell, which create many threads, but those threads do absolutely nothing. I need to measure only the speed of creating a thread.
Here's the source code for the C program:
#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>
void* thread_main(void*);
int main(int argc, char* argv[])
{
    int n,i;
    pthread_t *threads;
    pthread_attr_t pthread_custom_attr;
    if (argc != 2)
    {
        printf ("Usage: %s n\n where n is no. of threads\n",argv[0]);
        return 1;
    }
    n=atoi(argv[1]);
    threads=(pthread_t *)malloc(n*sizeof(*threads));
    pthread_attr_init(&pthread_custom_attr);
    for (i=0; i<n; i++)
    {
        pthread_create(&threads[i], &pthread_custom_attr, thread_main, (void *)(0));
    }
    for (i=0; i<n; i++)
    {
        pthread_join(threads[i],NULL);
    }
}
void* thread_main(void* p)
{
    return 0;
}
and for the Haskell program:
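module Main (main) where
import System.IO.Unsafe (unsafePerformIO)
-- getArgs lives in System.Environment; the old Haskell 98 "System" module is gone from current GHC
import System.Environment (getArgs)
import Control.Concurrent
import Control.Exception (finally)
-- Global list holding one MVar per forked child; NOINLINE keeps it a single shared value
{-# NOINLINE children #-}
children :: MVar [MVar ()]
children = unsafePerformIO (newMVar [])
-- Block until every registered child has signalled completion
waitForChildren :: IO ()
waitForChildren = do
  cs <- takeMVar children
  case cs of
    [] -> return ()
    m:ms -> do
      putMVar children ms
      takeMVar m
      waitForChildren
-- Register a completion MVar, then fork the action in a new Haskell thread
forkChild :: IO () -> IO ThreadId
forkChild io = do
  mvar <- newEmptyMVar
  childs <- takeMVar children
  putMVar children (mvar:childs)
  forkIO (io `finally` putMVar mvar ())
forkKids :: Int -> IO ()
forkKids 0 = return ()
forkKids n = do
  forkChild (threadMain)
  forkKids (n-1)
-- Thread body: does nothing
threadMain :: IO ()
threadMain = return ()
main :: IO ()
main = do
  args <- getArgs
  forkKids (read (head args))
  waitForChildren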
Now, what I do is run each program several times with the same argument (e.g. 10000), measure each run's elapsed time with time -f%e, and then take the arithmetic mean of the running times. The results show that creating Haskell threads is an order of magnitude faster.
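One thing worth noting about this setup: time -f%e reports the elapsed time of the whole process, so it also includes program startup, joining the threads, and exit. A possible way to look at the creation loop alone is to time it inside the program. The following is only a minimal sketch in C of that idea, not the code that produced the numbers above; it assumes a POSIX system with clock_gettime and CLOCK_MONOTONIC, and the names time_thread_creation, noop_thread and elapsed_ms are hypothetical helpers that do not appear in the program above.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Thread body: does nothing, like thread_main in the program above. */
static void *noop_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Milliseconds between two CLOCK_MONOTONIC readings. */
static double elapsed_ms(struct timespec start, struct timespec end)
{
    return (end.tv_sec - start.tv_sec) * 1000.0
         + (end.tv_nsec - start.tv_nsec) / 1e6;
}

/*
 * Hypothetical helper: creates n do-nothing threads and returns the time
 * spent in the creation loop only, in milliseconds. Returns -1.0 if n is
 * not positive, if allocation fails, or if any pthread_create call fails.
 */
double time_thread_creation(int n)
{
    struct timespec start, end;
    pthread_t *threads;
    int created = 0;
    int failed = 0;

    if (n <= 0)
        return -1.0;
    threads = malloc((size_t)n * sizeof *threads);
    if (threads == NULL)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < n; i++) {
        /* A failed creation would make the measurement meaningless. */
        if (pthread_create(&threads[i], NULL, noop_thread, NULL) != 0) {
            failed = 1;
            break;
        }
        created++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    /* Joining happens outside the timed region. */
    for (int i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    free(threads);

    return failed ? -1.0 : elapsed_ms(start, end);
}
```

Calling time_thread_creation(n) from main and printing the result would give a per-run figure for the creation loop alone. The sketch also checks the return value of pthread_create, since a run in which some threads were never created would look faster than it really is. A comparable measurement on the Haskell side would need its own in-process timing around forkKids.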
Now, my question is: is this a correct benchmark? Or are there factors that I need to take into account to get accurate results?
Thanks