Comparing Haskell threads to kernel threads - is my benchmark viable?

Question

This one is actually for my university project. In my essay, I need to include evidence that Haskell threads are faster to create than plain kernel threads. I know that it's better to refer to some research paper, but the point is that I have to do the benchmarking myself.

Here is what I've come up with. I've written two programs, in C (using pthreads) and Haskell, which create many threads, but those threads do absolutely nothing. I need to measure only the speed of creating a thread.

Here's the source code for the C program:

#include <stdio.h>
#include <pthread.h>
#include <stdlib.h>

void* thread_main(void*);

int main(int argc, char* argv[])
{
    int n,i;
    pthread_t *threads;
    pthread_attr_t pthread_custom_attr;

    if (argc != 2)
    {
        printf ("Usage: %s n\n  where n is no. of threads\n",argv[0]);
        return 1;
    }

    n=atoi(argv[1]);

    threads=(pthread_t *)malloc(n*sizeof(*threads));
    pthread_attr_init(&pthread_custom_attr);

    for (i=0; i<n; i++)
    {
        pthread_create(&threads[i], &pthread_custom_attr, thread_main, (void *)(0));
    }

    for (i=0; i<n; i++)
    {
        pthread_join(threads[i],NULL);
    }
}

void* thread_main(void* p)
{
    return 0;
}

and for the Haskell program:

module Main (main) where

import System.IO.Unsafe
import System.Environment (getArgs)
import Control.Concurrent
import Control.Exception

children :: MVar [MVar ()]
children = unsafePerformIO (newMVar [])

waitForChildren :: IO ()
waitForChildren = do
   cs <- takeMVar children
   case cs of
      []   -> return ()
      m:ms -> do
         putMVar children ms
         takeMVar m
         waitForChildren

forkChild :: IO () -> IO ThreadId
forkChild io = do
   mvar <- newEmptyMVar
   childs <- takeMVar children
   putMVar children (mvar:childs)
   forkIO (io `finally` putMVar mvar ())

forkKids :: Int -> IO ()
forkKids 0 = return ()
forkKids n = do
   forkChild (threadMain)
   forkKids (n-1)

threadMain = return ()

main = do
   args <- getArgs
   forkKids (read (head args))
   waitForChildren

Now, what I do is run each program with the same argument (e.g. 10000), measure its running time with time -f%e, and take the arithmetic mean of the running times. The results show that creating Haskell threads is an order of magnitude faster.

Now, my question is: is this a correct benchmark, or is there some factor that I need to take into account to get accurate results?

Thanks

You can see the results here: http://10098.arnet.am/posts/haskells-lightweight-threads/ — , Jun 17 '11 at 18:43
Note: you should add a `{-# NOINLINE children #-}` when using `unsafePerformIO` to create a global MVar like that. Otherwise, it may get inlined and you end up with multiple different MVars. — hammar, Oct 29 '12 at 01:13

jalf · Accepted Answer · 2011-05-08T13:59:14.433

4

Your benchmarks are probably getting you the result you want, but there's an awful lot of noise. What you're measuring is not "how long does it take to create a thread", but "how long does it take to launch and run a program which creates a number of threads, and then waits for them to return before terminating".

The answers are probably more or less the same in practice, but when benchmarking, you should try to narrow it down so you benchmark that which you're interested in, with as little external noise as possible.

Why don't you simply slap a timer around the pthread_create/forkIO calls, since they're what you want to measure?

You're not interested in how long it takes to launch your program, so don't time that. You're not interested in how long it takes to join the threads afterwards, so don't time that.
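As a rough illustration of putting the timer around the creation calls only, here is a minimal C sketch. The helper names time_thread_creation, seconds_between and noop_thread are made up for this example, and the use of clock_gettime with CLOCK_MONOTONIC is an assumption about the platform (POSIX); it is one possible way to do it, not the only one.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime under strict -std= modes */
#endif
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

/* Thread body that does nothing, as in the question's program. */
static void *noop_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/* Seconds elapsed between two CLOCK_MONOTONIC readings. */
static double seconds_between(struct timespec start, struct timespec end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/*
 * Hypothetical helper: tries to create n threads and returns the wall-clock
 * time spent in the pthread_create loop only, or -1.0 if malloc fails.
 * The number of threads actually created is stored in *created.
 */
static double time_thread_creation(int n, int *created)
{
    struct timespec start, end;
    pthread_t *threads;
    int i, count = 0;

    assert(n > 0);
    *created = 0;
    threads = malloc((size_t)n * sizeof *threads);
    if (threads == NULL)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);   /* timed region starts */
    for (i = 0; i < n; i++) {
        if (pthread_create(&threads[i], NULL, noop_thread, NULL) != 0)
            break;                            /* stop at the first failure */
        count++;
    }
    clock_gettime(CLOCK_MONOTONIC, &end);     /* timed region ends */

    /* Joining and cleanup happen outside the timed region. */
    for (i = 0; i < count; i++)
        pthread_join(threads[i], NULL);
    free(threads);

    *created = count;
    return seconds_between(start, end);
}
```

Calling time_thread_creation(10000, &created) from main and dividing the result by created gives a rough per-thread cost that excludes program startup and joining. The same idea applies to the Haskell side: read a clock before and after the loop of forkIO calls.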

edited May 08 '11 at 13:59

answered May 08 '11 at 09:31

jalf


While you're correct that he said he's not interested in how long it takes to join the threads, I think that's also potentially quite interesting. Possibly worthy of additional note in what he's writing up. – Carl May 08 '11 at 19:12
True, but many things are interesting. If it is interesting enough to measure, then it should be measured properly, in isolation (as far as possible). That is, time the cost of creating a thread, and then separately, time the cost of joining threads. – jalf May 09 '11 at 12:16

score 0 · Answer 2 · edited Oct 29 '12 at 00:41

Depending on the number of threads, pthread_create() might stop creating threads at some point and return an error instead. Check its return value when benchmarking, so that failed calls are not counted as created threads.
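A minimal sketch of such a check, assuming a POSIX system. The helper name create_threads_checked is hypothetical; the only library behavior relied on is that pthread_create returns 0 on success and an error number on failure.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Thread body that does nothing. */
static void *noop_thread(void *arg)
{
    (void)arg;
    return NULL;
}

/*
 * Hypothetical helper: tries to create n threads, stops at the first
 * failure, and returns how many threads were actually created.
 * *err receives the error number of the failing call, or 0 if all
 * n calls succeeded.
 */
static int create_threads_checked(int n, int *err)
{
    pthread_t *threads;
    int i, rc = 0, created = 0;

    assert(n > 0);
    threads = malloc((size_t)n * sizeof *threads);
    if (threads == NULL) {
        *err = ENOMEM;
        return 0;
    }

    for (i = 0; i < n; i++) {
        /* pthread_create reports failure through its return value. */
        rc = pthread_create(&threads[i], NULL, noop_thread, NULL);
        if (rc != 0) {
            fprintf(stderr, "pthread_create failed after %d threads: %s\n",
                    created, strerror(rc));
            break;
        }
        created++;
    }

    /* Join only the threads that were really created. */
    for (i = 0; i < created; i++)
        pthread_join(threads[i], NULL);
    free(threads);

    *err = rc;
    return created;
}
```

If the returned count is smaller than n, the measured time covers fewer threads than requested, so any per-thread figure should be computed from the returned count rather than from n.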

edited Oct 29 '12 at 00:41

Luca


answered Feb 15 '12 at 15:18

Hamo

