
Consider this sequential procedure on a data structure containing collections (for simplicity, call them lists) of Doubles. For as long as I feel like, do:

  1. Select two different lists from the structure at random
  2. Calculate a statistic based on those lists
  3. Flip a coin based on that statistic
  4. Possibly modify one of the lists, based on the outcome of the coin toss
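The four steps above can be sketched directly in Haskell. This is a minimal sequential sketch, not the linked implementation: the statistic (difference of sums) and the update rule (incrementing each element) are placeholders standing in for whatever the real procedure computes.

```haskell
import System.Random (randomRIO)

type Structure = [[Double]]

-- One iteration of the procedure: pick two distinct lists, compute a
-- placeholder statistic, flip a biased coin, and maybe update one list.
step :: Structure -> IO Structure
step xss = do
  -- 1. Select two different lists at random
  i  <- randomRIO (0, length xss - 1)
  j0 <- randomRIO (0, length xss - 2)
  let j        = if j0 >= i then j0 + 1 else j0
      (xs, ys) = (xss !! i, xss !! j)
      -- 2. A placeholder statistic: difference of the list sums
      stat = sum xs - sum ys
      -- 3. Turn the statistic into an acceptance probability
      p = 1 / (1 + exp (negate stat))
  coin <- randomRIO (0, 1 :: Double)
  -- 4. Possibly modify list i, based on the coin toss
  let xss' | coin < p  = replaceAt i (map (+ 1) xs) xss
           | otherwise = xss
  return xss'
  where
    replaceAt k v l = take k l ++ [v] ++ drop (k + 1) l

-- Run for as long as you feel like (here, a fixed iteration count).
run :: Int -> Structure -> IO Structure
run 0 s = return s
run n s = step s >>= run (n - 1)
```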

The goal is to eventually achieve convergence to something, so the 'solution' is linear in the number of iterations. An implementation of this procedure can be seen in the SO question here, and here is an intuitive visualization:

[figure: sequential visualization]

It seems that this procedure could be performed better - that is, convergence could be achieved faster - by using several workers executing concurrently on separate OS threads, e.g.:

[figure: concurrent visualization]

I guess a perfectly-realized implementation of this should be able to achieve a solution in O(n/P) time, for P the number of available compute resources.

Reading up on Haskell concurrency has left my head spinning with terms like MVar, TVar, TChan, acid-state, etc. What seems clear is that a concurrent implementation of this procedure would look very different from the one I linked above. But, the procedure itself seems to essentially be a pretty tame algorithm on what is essentially an in-memory database, which is a problem that I'm sure somebody has come across before.

I'm guessing I will have to use some kind of mutable, concurrent data structure that supports decent random access (that is, to random idle elements) & modification. I am getting a bit lost when I try to piece together all the things that this might require with a view towards improving performance (STM seems dubious, for example).

What data structures, concurrency concepts, etc. are suitable for this kind of task, if the goal is a performance boost over a sequential implementation?

  • I wound up using `STM`, which was less painful than `MVars`. A snapshot of some code is here: http://hpaste.org/69045. It doesn't beat my sequential implementation unless I use a very (unnecessarily) large number of iterations, but I was astounded at how easy it was to implement. – jtobin May 31 '12 at 08:49

1 Answer


Keep it simple:

  • forkIO for lightweight, super-cheap threads.
  • MVar for fast, thread-safe shared memory.
  • the appropriate sequence type (probably vector, maybe lists if you only prepend)
  • a good stats package
  • a fast random number source (e.g. mersenne-random-pure64)
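Putting the first two suggestions together, here is a hedged sketch of the concurrent version: one MVar per list, so a worker holding two buffers leaves the rest of the structure free for other workers. The statistic and update rule are placeholders, as in any sketch of this procedure.

```haskell
import Control.Concurrent (forkIO, newMVar, newEmptyMVar,
                           takeMVar, putMVar, readMVar, MVar)
import Control.Monad (forM_, replicateM_)
import System.Random (randomRIO)

-- One lock per buffer keeps the locking coarse but simple.
type Structure = [MVar [Double]]

worker :: Structure -> Int -> IO ()
worker vars iters = replicateM_ iters $ do
  i  <- randomRIO (0, length vars - 1)
  j0 <- randomRIO (0, length vars - 2)
  let j        = if j0 >= i then j0 + 1 else j0
      -- Always acquire locks in index order to avoid deadlock
      (lo, hi) = (min i j, max i j)
  xs <- takeMVar (vars !! lo)
  ys <- takeMVar (vars !! hi)
  let stat = sum xs - sum ys            -- placeholder statistic
      p    = 1 / (1 + exp (negate stat))
  coin <- randomRIO (0, 1 :: Double)
  -- Possibly modify the first list, then release both locks
  putMVar (vars !! lo) (if coin < p then map (+ 1) xs else xs)
  putMVar (vars !! hi) ys

-- Fork a few workers, wait for them all, and return the final state.
demo :: IO [[Double]]
demo = do
  vars  <- mapM newMVar [[1, 2], [3, 4], [5, 6], [7, 8]]
  dones <- mapM (const newEmptyMVar) [1 .. 4 :: Int]
  forM_ dones $ \d ->
    forkIO (worker vars 1000 >> putMVar d ())
  mapM_ takeMVar dones                  -- join all workers
  mapM readMVar vars
```

Acquiring the two MVars in index order is the standard ordered-locking trick: no two workers can each hold a lock the other is waiting on.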

You can try the fancier stuff later. For raw performance, keep things simple first: keep the number of locks down (e.g. one per buffer), and make sure to compile your code with optimizations and the threaded runtime (ghc -O2 -threaded) - you should be off to a great start.

RWH has an intro chapter covering the basics of concurrent Haskell.

Don Stewart
  • Thanks for the push; admittedly with all the 'fancier' stuff available, I had been kind of avoiding what are I suppose the basics of concurrency. I will likely update this question when I've gotten a satisfactory implementation going. – jtobin May 18 '12 at 01:10