I have a closed-source, non-thread-safe C++ shared library that provides a single function f :: ByteString -> ByteString. A single call can take anywhere from one second to a couple of hours.

I am looking for a way to distribute the calculation across multiple cores/servers (SIMD-style: the same function applied to many independent inputs).

In a nutshell, I'm looking for a framework that provides a function

    g :: Strategy b -> (a -> b) -> a -> b

to lift a function that can only be called sequentially into a function that behaves like any other pure function in Haskell.

For instance, I want to be able to write:

    parMap rwhnf f args -- will not work

Since f calls a C function in a non-thread-safe lib via FFI, this will not work. Hence, I could replace the function f with a function g that holds a job queue and dispatches the tasks to N separate processes. The processes could run locally or distributed:

    parMap rwhnf g args -- should work
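
For the purely local case, the idea of a job queue feeding N separate processes can be sketched with nothing but base and the process package. This is only a rough sketch under stated assumptions, not a finished solution: `processMap`, `forkJob` and `runInProcess` are hypothetical names, the worker executable (a small program that links the C++ lib, reads its input on stdin and writes the result to stdout) is assumed to exist, and `String` is used instead of `ByteString` for brevity. The result lives in IO rather than being a pure function.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (QSem, newQSem, signalQSem, waitQSem)
import Control.Exception (SomeException, bracket_, throwIO, try)
import System.Process (readProcess)

-- | Run one job in a fresh OS process. The worker executable is an
-- assumption: it is expected to read the input on stdin, call the
-- non-thread-safe library once, and write the result to stdout.
runInProcess :: FilePath -> [String] -> String -> IO String
runInProcess workerCmd workerArgs input = readProcess workerCmd workerArgs input

-- | Start a job in its own Haskell thread. The semaphore limits how many
-- jobs (and therefore how many worker processes) are active at once.
forkJob :: QSem -> IO b -> IO (MVar (Either SomeException b))
forkJob sem act = do
  slot <- newEmptyMVar
  _ <- forkIO (try (bracket_ (waitQSem sem) (signalQSem sem) act) >>= putMVar slot)
  return slot

-- | Apply a job to every input with at most n jobs running concurrently.
-- Results come back in input order; the first failure is rethrown.
processMap :: Int -> (a -> IO b) -> [a] -> IO [b]
processMap n job inputs = do
  sem   <- newQSem n
  slots <- mapM (forkJob sem . job) inputs
  mapM (\slot -> takeMVar slot >>= either throwIO return) slots
```

With a hypothetical `./worker` binary this would be used as `processMap 4 (runInProcess "./worker" []) inputs`. Compiling with `-threaded` is advisable so that waiting on a child process does not block the other Haskell threads. The sketch covers neither remote servers nor retries, which is exactly the part the frameworks below are meant to provide.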

Potential frameworks I already looked into are

  1. MPI: Client (Haskell) <-- MPI --> Broker (C++) <-- MPI --> Worker (C++) <--> Lib (C++)

  2. ZeroMQ: Client (Haskell) <-- ZeroMQ --> Broker (C++) <-- ZeroMQ --> Worker (C++) <--> Lib (C++)

  3. Cloud Haskell: Client (Haskell) <-- CloudHaskell --> Worker (Haskell) <-- FFI --> Lib (C++)

  4. Gearman

  5. Erlang: Client (Haskell) <-- Erlang --> Broker (Erlang) <-- Erlang C Node --> Worker (C++)

Each approach has advantages and disadvantages.

  1. MPI will create a lot of security issues and is a pretty heavy-weight solution.

  2. ZeroMQ is a nice solution but would require that I write the broker/load balancer etc. all by myself (especially getting the reliability right is not trivial).

  3. CloudHaskell doesn't look very mature.

  4. Gearman doesn't run on Windows and has no Haskell bindings. I know about java-gearman-service but it is much less mature than the C daemon and has some other issues (e.g. no doc, shuts down if there is no incoming flow of tasks for some time, etc.).

  5. Erlang is similar to MPI (1) and requires the use of a third language.

Thanks!

Chronos
  • You are looking into distributing a function that works on the same data to multiple cores in order to make it fail safe? If not, how can your closed source function be parallelized? – J Fritsch May 12 '12 at 20:57
  • I'm looking for a SIMD solution. Closed source means I cannot make any modifications to the lib itself to make it thread-safe. Hence, I will have to run each function call in a separate process. What I am looking for is a simple solution for load balancing / connecting the processes. In Scala I would use Akka with workers as remote nodes that run in a separate JVM. – Chronos May 12 '12 at 21:09
  • Ah, so you want to calculate the function multiple times on different inputs? This isn't at all clear from your question; you might want to edit the first couple of sentences to mention it :) – Ben Millwood May 13 '12 at 01:03
  • I don't think this is a thing you can do. Wrapping something with another language _can't_ make it threadsafe. – Louis Wasserman May 16 '12 at 09:19
  • The idea is to run each instance in another process to parallelize it. The question is how to glue it together, especially if the processes run on different servers. – Chronos May 17 '12 at 02:53

1 Answer

Since the library you are using is not thread-safe, you need a solution that uses processes as its abstraction for parallelism. The parMap example in your question uses the spark-based model of parallelism, in which many sparks are evaluated by threads inside a single process. Clearly this is not what you are looking for.

Fear Not!

There are only a few paradigms in Haskell that work this way, and you mentioned one of them in your post: Cloud Haskell. While Cloud Haskell is not yet "mature", it could solve your problem, though it may be a little heavyweight for your needs. If you really just need to take advantage of many local cores using process-level parallelism, look at the Eden library:

http://www.mathematik.uni-marburg.de/~eden/

With Eden you can absolutely express what you are after. Here is a very simple example along the lines of your parMap-based version:

    f $# args

Or in the case of many arguments you might just pull out ye olde map:

    map f $# args

For more information about the $# operator and for tutorials about Eden, see:

http://www.mathematik.uni-marburg.de/~eden/paper/edenCEFP.pdf

YMMV, as most of the more mature parallel paradigms in Haskell assume that you have some level of thread safety or that you can do the parallel work in a pure manner.

Good Luck and Happy Hacking!

krakrjak