
I have a piece of code that processes files:

processFiles ::  [FilePath] -> (FilePath -> IO ()) -> IO ()

This function spawns an async process that executes an IO action. The IO action must be submitted to a cluster through a job scheduling system (e.g. Slurm).

Because I must use the job scheduling system, I cannot use Cloud Haskell to distribute the closure. Instead, the program writes a new Main.hs containing the desired computations, copies it to the cluster node together with all the modules that Main depends on, and runs it remotely with "runhaskell Main.hs [opts]". The async process should then periodically ask the job scheduling system (using threadDelay between checks) whether the job is done.
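To make the polling step concrete, here is a minimal sketch of the wait loop. The names pollUntil and slurmJobDone are hypothetical, and the Slurm check rests on an assumption: that "squeue -h -j <jobid>" prints nothing, or exits with an error, once the job has left the queue. Adjust it to however your scheduler reports completion.

```haskell
import Control.Concurrent (threadDelay)
import System.Exit (ExitCode (..))
import System.Process (readProcessWithExitCode)

-- | Run a check repeatedly, sleeping between attempts, until it returns True.
-- The delay is in microseconds, as expected by threadDelay.
pollUntil :: Int -> IO Bool -> IO ()
pollUntil delayMicros check = do
  done <- check
  if done
    then pure ()
    else threadDelay delayMicros >> pollUntil delayMicros check

-- | Hypothetical Slurm check (an assumption, not part of the question):
-- the job counts as done once squeue no longer lists it, or once squeue
-- reports an error for that job id.
slurmJobDone :: String -> IO Bool
slurmJobDone jobId = do
  (code, out, _err) <- readProcessWithExitCode "squeue" ["-h", "-j", jobId] ""
  pure (code /= ExitSuccess || null (words out))
```

For example, pollUntil 30000000 (slurmJobDone "12345") would check every 30 seconds; the job id here is made up.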

Is there a way to avoid creating a new Main? Can I serialize the IO action and somehow execute it on the node?

felipez
    Andrew Cowie and Ozgun Ataman suggested that I compile the program and ship it to the nodes, since the compiled binary is self-contained and easy to rsync. One example of this is shipping hadron[1]-based Hadoop MapReduce programs to cluster nodes at work; hadron was developed by Ozgun Ataman. [1] https://github.com/soostone/hadron – felipez Mar 31 '15 at 15:19

1 Answer


Yep. There is a magical library called packman. It lets you turn any Haskell value into data (as long as it does not contain IORefs or related things). Here are the things you would need:

trySerialize :: a -> IO (Serialized a)
deserialize :: Serialized a -> IO a
instance Typeable a => Binary (Serialized a)

Yep, those are the exact types. You can package up your IO actions using trySerialize, use the Binary instance to transfer them wherever they need to go, and then deserialize to get the IO actions back out, ready for use.
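A rough sketch of that round trip, going through a file, might look like the following. It assumes the three items above are exported from packman's GHC.Packing module; packAction, runPacked and the file path are hypothetical names for illustration. As far as I know, packman also expects the very same compiled executable on both ends, so treat this as a sketch under those assumptions rather than a tested recipe.

```haskell
import Data.Binary (decodeFile, encodeFile)
-- Assumption: packman exposes these names from GHC.Packing.
import GHC.Packing (Serialized, deserialize, trySerialize)

-- | On the submitting side: pack the IO action (as an unevaluated
-- closure) and write it to a file that can be copied to the node.
packAction :: FilePath -> IO () -> IO ()
packAction path action = do
  packed <- trySerialize action
  encodeFile path packed

-- | On the node: read the file back, unpack the action, and run it.
runPacked :: FilePath -> IO ()
runPacked path = do
  packed <- decodeFile path :: IO (Serialized (IO ()))
  action <- deserialize packed
  action
```

The submitting program would call something like packAction "job.bin" (myAction file), and the program started by the scheduler would call runPacked "job.bin".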

The caveats for packman are:

  • It stores things as thunks. This is probably what you want, so that the node can do the evaluating.
    • That said, if your thunk is huge, the Binary will probably be huge. Evaluating the thunk can fix this.
  • Like I said, mutable references are a no-no. One thing to watch out for is them hiding inside thunks without you knowing it.
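For the thunk-size caveat, one way to evaluate a value before packing it is sketched below. It uses evaluate from base and force from the deepseq package; forceBeforePacking is a hypothetical helper name. This only helps when the result is smaller than the thunk that computes it, so it is a judgment call per value.

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)

-- | Fully evaluate a value so that what gets packed afterwards is the
-- result itself rather than the thunk that would compute it.
forceBeforePacking :: NFData a => a -> IO a
forceBeforePacking = evaluate . force
```

You would then pass the forced value to trySerialize instead of the original expression.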

Other than that, this seems like what you want!

PyRulez