The best way to do it would be to get a representation of the function (if it can be recovered somehow). Binary serialization is preferred for efficiency reasons.

I think there must be a way to do it in Clean, because otherwise it would be impossible to implement iTask, which relies on tasks (and therefore functions) being saved and resumed when the server is running again.

This must be important for distributed Haskell computations.

I'm not looking for parsing Haskell code at runtime, as described here: Serialization of functions in Haskell. I also need to serialize, not just deserialize.

Boldizsár Németh
    See also [Haskell for all: The internet of code](http://www.reddit.com/r/haskell/comments/36d12v/haskell_for_all_the_internet_of_code/) for a (theoretical) suggestion how to encode functions for sending them. – imz -- Ivan Zakharyaschev Jun 05 '15 at 00:26

5 Answers

27

Unfortunately, it's not possible with the current GHC runtime system. Serialization of functions, and of other arbitrary data, requires some low-level runtime support that the GHC implementors have been reluctant to add.

Serializing functions requires that you can serialize anything, since arbitrary data (evaluated and unevaluated) can be part of a function (e.g., a partial application).
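To illustrate this point, here is a small base-only sketch (the names are illustrative, not from the answer): a partial application captures its argument, evaluated or not, inside its closure, so serializing the function would mean serializing that data as well.

```haskell
-- Illustrative only: shows why serializing a function drags in arbitrary data.
bigData :: [Int]
bigData = [1 .. 1000000]   -- may still be an unevaluated thunk

-- 'f' is a partial application whose closure captures (the sum of) 'bigData';
-- serializing 'f' would require serializing that payload, evaluated or not.
f :: Int -> Int
f = (+) (sum bigData)

main :: IO ()
main = print (f 1)
```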

augustss
22

No. However, the CloudHaskell project is driving home the need for explicit closure serialization support in GHC. The closest thing CloudHaskell has to explicit closures is the distributed-static package. Another attempt is the HdpH closure representation. However, both use Template Haskell in the way Thomas describes below.

The limitation is a lack of static support in GHC, for which there is a currently unactioned GHC ticket. (Any takers?). There has been a discussion on the CloudHaskell mailing list about what static support should actually look like, but nothing has yet progressed as far as I know.

The closest anyone has come to a design and implementation is Jost Berthold, who has implemented function serialisation in Eden. See his IFL 2010 paper "Orthogonal Serialisation for Haskell". The serialisation support is baked into the Eden runtime system. (It is now available as a separate library: packman. I'm not sure whether it can be used with stock GHC or needs a patched GHC as in the Eden fork...) Something similar would be needed for GHC. This is the serialisation support in Eden, in the version forked from GHC 7.4:

data Serialized a = Serialized { packetSize :: Int, packetData :: ByteArray# }

serialize   :: a -> IO (Serialized a)
deserialize :: Serialized a -> IO a

So: one can serialize functions and data structures. There is a Binary instance for Serialized a, allowing you to checkpoint a long-running computation to a file! (See Section 4.1.)
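As a sketch of that checkpointing idea: the following assumes packman's documented GHC.Packing entry points (trySerialize/deserialize) and the binary package's encodeFile/decodeFile via the Binary instance mentioned above; it is untested here, and the helper names are my own.

```haskell
-- Hedged sketch: checkpoint an unevaluated computation to disk and resume it
-- later, in the SAME compiled binary. Assumes packman's GHC.Packing API.
import GHC.Packing (trySerialize, deserialize)
import Data.Binary (encodeFile, decodeFile)

-- Serialize a value (possibly a thunk) and write it via the Binary instance.
checkpoint :: FilePath -> a -> IO ()
checkpoint path x = trySerialize x >>= encodeFile path

-- Read it back and unpack it into the heap again.
restore :: FilePath -> IO a
restore path = decodeFile path >>= deserialize

main :: IO ()
main = do
  let work = sum [1 :: Integer .. 100000]   -- an unevaluated thunk
  checkpoint "work.chk" work                -- save it, unevaluated
  work' <- restore "work.chk"               -- later: resume where we left off
  print (work' :: Integer)
```

Note the "same program" caveat from Berthold's paper applies: the checkpoint file is only meaningful to the binary that wrote it, since functions are stored as code pointers.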

Support for such a simple serialization API in the GHC base libraries would surely be the Holy Grail for distributed Haskell programming. It would likely simplify composability between the distributed Haskell flavours (CloudHaskell, MetaPar, HdpH, Eden, and so on).

Rob Stewart
  • 1
    I'm surprised that CloudHaskell hasn't just bitten the bullet and added the necessary low level bits. – augustss Jul 22 '13 at 18:13
  • 5
    augustss: there's some dispute over what the correct low-level bits are! many of us are not fans of the "pass a pointer" semantics described in the original paper. your input on what "low-level bits" you think would be appropriate would be quite welcome, since I know you've worked out at least one production-ready answer to this question. – sclv Jul 22 '13 at 21:24
  • 4
    @sclv It's a bit easier with strict evaluation. Then you just transfer values, whereas with lazy evaluation you have a choice where and when the evaluation will happen. – augustss Jul 23 '13 at 14:53
  • 2
    HdpH's closures seem to be more powerful, coming closer to sending any computation you want (my impression after reading http://www.macs.hw.ac.uk/~pm175/papers/Maier_Trinder_IFL2011_XT.pdf). "Simple" closures from Cloud Haskell seem to be limited to sending a top-level function and a serialized, and hence **forced**, environment. So if you wish to send a partially-evaluated expression you are in trouble. HdpH introduces closure compositions, so if you write your code as compositions of closures (instead of functions), you must be able to have a "deep" grip on the expression structure, isn't that true? – imz -- Ivan Zakharyaschev Feb 10 '15 at 12:42
  • 1
    A followup correction/update: What is called `mapClosure` in the cited article is now `apC` according to https://hackage.haskell.org/package/hdph-closure-0.0.1/docs/Control-Parallel-HdpH-Closure.html#g:7 ; `compClosure` is `compC`. (Cf. https://hackage.haskell.org/package/hdph-closure-0.0.1/docs/Control-Parallel-HdpH-Closure.html#g:17 .) – imz -- Ivan Zakharyaschev Feb 10 '15 at 14:54
  • 2
    A follow up. The ticket https://ghc.haskell.org/trac/ghc/ticket/7015 mentioned above is closed. And StaticPtr is now part of GHC 7.10.x. – Yogesh Sajanikar Jan 31 '16 at 10:27
  • Well, machine-independent bytecode blobs are the obvious low-level bits, aren’t they? – Evi1M4chine Dec 07 '16 at 04:49
20

Check out Cloud Haskell. It has a concept called Closure, which is used to send code to be executed on remote nodes in a type-safe manner.
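A hedged sketch of how Closure is typically used, assuming the distributed-process package (function and node names are illustrative, and this is untested here). The Template Haskell splice registers the function in a remote table so peers can refer to it by a stable label rather than by shipping code, which is exactly the mechanism Thomas's comment below describes.

```haskell
{-# LANGUAGE TemplateHaskell #-}
-- Sketch only, assuming the distributed-process API.
import Control.Distributed.Process
import Control.Distributed.Process.Closure (mkClosure, remotable)

doubleIt :: Int -> Process ()
doubleIt n = say (show (n * 2))

-- Registers 'doubleIt' in a generated remote table ('__remoteTable'),
-- giving it a stable label both peers can agree on.
remotable ['doubleIt]

-- Spawn 'doubleIt 21' on a remote node: only the label and the serialized
-- argument travel over the wire, not the code itself.
spawnOnPeer :: NodeId -> Process ProcessId
spawnOnPeer peer = spawn peer ($(mkClosure 'doubleIt) (21 :: Int))
```

For this to work, the generated `__remoteTable` has to be composed into the remote table used when each node is started, so that both sides perform the same label-to-function lookup.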

Ankur
  • 15
    Notice cloud haskell doesn't actually send code, just an environment and index value that, assuming a properly setup partner, can be used to lookup the routine of interest. – Thomas M. DuBuisson Jul 22 '13 at 13:11
2

Eden probably comes closest and deserves a separate answer: (de-)serialization of unevaluated thunks is possible; see https://github.com/jberthold/packman.

Deserialization is, however, limited to the same program (where a program is a "compilation result"). Since functions are serialized as code pointers, previously unknown functions cannot be deserialized.

Possible usage:

  • storing unevaluated work for later
  • distributing work (but no sharing of new code)
axm
0

A pretty simple and practical, though maybe not as elegant, solution would be to compile each function (preferably having GHC do it automatically) into a separate module of machine-independent bytecode, serialize that bytecode whenever serialization of the function is required, and use the dynamic-loader or plugins packages to load it dynamically, so that even previously unknown functions can be used.

Since a module records all its dependencies, those could then be (de)serialized and loaded too. In practice, serializing index numbers and attaching an indexed list of the bytecode blobs would probably be the most efficient.

I think as long as you compile the modules yourself, this is already possible right now.
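A hedged sketch of the loading half of this idea, assuming the plugins package's System.Plugins.load (all file names and symbol names are illustrative, and this is untested here). Note one mismatch with the proposal: plugins loads native object code, not machine-independent bytecode, so this only approximates it.

```haskell
-- Sketch only: write received object code to disk, then load a symbol from it.
import qualified Data.ByteString as BS
import System.Plugins (LoadStatus (..), load)

-- Hypothetical helper: turn a received module blob into a callable function.
loadReceivedFunction :: BS.ByteString -> IO (Int -> Int)
loadReceivedFunction objCode = do
  BS.writeFile "Received.o" objCode          -- the "deserialized" module
  status <- load "Received.o" [] [] "someFunction"
  case status of
    LoadSuccess _ f  -> return f             -- symbol resolved; use it
    LoadFailure errs -> error (unlines errs)
```

The security point below applies with full force here: whatever ends up in `Received.o` runs with the privileges of the loading process.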

As I said, it would not be very pretty though. Not to mention the generally huge security risk of deserializing code from untrusted sources and running it in an unsecured environment. :-)
(No problem if the source is trustworthy, of course.)

I’m not going to code it up right here, right now though. ;-)

Evi1M4chine