13

I quite like Haskell, but space leaks are a bit of a concern for me. I usually think Haskell's type system makes it safer than C++; however, with a C-style loop I can be fairly certain it will complete without running out of memory, whereas a Haskell "fold" can run out of memory unless you're careful that the appropriate fields are strict.
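To make the concern concrete, here is a minimal sketch of the kind of fold in question, computing a mean. The names `meanLazy`, `meanStrict` and `SumCount` are made up for illustration. The first version may build a chain of thunks as long as the input (depending on optimisation); the second uses `foldl'` with strict fields, so the accumulator is fully evaluated at every step.

```haskell
import Data.List (foldl')

-- Lazy version: foldl with a lazy tuple accumulator. Nothing forces the
-- running sum or count until the very end, so thunks may pile up.
meanLazy :: [Double] -> Double
meanLazy xs = s / fromIntegral n
  where
    (s, n) = foldl (\(a, c) x -> (a + x, c + 1 :: Int)) (0, 0) xs

-- Accumulator with strict fields (hypothetical type, for illustration).
data SumCount = SumCount !Double !Int

-- Strict version: foldl' evaluates the accumulator to WHNF at each step,
-- and the strict fields then force both the sum and the count.
meanStrict :: [Double] -> Double
meanStrict xs = s / fromIntegral n
  where
    SumCount s n = foldl' step (SumCount 0 0) xs
    step (SumCount a c) x = SumCount (a + x) (c + 1)
```

Both functions type-check equally well, which is exactly the problem: nothing in the types distinguishes the leaky version from the safe one.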

I was wondering if there's a library that uses the Haskell type system to ensure various constructs can be compiled and run in a way that doesn't build up thunks. For example, no_thunk_fold would throw a compiler error if one was using it in a way that could build up thunks. I understand this may restrict what I can do, but I'd like a few functions I can use as an option which would make me more confident that I haven't accidentally left an important non-strict field somewhere and that I'm not going to run out of space.
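As a rough approximation of what I have in mind, one could imagine something like the sketch below. The name `foldlNF` is hypothetical; it assumes the `deepseq` package and simply forces the accumulator to normal form at every step. This is not a compile-time proof of anything: the `NFData` constraint only rejects accumulator types with no instance (functions, for example), and `force` re-traverses the whole accumulator each step, so it is only sensible for small accumulators.

```haskell
import Control.DeepSeq (NFData, force)
import Data.List (foldl')

-- Hypothetical helper: a left fold whose accumulator is fully evaluated
-- (normal form, not just WHNF) after every step.
foldlNF :: NFData b => (b -> a -> b) -> b -> [a] -> b
foldlNF step = foldl' (\acc x -> force (step acc x))

-- Sum and count in an ordinary lazy tuple; force evaluates both components.
sumAndCount :: [Int] -> (Int, Int)
sumAndCount = foldlNF (\(s, n) x -> (s + x, n + 1)) (0, 0)
```

What I'd really like is for the guarantee to come from the types rather than from a runtime traversal like this.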

Clinton
  • I don't know of any tool that does this, however you can write a compiler plugin to annotate functions that should be computed strictly: http://hackage.haskell.org/package/strict-ghc-plugin – John L Mar 13 '13 at 01:03
  • Use the fine profiler. – Don Stewart Mar 13 '13 at 10:00
  • I don't know of any. Space leaks were an issue for me when I first started learning, but I have built an intuition that allows me to see them from a long way away now. You can be fairly certain about the C-style loop because you have built an intuition about how memory works in C. The same applies to Haskell, but it is a different intuition. – luqui Mar 13 '13 at 19:04
  • The 'building up of thunks' is good! ... up to a point. I have not experienced a palpable space leak (or rather none I didn't know I was courting) since I first started learning Haskell (when I was pretty good at them!). There are some nice pointers in the first 50 pages of http://www.slideshare.net/tibbe/highperformance-haskell, which discusses why various folds and the corresponding explicit recursions have the properties they do. Another obvious point is to organize your code around really sound libraries, like ByteString, Text, Vector etc. – applicative Mar 14 '13 at 05:34
  • While not a compile-time check, [this](https://www.joachim-breitner.de/blog/archives/590-Evaluation-State-Assertions-in-Haskell.html) allows you to use run-time assertions. – Alex R Mar 21 '13 at 13:52

2 Answers

10

It sounds like you are worried about some of the downsides of lazy evaluation. You want to ensure your fold, loop, or recursion runs in constant memory.

The iteratee libraries were created to solve this problem: pipes, conduit, enumerator, iteratee, and iterIO.

The most popular, and also the most recent, are pipes and conduit, both of which go beyond the iteratee model.

The pipes library focuses on being theoretically sound, in an effort to eliminate bugs and to let the consistency of its design open up efficient yet high-level abstractions (my words, not the author's). It also offers bidirectional streams if desired, a benefit so far unique to the library.

Conduit is not quite as theoretically well founded as pipes, but it has the large benefit of currently having more associated libraries built on it for parsing and for handling HTTP streams, XML streams, and more. Check out the conduit section on the packages page at Hackage. It is used in Yesod, one of Haskell's larger and better-known web frameworks.

I have enjoyed writing my streaming applications with the pipes library, in particular the ability to make proxy transformer stacks. When I have needed to fetch a web page or parse some XML, I have used the conduit libraries.

I should also mention io-streams, which just had its first official release. Its aim is IO in particular (no surprise, given its name), and it uses simpler type machinery, with fewer type parameters, than pipes or conduit. The major downside is that you are stuck in the IO monad, so it is not very helpful for pure code.

{-# LANGUAGE NoMonomorphismRestriction #-}
import Control.Proxy

Start with a simple translation.

map (+1) [1..10]

becomes:

runProxy $ mapD (+1) <-< fromListS [1..10]

The iteratee-like offerings are a little more verbose for simple translations, but offer large wins in larger examples.

An example of a proxy (from the pipes library) that generates Fibonacci numbers in constant space:

fibsP = runIdentityK $ (\a -> do respond 1
                                 respond 1
                                 go 1 1)
  where
    go fm2 fm1 = do  -- fm2, fm1 represent fib(n-2) and fib(n-1)
        let fn = fm2 + fm1
        respond fn   -- sends fn downstream
        go fm1 fn

These could be streamed to stdout with `runProxy $ fibsP >-> printD`; printD prints only the downstream values. Proxies are the bidirectional offering of the pipes package.

You should check out the proxy tutorial and the conduit tutorial, which I just found out is now at FP Complete's School of Haskell.

One method to find the mean would be:

> ((_,l),s) <- (`runStateT` 0) $ (`runStateT` 0) $ runProxy $  foldlD' ( flip $ const (+1)) <-< raiseK (foldlD' (+)) <-< fromListS [1..10::Int]
> let m = fromIntegral s / fromIntegral l
> m
5.5

Now it is easy to add a map or filter to the proxy.

> ((_,l),s) <- (`runStateT` 0) $ (`runStateT` 0) $ runProxy $  foldlD' ( flip $ const (+1)) <-< raiseK (foldlD' (+)) <-< filterD even <-< fromListS [1..10::Int]

edit: code rewritten to take advantage of the state monad.

update:

One more method of doing multiple calculations over a large stream of data, in a more composable fashion than writing direct recursion, is demonstrated in the blog post "Beautiful Folding". Folds are turned into data and combined while using a strict accumulator. I have not used this method with any regularity, but it does seem to isolate where strictness is required, making it easier to apply. You should also look at an answer to another, similar question that implements the same method with Applicative and may be easier to read, depending on your predilections.
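A minimal sketch of the folds-as-data idea, written from memory rather than copied from the blog post, so treat the names (`Fold`, `Pair`, `runFold`, `sumF`, `lengthF`, `meanF`) as illustrative assumptions. Each fold carries its own step function, initial state, and extractor; combining two folds pairs their states in a strict pair, and the whole thing is driven once by `foldl'`.

```haskell
{-# LANGUAGE ExistentialQuantification #-}
import Data.List (foldl')

-- A fold as data: step function, initial accumulator, final extractor.
data Fold a b = forall x. Fold (x -> a -> x) x (x -> b)

-- Strict pair, so combined accumulators cannot hide thunks in a lazy tuple.
data Pair a b = Pair !a !b

instance Functor (Fold a) where
  fmap f (Fold step begin done) = Fold step begin (f . done)

instance Applicative (Fold a) where
  pure b = Fold (\() _ -> ()) () (\() -> b)
  Fold stepL beginL doneL <*> Fold stepR beginR doneR =
    Fold step (Pair beginL beginR) done
    where
      step (Pair xL xR) a = Pair (stepL xL a) (stepR xR a)
      done (Pair xL xR)   = doneL xL (doneR xR)

-- Run a fold over a list in a single pass with a strict left fold.
runFold :: Fold a b -> [a] -> b
runFold (Fold step begin done) xs = done (foldl' step begin xs)

sumF :: Num a => Fold a a
sumF = Fold (+) 0 id

lengthF :: Fold a Int
lengthF = Fold (\n _ -> n + 1) 0 id

-- The mean is the sum divided by the length, both computed in one traversal.
meanF :: Fold Double Double
meanF = (/) <$> sumF <*> (fromIntegral <$> lengthF)
```

With this, `runFold meanF [1..10]` traverses the list once. The strictness lives in exactly two places (`foldl'` and the `Pair` fields), which is what I meant by isolating where it is required.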

Davorak
  • The libraries you mentioned seem to focus on I/O. It would be good if you could provide a pure style example, perhaps working out an average (which usually requires a strict tuple or strictifying functions and can use non-constant space without a compile time error if you don't get it right). – Clinton Mar 13 '13 at 02:07
  • Only the io-streams really focuses on/is restricted to, io. I will post a pure example in a moment. – Davorak Mar 13 '13 at 02:20
  • @Clinton A few examples posted though it is hard to do them justice with short examples like these, I encourage you to check out the tutorials. – Davorak Mar 13 '13 at 02:37
  • Could you post the "average" (i.e. mean) example I mentioned earlier? The main issue I find is with constant space folds due to thunk build up (which is not really an issue for `map`). – Clinton Mar 13 '13 at 02:40
  • @Clinton I have added one method of finding the mean. – Davorak Mar 13 '13 at 03:03
  • The mean one won't run in constant space because neither Writer implementation is sufficiently strict. You would have to rewrite both to use StateT using foldlD' and then it will run in constant space. – Gabriella Gonzalez Mar 13 '13 at 04:06
  • @GabrielGonzalez: This is what I'm talking about. I'm looking to use the type system to ensure I don't get done by the gotcha you mentioned. Surely the Haskell type system is powerful enough to do so? – Clinton Mar 13 '13 at 05:23
  • @Clinton The type system might be powerful enough to do so, but I'm not aware of any library that actually does this. Usually if people want C-like performance guarantees they program in a Haskell DSL that generates C code. This is what the `atom` and `copilot` libraries do. – Gabriella Gonzalez Mar 13 '13 at 05:29
  • @GabrielGonzalez I remembered a comment about the down side of the lazy writer on my way home. – Davorak Mar 13 '13 at 05:40
  • @Clinton I may have misunderstood your question. If you are looking for a proof of performance or of asymptotic bounds, then I do not know of any common libraries that offer that functionality. Libraries like pipes and conduit can help to get you part way there though. – Davorak Mar 13 '13 at 05:45
  • @Davorak: Perhaps not proof, just reasonable expectation that I'm not going to run out of memory. I'm only looking for space bounds here, not time bounds. – Clinton Mar 13 '13 at 06:51
  • @Clinton One of the main attractions of the iteratee-like libraries is that they allow you to compose code much as you would with infinite lists/streams, but it is easier to safeguard against leaks, at least for an equal ease of keeping the code composable. – Davorak Mar 13 '13 at 07:14
7

Haskell's type system can't do that. We can prove this with a fully polymorphic term that eats arbitrary amounts of RAM.

takeArbitraryRAM :: Integer -> a -> a
takeArbitraryRAM i a = last $ go i a where
  go n x | n < 0 = [x]
  go n x | otherwise = x:go (n-1) x

To do what you want requires substructural types. Linear logic corresponds to an efficiently computable fragment of the lambda calculus (you would also need to control recursion though). Adding the structure axioms allows you to take super exponential time.

Haskell lets you fake linear types for the purposes of managing some resources using indexed monads. Unfortunately, space and time are baked into the language, so you can't do that for them. You can do what is suggested in a comment and use a Haskell DSL to generate code that has performance bounds, but computing terms in this DSL could take arbitrarily long and use arbitrary space.
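For readers who have not seen the indexed-monad trick, here is a small sketch. All names (`Ix`, `Open`, `Closed`, `openH`, `writeH`, `closeH`) are invented for illustration, and a pure log stands in for a real handle. The two extra type parameters record the resource state before and after each action, so using the resource in the wrong state fails to type-check.

```haskell
-- Phantom types naming the two resource states.
data Open
data Closed

-- An action taking the resource from state i to state j. The log of
-- operations (most recent first) stands in for real effects.
newtype Ix i j a = Ix { runIx :: [String] -> (a, [String]) }

ireturn :: a -> Ix i i a
ireturn x = Ix (\logs -> (x, logs))

-- Sequencing only type-checks when the intermediate states line up.
ibind :: Ix i j a -> (a -> Ix j k b) -> Ix i k b
ibind (Ix m) f = Ix (\logs -> let (x, logs') = m logs in runIx (f x) logs')

openH :: Ix Closed Open ()
openH = Ix (\logs -> ((), "open" : logs))

writeH :: String -> Ix Open Open ()
writeH s = Ix (\logs -> ((), ("write " ++ s) : logs))

closeH :: Ix Open Closed ()
closeH = Ix (\logs -> ((), "close" : logs))

-- A well-formed session: starts closed, ends closed.
session :: Ix Closed Closed ()
session = openH `ibind` \_ -> writeH "hello" `ibind` \_ -> closeH

-- Rejected by the type checker (Closed does not match Open):
-- bad = closeH `ibind` \_ -> closeH

runSession :: Ix Closed Closed a -> (a, [String])
runSession m = let (x, logs) = runIx m [] in (x, reverse logs)
```

Note what the indices track here: the state of a resource you chose to model. Nothing in them says anything about how much heap `ibind` or the actions themselves use, which is the sense in which space is baked into the language.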

Don't worry about space leaks. Catch them. Profile. Reason about your code to prove complexity bounds. This stuff you just have to do no matter what language you are using.

Philip JF
  • Could you explain why the above would take arbitrary RAM? I don't get it. – Ingo Mar 13 '13 at 10:25
  • Oops. The original version didn't. Fixed. Anyway, the idea is that it has to allocate an arbitrarily long list. – Philip JF Mar 13 '13 at 19:02
  • Doesn't laziness cause the list to be created as `last` moves down it? Since there's no other reference to the head of the list, each cons will be garbage collected after `last` has moved over it no? Maybe you meant to use an accumulator... – pat Mar 13 '13 at 20:45
  • @pat An optimizing compiler could recognize that the cons cells were only used once and free them immediately, that is true. I don't think this is part of the GHC cost model though, and certainly the general point holds. We can force arbitrary computation with a clever function, although perhaps here we need to stick in a `reverse` to make that actually happen. – Philip JF Mar 13 '13 at 21:21
  • another possibility: `last $ let ls = go i a in ls ++ ls`. That should hold on to the list unless your compiler decides to reduce sharing. – Philip JF Mar 13 '13 at 21:25
  • This answer is kind of missing the point. I know you can make constructs in Haskell that do allocate arbitrary memory. C has `malloc`. In Haskell I can just declare an infinite list and keep a pointer to the first element. But what I'm looking for are constructs which I can OPTIONALLY use to avoid this. For example, perhaps a version of `enumFromTo` that doesn't return a list but instead just holds what number it's up to. Now unless I'm going to intentionally copy this object around, it's only going to use constant space. – Clinton Mar 14 '13 at 00:19
  • Now, it would be good to have a `fold` which has reasonable guarantees of constant space usage. Of course, if I put a list in the accumulator that's not going to happen, but I basically want to avoid the "gotchas" that leak space. If I wanted to just "profile" then I could use Perl; I thought a significant benefit of Haskell is that it can catch a lot of your errors at compile time. – Clinton Mar 14 '13 at 00:21
  • @Clinton In my experience, and the consensus seems to agree, avoiding that type of memory leak is easier with iteratee-like libraries than it was with lazy lists. – Davorak Mar 15 '13 at 04:53