27

I think that Haskell is a beautiful language, and judging by the benchmarks, its implementations can generate fast code.

However, I am wondering whether it is appropriate for long-running applications, or whether chasing all the potential laziness-induced leaks that one might ignore in a short-lived application would prove frustrating.

This Reddit comment echoes my concerns:

> As soon as you have more than one function calling itself recursively, the heap profile ceases to give you any help pinpointing where the leak is occurring.

(That whole discussion seems insightful and frank)

I am personally interested in high-performance computing, but I guess servers and HPC have this requirement in common.

If Haskell is appropriate for such applications, are there any examples proving this point, that is, applications that

  1. need to run for days or weeks, therefore requiring the elimination of all relevant leaks (The time the program spends sleeping or waiting for some underlying C library to return obviously doesn't count)
  2. are non-trivial (If the application is simple, the developer could just guess the source of the leak and attempt various fixes. However, I don't believe this approach scales well. The helpfulness of the heap profile in identifying the source of the leak(s) with multiple [mutually] recursive functions seems to be of particular concern, as per the Reddit discussion above)

If Haskell is not appropriate for such applications, then why?

Update: The Yesod web server framework for Haskell, which was put forth as an example, may have issues with memory. I wonder if anyone has tested its memory usage after serving requests continuously for days.

MWB
  • 11,740
  • 6
  • 46
  • 91
  • 2
    Looks a bit like the question of whether a system with a garbage collector is appropriate: because of the `gc`, people normally don't destroy objects that are no longer necessary; they count on the gc finding them eventually. But this can result in a large number of heap objects that stay alive only because a reference was never set to `null`, even though all these objects are effectively garbage. – Willem Van Onsem May 26 '15 at 22:44
  • 6
    Laziness does not mean space leaks, just as strictness doesn't. There are different techniques for managing both kinds of memory models. How you write your application determines if your application will be able to run for long periods of time. I know [Facebook is using Haskell](https://github.com/facebook/Haxl) as a middle layer between multiple data stores and some of their frontend services, but I don't know whether those are short lived processes. My guess is that they would need to be long running, so if that's the case you would have a pretty solid example right there. – bheklilr May 26 '15 at 22:47
  • @bheklilr: I don't think MaxB is referring to space leaks: Haskell manages memory correctly (or should from a theoretical pov), but it can take ages before dead objects are recycled. – Willem Van Onsem May 26 '15 at 22:49
  • 3
    @MaxB, you can't really "delete all garbage" in gc languages. We're talking about forgetting to set certain references to `null`, which is quite similar to not evaluating certain expressions because of what they refer to. However, it can indeed be quite difficult to reason about memory in Haskell programs compared to their imperative counterparts. You can design your persistent data structures in a way to guarantee they hold no unevaluated thunks -- if I were writing a largish system I would probably do that. It does limit your expressivity, but also provides a checkpoint for memory usage. – luqui May 26 '15 at 22:50
  • 1
    Read this: http://engineering.imvu.com/2014/03/24/what-its-like-to-use-haskell/ . It seems that Haskell works pretty well for long-running services, but space leaks can be harder to find (though tooling is improving, so I don't know how hard it is now). – Jedai May 27 '15 at 12:14
  • @MaxB, If you're happy with any of the below answers, you should probably accept one of them. – AJF May 28 '15 at 17:05

5 Answers

16

"Space leaks" are semantically identical to any other kind of resource use problem in any language. In strict languages the GC tends to allocate and retain too much data (as structures are strict).

No matter the language, you should be doing some "burn in" to look for resource usage over time, and Haskell is no different.

See e.g. xmonad, which runs for months or years at a time. It's a Haskell app, has a tiny heap use, and I tested it by running for weeks or months with profiling on to analyze heap patterns. This gives me confidence that resource use is stable.

Ultimately though, laziness is a red herring here. Use the resource monitoring tools and testing to measure and validate your resource expectations.

Don Stewart
  • 137,316
  • 36
  • 365
  • 468
  • 2
    `xmonad` has very low complexity (< 1KLOC). It's unclear how chasing leaks by looking at the profiler would scale, and doesn't `xmonad` sleep 99.9% of the time? (What does your `top` say?) Is `xmonad` really the best example of the use of Haskell in this type of application? – MWB Jun 02 '15 at 09:54
  • The core of xmonad is < 1K, and is very similar to the core of a webserver. If you have different requirements for "long running" please specify what you mean. – Don Stewart Jun 05 '15 at 11:08
  • 1
    I just clarified the requirements in the question. – MWB Jun 05 '15 at 12:18
8

The warp web server proves that Haskell is appropriate for long-running applications.

When Haskell applications have space leaks, it can be difficult to track down the cause, but once the cause is known it's usually trivial to resolve (the hardest fix I've ever had to use was to apply `zip [1..]` to a list and get the length from the last element instead of using the `length` function). But space leaks are actually very rare in Haskell programs. It's generally harder to deliberately create a space leak than it is to fix an accidental one.
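
A minimal sketch of that trick, with hypothetical names (this is not the answerer's actual code): computing `length` separately keeps the whole list reachable for a second traversal, whereas pairing each element with its index via `zip [1..]` yields the length as a by-product of a single pass.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Hypothetical illustration of the zip [1..] trick described above.
-- Traversing the list once and reading the index of the last element
-- gives the length without a second, list-retaining traversal.
sumAndLength :: [Int] -> (Int, Int)
sumAndLength = go 0 0 . zip [1 ..]
  where
    go !acc !len []             = (acc, len)
    go !acc _   ((i, x) : rest) = go (acc + x) i rest

main :: IO ()
main = print (sumAndLength [1 .. 1000000])
```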

Jeremy List
  • 1,756
  • 9
  • 16
  • `The warp web server proves...` Are there any busy web sites that use it? – MWB May 27 '15 at 07:10
  • 3
    https://github.com/yesodweb/yesod/wiki/Powered-by-Yesod has an incomplete list of websites which use the Yesod framework (which would be difficult to use with another web server). None seem to be all that busy, but it's hard to tell sometimes. On the other hand: in benchmarks warp handles more requests per second than nginx except on single core servers. On 10 core servers: warp is 5 times faster than nginx. – Jeremy List May 27 '15 at 10:00
8

It is. There are two kinds of possible space leaks:

  1. Data on the heap. Here the situation is no different from other languages that use GC. (And for the ones that don't, the situation is usually worse: if there is an error, then instead of just increasing memory usage, the process might touch freed memory, or vice versa, and simply crash badly.)
  2. Unevaluated thunks. Admittedly, one can shoot oneself in the foot here; one must of course avoid the well-known situations that produce large thunks, such as `foldl (+) 0` (sketched below). But that is not difficult to prevent, and for other leaks I'd say it's actually easier than in other languages once you get used to it.
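
For concreteness, a minimal sketch of that well-known pitfall and its usual fix (the function names here are just for illustration): lazy `foldl` builds a chain of `(+)` thunks proportional to the list length, while the strict `foldl'` from `Data.List` forces the accumulator at every step and runs in constant space.

```haskell
import Data.List (foldl')

-- Illustration of the classic thunk build-up (not from any real codebase):
-- this builds the thunk (((0 + 1) + 2) + ...) lazily, so memory use grows
-- with the length of the list before anything gets evaluated.
leakySum :: [Integer] -> Integer
leakySum = foldl (+) 0

-- Forces the accumulator after each step; runs in constant space.
strictSum :: [Integer] -> Integer
strictSum = foldl' (+) 0

main :: IO ()
main = print (strictSum [1 .. 10000000])
```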

Either you have a long-running, heavy computation, or a service that responds to requests. If you have a long-running computation, you usually need results immediately as you compute them, which forces their evaluation.

And if you have a service, its state is usually well-contained, so it's easy to make sure it's always evaluated at the end of a request. In fact, Haskell makes this easier than other languages: in Haskell, components of your program can't keep their own internal state. The application's global state is either threaded as arguments through some kind of main loop, or stored using `IO`. And since a good design of a Haskell application limits and localizes `IO` as much as possible, this again makes the state easy to control.
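
A minimal sketch of that pattern, with hypothetical names rather than code from any real project: the whole service state is threaded through one loop, uses strict fields, and is forced before the next iteration, so unevaluated thunks cannot accumulate between requests.

```haskell
{-# LANGUAGE BangPatterns #-}

data Request = Request            -- placeholder request type for the sketch
data State   = State !Int !Int    -- strict fields: no thunks can hide inside

-- Pure state transition for one request (here: count the hits).
handle :: State -> Request -> State
handle (State hits errs) _ = State (hits + 1) errs

-- The main loop threads the whole state explicitly and forces the new
-- state before recursing, so it is fully evaluated after every request.
-- Like a real server, it loops forever.
mainLoop :: IO Request -> State -> IO ()
mainLoop getRequest st = do
  req <- getRequest
  let !st' = handle st req
  mainLoop getRequest st'

main :: IO ()
main = mainLoop (return Request) (State 0 0)
```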


As another example, the Ganeti project (of which I'm a developer) uses several long-running Haskell daemons.

In our experience, memory leaks have been very rare; when we had problems, they were usually with other resources (like file descriptors). The only somewhat recent case I recall was the monitoring daemon leaking memory as thunks, in the rare case where it collected data but nobody ever looked at it (looking at the data would have forced its evaluation). The fix was rather simple.
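
The kind of fix being described typically amounts to fully evaluating collected data before storing it, so it no longer matters whether anyone ever reads it. A minimal sketch with hypothetical names (not Ganeti's actual code), assuming the `deepseq` package:

```haskell
import Control.DeepSeq (NFData, force)
import Control.Exception (evaluate)
import Data.IORef (IORef, modifyIORef')

-- Hypothetical helper: fully evaluate a freshly collected sample before
-- appending it to the store, so nothing accumulates as thunks even if
-- no reader ever looks at the collected data.
storeSample :: NFData a => IORef [a] -> a -> IO ()
storeSample store sample = do
  s <- evaluate (force sample)
  modifyIORef' store (s :)
```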

Petr
  • 62,528
  • 13
  • 153
  • 317
6

Most long-running apps are request-driven. For example, HTTP servers associate all transient data with an HTTP request; after the request ends, the data is thrown away. So at least for those kinds of long-running apps, no language will have space leaks. Leak all you want in the context of a single request: as long as you do not create global references to per-request data, you will not leak.

All bets are off if you mutate global state. That is to be avoided for many reasons, and it is uncommon in such apps.
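
A minimal sketch of that distinction, with hypothetical types rather than any real server's API: data built inside a pure handler becomes unreachable as soon as the response is produced, whereas stashing per-request data behind a global reference keeps it alive indefinitely.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef)

-- Hypothetical request/response types for illustration only.
newtype Request  = Request String
newtype Response = Response String

-- Fine: everything allocated here dies with the request.
handle :: Request -> Response
handle (Request body) = Response (reverse body)

-- Risky: the global cache keeps every request reachable forever,
-- unless something also evicts old entries.
handleAndRemember :: IORef [Request] -> Request -> IO Response
handleAndRemember cache req = do
  modifyIORef' cache (req :)
  return (handle req)

main :: IO ()
main = do
  cache <- newIORef []
  Response r <- handleAndRemember cache (Request "hello")
  putStrLn r
```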

usr
  • 168,620
  • 35
  • 240
  • 369
  • Not a web programmer, but I think most HTTP servers need to retain some information after serving a request: logging, new content (like on this site), items left in stock, etc. – MWB May 28 '15 at 02:15
  • @Carsten, so maybe we're implementing a database or some such thing. – luqui May 28 '15 at 23:23
  • @luqui ??? well seems I'm the strange kind of kid you don't wanna play with here - so ok I'll shut up already – Random Dev May 29 '15 at 04:09
  • I think you're hearing my frustration with defining away the problem in order to self-gratify, which seems to happen when people bring up problems with the language we love... – luqui May 29 '15 at 04:13
3

I have a service written in Haskell that runs for months without any Haskell-specific issues. There was a period when it ran for 6 months without any attention, but then I restarted it to apply updates. It has a stateless HTTP API, but it also has a stateful websockets interface, so it maintains long-lived state. Its sources are closed, so I can't provide a link, but in my experience Haskell works fine for long-running applications.

Laziness is not an issue for me, but that is because I know how to deal with it. It is not hard, but requires some experience.

Also, libraries on Hackage vary in quality, and keeping dependencies under control is important. I try to avoid dependencies unless they are really necessary, and I inspect most of their code (except for a number of widely used packages, most of which are either core libraries or part of the Haskell Platform, though I inspect their code too, just to learn new things).

That said, there are corner cases where GHC (the most widely used implementation) doesn't work well enough. I had issues with GC time when an application maintains a huge (mostly read-only) state in memory (there is a ticket). A large number of stable pointers can also be problematic (there is a ticket for that too, though I never experienced the problem myself). Most of the time such corner cases are easy to avoid with careful design.

Actually, application design is the most important thing for long-running applications; the implementation language plays a less important role. That is probably the biggest lesson I have learned over the last few years: software design is very important, and it is not that different between languages.

Yuras
  • 13,856
  • 1
  • 45
  • 58