
Essentially I'm curious if code like:

let myCollection = Data.SomeCollection.fromList [1, 2, foo]

is actually doing what it looks like at runtime, and creating a linked list as an intermediate step in creating a SomeCollection—or if this is just a syntactic convenience, and the compiler eschews making a list in the compiled code?

Apologies if this is a stupid question, but I've been meaning to find out ever since learning some Haskell.

J Cooper
  • You might want to clarify if you are asking this for `Vector` or more generally for any type. `Vector` is a special case because it probably has `Vector`-specific rewrite rules to fuse away intermediate lists. – Gabriella Gonzalez Sep 22 '14 at 01:31
  • @GabrielGonzalez thanks, I edited to clarify that I'm interested if special handling happens for any collection with a `fromList`, though if only *some* eliminate the list (perhaps `Vector` for example), that is also good to know – J Cooper Sep 22 '14 at 01:52
  • I'm afraid you'll generally have to assume there _will_ be an actual list, unless the aforementioned specialised `Vector`-optimisations or similar kick in. – leftaroundabout Sep 22 '14 at 02:05
  • An important caveat, if the list happens to be a `String`, it may not go through an intermediate list, as that particular case is often short-cut (as part of efficient `OverloadedStrings`). Now that there is an `OverloadedLists`, it's likely that future versions of data structures will be able to avoid the intermediate list. – John L Sep 22 '14 at 04:02
  • Another thing to consider here is that GHC itself has a [list fusion optimization](http://www.haskell.org/ghc/docs/latest/html/users_guide/rewrite-rules.html#idp25099648) such that, when a "good consumer" function is applied to the result of a "good producer," the runtime creation of an intermediate list is eliminated. An explicit list like `[1, 2, foo]` qualifies as a good producer. Whether a particular `fromList` function is a good consumer depends on how it's implemented; there's no general rule here, only implementation details. – Luis Casillas Sep 22 '14 at 20:18

1 Answer

Short answer: Maybe, in a way, but not really...

Longer Answer:

When you say "linked list", you are thinking in imperative terms. Haskell is lazy and functional, which makes the question hard to answer.

[a,b,c] is shorthand for a:(b:(c:[])). If we fully evaluate this (for example, by trying to print it), what ends up in memory looks and acts a lot like a linked list in C, but with a lot more brains. That is not usually what happens to a list in a functional setting, though. Most list-based functions operate by doing something with the head of the list and sending the tail off somewhere else (perhaps to the same place). That means a list really looks like x:xs, where xs is just an unevaluated computation that will create the rest of the list when it is demanded (a thunk, in GHC terminology).

The whole list is not created and then processed, as it would be in an imperative language. It is streamed through the fromList function one piece at a time.
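
A small sketch of the two points above, the desugaring of bracket syntax and the fact that a consumer only forces the cells it asks for. The names (`sugared`, `desugared`, `firstOnly`, `firstThreeSquares`) are made up for illustration, and this shows the language semantics only; it says nothing about what code GHC actually emits for a list literal.

```haskell
-- Bracket syntax is sugar for nested applications of (:).
sugared :: [Int]
sugared = [1, 2, 3]

desugared :: [Int]
desugared = 1 : (2 : (3 : []))

-- Only the first cons cell is inspected, so the undefined tail
-- is never evaluated and no error is raised.
firstOnly :: Int
firstOnly = case 1 : undefined of
  (x : _) -> x
  []      -> 0

-- A consumer that stops early forces only the cells it needs,
-- even though the producer describes an infinite list.
firstThreeSquares :: [Int]
firstThreeSquares = take 3 (map (\x -> x * x) [1 ..])
```

Loading this in GHCi and evaluating `sugared == desugared`, `firstOnly`, and `firstThreeSquares` is enough to see the behaviour.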

At least that is the way most fromList functions work. A generic fromList for a collection might look like:

fromList    :: [a] -> Collection a
fromList    =  Data.List.foldl' insert empty
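
To make that concrete, here is a minimal sketch that uses `Data.Set` from the containers package as a stand-in for the generic `Collection` (that substitution is an assumption for illustration). The name `fromListSketch` is hypothetical, and the real `Set.fromList` need not be implemented this way. Note that `Set.insert` takes the element first, so the fold needs `flip`.

```haskell
import Data.List (foldl')
import qualified Data.Set as Set

-- Hypothetical name, chosen to avoid clashing with the real Set.fromList.
-- Each list cell is consumed as it is reached; the strict left fold keeps
-- the accumulated set evaluated instead of building a chain of thunks.
fromListSketch :: Ord a => [a] -> Set.Set a
fromListSketch = foldl' (flip Set.insert) Set.empty
```

For example, `Set.toList (fromListSketch [3, 1, 2, 1])` gives the elements in ascending order with the duplicate removed.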

Some collections can take advantage of having more than one element added at a time. Those collections (I don't know of any offhand, but I know they exist) will build up a more extensive in-memory list.

Outside of compiler optimizations, however, it is generally the case that

fromList [1, 2, foo]

is computationally equivalent (within a tiny constant factor) to:

empty `insert` 1 `insert` 2 `insert` foo
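
The same comparison can be written out against `Data.Set`, again only as a stand-in for the generic collection. `insertInto` is a hypothetical helper that puts the collection first, matching the argument order used in the chain above; the sketch checks that both forms build the same set, not that they cost the same.

```haskell
import qualified Data.Set as Set

-- Hypothetical helper: collection first, element second, so it can be
-- chained infix from left to right (backtick operators default to infixl 9).
insertInto :: Ord a => Set.Set a -> a -> Set.Set a
insertInto s x = Set.insert x s

-- Building the set through the library's fromList...
viaFromList :: Int -> Set.Set Int
viaFromList foo = Set.fromList [1, 2, foo]

-- ...and through a chain of individual inserts.
viaInserts :: Int -> Set.Set Int
viaInserts foo = Set.empty `insertInto` 1 `insertInto` 2 `insertInto` foo
```

For any `foo`, `viaFromList foo == viaInserts foo` should hold.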

Disclaimer: I do not know the internals of any Haskell implementations well enough to say with absolute certainty how they evaluate a list constant in code, but this is not too far from reality. I await enlightenment from the GHC gurus if I am way off base.

Vitus
John F. Miller
  • Ah, you're quite right, I wasn't thinking about the "laziness." That makes sense, and probably does a lot to mitigate any inefficiency, assuming that chain of `insert`s isn't causing a bunch of reallocation+copying (if any collections use underlying arrays, I mean—though I think trees are the more common Haskell approach?) – J Cooper Sep 22 '14 at 05:51
  • What do you mean by "When you say linked list you are thinking in imperative terms"? – Ionuț G. Stan Sep 24 '14 at 06:11
  • In imperative languages a linked list is a data structure with a memory pointer to the next element. On its surface a CONS cell in a functional language looks similar, but the expectation that such cells will actually exist simultaneously in machine memory is not correct. Trying to conceptualize functional lists as one might implement them in Java or C leads to incorrect intuition about time and space use of functional algorithms. – John F. Miller Sep 26 '14 at 20:48
  • @JohnF.Miller Functional programming does not mandate lazy evaluation. Linked lists in strict FP languages are pretty much the same as they are in Java or C languages. What you're saying applies to Haskell, not to FP in general. – Ionuț G. Stan Oct 03 '14 at 13:25