Haskell: `Map (a,b) c` versus `Map a (Map b c)`?

Question

Thinking of maps as representations of finite functions, a map of two or more variables can be given either in curried or uncurried form; that is, the types Map (a,b) c and Map a (Map b c) are isomorphic, or something close to it.
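To make the "isomorphic, or something close to it" remark concrete, here is a minimal sketch of the two conversions. The names `curryMap` and `uncurryMap` are hypothetical, chosen only for illustration; only standard `Data.Map` functions from the containers package are used.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Hypothetical helper: group a tuple-keyed map by the first key component.
curryMap :: (Ord a, Ord b) => Map (a, b) c -> Map a (Map b c)
curryMap m =
  Map.fromListWith Map.union
    [ (a, Map.singleton b c) | ((a, b), c) <- Map.toList m ]

-- Hypothetical helper: flatten a nested map back to tuple keys.
-- The generated list is strictly ascending because tuples are ordered
-- lexicographically, so fromDistinctAscList is safe here.
uncurryMap :: Map a (Map b c) -> Map (a, b) c
uncurryMap m =
  Map.fromDistinctAscList
    [ ((a, b), c) | (a, inner) <- Map.toAscList m, (b, c) <- Map.toAscList inner ]
```

Going from the tuple form to the nested form and back returns the original map. The other direction is only "close to" an isomorphism: an outer key whose inner map is empty disappears when flattened.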

What practical considerations are there — efficiency, etc — for choosing between the two representations?

I think `Map (a, b) c` is likely to be much more memory (and possibly time) efficient. If there is a way to fold over a prefix key range (I'm not sure, I haven't used maps much), you could still perform something like curried application efficiently with this representation. — , May 17 '13 at 15:45

C. A. McCann · Accepted Answer · 2013-05-17T16:51:49.283

17

The Ord instance of tuples uses lexicographic order, so Map (a, b) c is going to sort by a first anyway, so the overall order will be the same. Regarding practical considerations:

- Because Data.Map is a binary search tree, splitting at a key is comparable to a lookup, so getting a submap for a given a in the uncurried form won't be significantly more expensive than in the curried form.
- The curried form may produce a less balanced tree overall, for the obvious reason of having multiple trees instead of just one.
- The curried form will have a bit of extra overhead to store the nested maps.
- The nested maps of the curried form representing "partial applications" can be shared if some a values produce the same result.
- Similarly, "partial application" of the curried form gives you the existing inner map, while the uncurried form must construct a new map.
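As a rough illustration of "partial application" on the uncurried form, the sketch below cuts out the key range for one `a` and re-keys it by `b`. The name `submapFor` is hypothetical, and `takeWhileAntitone`/`dropWhileAntitone` assume a reasonably recent containers (0.5.8 or later).

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Hypothetical helper: all entries whose first key component equals a,
-- re-keyed by the second component.
submapFor :: Ord a => a -> Map (a, b) c -> Map b c
submapFor a =
    Map.mapKeysMonotonic snd                -- snd is strictly increasing once a is fixed
  . Map.takeWhileAntitone ((== a) . fst)    -- keep the run of keys with this a
  . Map.dropWhileAntitone ((< a) . fst)     -- skip keys that sort before a

example :: Map (Int, Char) String
example = Map.fromList [((1, 'x'), "a"), ((1, 'y'), "b"), ((2, 'x'), "c")]
```

Here `submapFor 1 example` contains the two entries keyed by `'x'` and `'y'`. The two range cuts are logarithmic, but the result is a newly built map, whereas `Map.lookup 1` on the curried form hands back the existing inner map.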

So the uncurried form is ~~clearly better in general~~, but the curried form may be better if you expect to do "partial application" often and would benefit from sharing of Map b c values.

Note that some care will be necessary to ensure you actually benefit from that potential sharing; you'll need to explicitly define any shared inner maps and reuse the single value when constructing the full map.
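A minimal sketch of what that care might look like, with made-up names and data (`defaults` and `settings` are purely illustrative): the shared inner map is bound once and the same value is reused for every outer key that needs it.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Illustrative shared inner map, defined once.
defaults :: Map String Int
defaults = Map.fromList [("retries", 3), ("timeout", 30)]

-- "alice" and "bob" refer to the same inner map value rather than to
-- two separately constructed copies; "carol" gets a modified version.
settings :: Map String (Map String Int)
settings = Map.fromList
  [ ("alice", defaults)
  , ("bob",   defaults)
  , ("carol", Map.insert "timeout" 60 defaults)
  ]
```

Had each inner map been written out with its own `Map.fromList`, the results would be equal but would not be expected to share memory.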

Edit: Tikhon Jelvis points out in the comments that the memory overhead of the tuple constructors--which I did not think to account for--is not at all negligible. There is certainly some overhead to the curried form, but that overhead is proportional to how many distinct a values there are. The tuple constructor overhead in the uncurried form, on the other hand, is proportional to the total number of keys.

So if, on average, there are three or more distinct keys for any given value of a, you'll probably save memory using the curried version. The concerns about unbalanced trees still apply, of course. The more I think about it, the more I suspect the curried form is unequivocally better except perhaps if your keys are very sparse and unevenly distributed.
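The break-even point can be worked out with a back-of-the-envelope calculation. The per-node figures below (6 words per outer map node, 3 words per tuple) are the ones quoted in the comments, taken here as assumptions rather than measurements, and the function names are hypothetical.

```haskell
-- Extra words spent by the curried form: one outer map node per distinct a.
-- The factor 6 is the figure quoted in the comments (an assumption here).
curriedOverhead :: Int -> Int
curriedOverhead distinctAs = 6 * distinctAs

-- Extra words spent by the uncurried form: one tuple per (a, b) key.
-- The factor 3 is likewise the figure quoted in the comments.
uncurriedOverhead :: Int -> Int
uncurriedOverhead totalKeys = 3 * totalKeys
```

With 1000 distinct `a` values and three `b` values each, this gives 6000 words for the curried form against 9000 for the uncurried one; at two `b` values per `a` the two estimates are equal.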

Note that because arity of definitions does matter to GHC, the same care is required when defining functions if you want subexpressions to be shared; this is one reason you sometimes see functions defined in a style like this:

foo x = go
  where z = expensiveComputation x
        go y = doStuff y z

edited May 17 '13 at 16:51

answered May 17 '13 at 16:02

C. A. McCann


+1, but re: the first bullet point, wouldn't getting a submap require worst-case linear time in the uncurried version vs. logarithmic in the curried version? Or does lazy evaluation prevent that? – Fred Foo May 17 '13 at 16:13
@larsmans: Lazy evaluation prevents it from being simple to determine what "worst case" means. :] You only pay for the expensive computation if you do something that forces it, which is often something expensive anyway. That said, I believe you are correct, but that it would probably require deliberately pathological data and access patterns to see that worst-case in practice. – C. A. McCann May 17 '13 at 16:26
I was thinking of getting the `Map b c` out followed by an O(n) or greater sequence of accesses, but I didn't realize that in that case the map construction's cost is dominated by the actual accesses. – Fred Foo May 17 '13 at 16:29
I'm not sure the curried form will necessarily take more memory than the normal one. From [this](www.haskell.org/haskellwiki/GHC/Memory_Footprint) table, it seems that the curried version will have 6 extra words per unique `a` key where the uncurried version will have 3 extra words per `a, b` pair to store the tuple. If you don't have too many `a`s, I think the curried version might be *more* memory efficient. – Tikhon Jelvis May 17 '13 at 16:34
@larsmans: For a simpler example, consider the time complexity of `(++)`. Ostensibly, it should be O(N) in the length of the first argument, but to see the full cost requires traversing N elements of the result, which is O(N) for even a fully evaluated list. In practical terms, it often makes sense to "amortize" the cost of `(++)` over the intrinsic cost of the sequential accesses that force it, giving it a net time complexity of O(1). – C. A. McCann May 17 '13 at 16:39
@TikhonJelvis: Oh, excellent point! I've updated the answer to mention that. – C. A. McCann May 17 '13 at 16:53
@larsmans And no such comment would be complete without mentioning the first few chapters of [Chris Okasaki's "Purely Functional Data Structures"](http://www.cs.cmu.edu/~rwh/theses/okasaki.pdf) – J. Abrahamson May 17 '13 at 17:10
@C.A.McCann Laziness is somewhat lacking in the key department of a `Map`. The curried form lets the nested map be lazier than it would otherwise be as part of the key in the containing map; this is both good and bad. If you accumulate a lot of edits to the contained maps without forcing them, then you can leak more memory in the curried case, but in the uncurried form you have to pay for unnecessary tuples and can't query for curried subtrees nearly as efficiently. I tend towards currying the map, especially when I want to be able to exploit having an outer key exist while its nested map is empty. – Edward Kmett May 18 '13 at 20:36

score 4 · Answer 2 · answered May 17 '13 at 18:34

Tuples are lazy in both elements, so the tuple version introduces a little extra laziness. Whether this is good or bad strongly depends on your usage. (In particular, comparisons may force the tuple elements, but only if there are lots of duplicate a values.)

Beyond that, I think it's going to depend on how many duplicates you have. If a is almost always different whenever b is, you're going to have a lot of small trees, so the tuple version might be better. On the other hand, if the opposite is true, the non-tuple version may save you a little time (not constantly recomparing a once you've found the appropriate subtree and you're looking for b).
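For reference, the two lookups being compared might be written as below; `lookupNested` and `lookupTuple` are hypothetical names used only for this sketch.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

-- Non-tuple version: find the subtree for a once, then compare only b values.
lookupNested :: (Ord a, Ord b) => a -> b -> Map a (Map b c) -> Maybe c
lookupNested a b m = Map.lookup a m >>= Map.lookup b

-- Tuple version: every comparison on the way down compares a first, then b.
lookupTuple :: (Ord a, Ord b) => a -> b -> Map (a, b) c -> Maybe c
lookupTuple a b = Map.lookup (a, b)
```

Both return the same answers for corresponding maps; which one is faster on real data is exactly the kind of thing a benchmark would have to settle.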

I'm reminded of tries, and how they store common prefixes once. The non-tuple version seems to be a bit like that. A trie can be more efficient than a BST if there's lots of common prefixes, and less efficient if there aren't.

But the bottom line: benchmark it!! ;-)

answered May 17 '13 at 18:34

MathematicalOrchid


+1 I think like you. The curried (non-tuple) form could also be faster if many searches are done that already fail for a missing a *and* the number of unique tuple keys (a,b) is much greater than the number of unique a's. – Ingo May 17 '13 at 18:47
It won't actually be lazy, since it'll be forced by key comparisons as soon as you go to put it into the tree, and in general the `Map` combinators are (somewhat unnecessarily) strict in the key regardless. – Edward Kmett May 18 '13 at 20:31
(You will however be forced to pay for the extra check because GHC won't be smart enough to know the sides of the tuple have already been forced by the first comparison, and only the outer `(,)` would be forced by inserting into an empty `Map`) – Edward Kmett May 18 '13 at 20:38

score 3 · Answer 3 · answered May 21 '13 at 09:24

Apart from the efficiency aspects, there's also a pragmatic side to this question: what do you want to do with this structure?

Do you, for instance, want to be able to store an empty map for a given value of type a? If so, then the curried version, Map a (Map b c), might be more practical!

Here's a simple example: let's say we want to store String-valued properties of persons - say the value of some fields on that person's Stack Overflow profile page.

import Data.Map (Map, fromList)

type Person = String
type Property = String

-- Curried: one inner map per person, which may be empty.
curriedMap :: Map Person (Map Property String)
curriedMap = fromList [
                 ("yatima2975", fromList [("location","Utrecht"),("age","37")]),
                 ("PLL", fromList []) ]

-- Uncurried: one entry per (person, property) pair.
uncurriedMap :: Map (Person,Property) String
uncurriedMap = fromList [
                   (("yatima2975","location"), "Utrecht"),
                   (("yatima2975","age"), "37") ]

With the uncurried version, there is no nice way to record the fact that user "PLL" is known to the system, but hasn't filled in any information. A person/property pair ("PLL",undefined) is going to cause runtime crashes, since Map is strict in the keys.

You could change the type of uncurriedMap to Map (Person,Property) (Maybe String) and store Nothings in there, and that might very well be the best solution in this case; but where there's an unknown/varying number of properties (e.g. depending on the kind of Person) that will also run into difficulties.

So, I guess it also depends on whether you need a query function like this:

data QueryResult = PersonUnknown | PropertyUnknownForPerson | Value String
query :: Person -> Property -> Map (Person, Property) String -> QueryResult

Over the uncurried map in this signature, this is hard to write (if not impossible); over the curried Map Person (Map Property String) it is easy.
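For the nested map `Map Person (Map Property String)`, such a query could be sketched as follows. `Person`, `Property` and `QueryResult` are taken from the answer; the name `queryNested` and the sample data `people` are assumptions made for this illustration.

```haskell
import qualified Data.Map as Map
import Data.Map (Map)

type Person = String
type Property = String

data QueryResult = PersonUnknown | PropertyUnknownForPerson | Value String
  deriving (Eq, Show)

-- Hypothetical query over the nested representation: the outer lookup
-- tells us whether the person is known at all, the inner lookup whether
-- that person has the property.
queryNested :: Person -> Property -> Map Person (Map Property String) -> QueryResult
queryNested person prop m =
  case Map.lookup person m of
    Nothing    -> PersonUnknown
    Just props -> maybe PropertyUnknownForPerson Value (Map.lookup prop props)

-- Illustrative data: "PLL" is known but has no properties recorded.
people :: Map Person (Map Property String)
people = Map.fromList
  [ ("yatima2975", Map.fromList [("location", "Utrecht"), ("age", "37")])
  , ("PLL", Map.empty)
  ]
```

With this data, `queryNested "PLL" "age" people` can report `PropertyUnknownForPerson`, while a name that is absent from the outer map yields `PersonUnknown`. The tuple-keyed map has no entry from which to tell those two cases apart.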
