
It's clear that any n-tuple can be represented by a bunch of nested 2-tuples. So why are they not the same thing in Haskell? Would this break something?

Making these types equivalent would make writing functions on tuples much easier. For example, instead of defining zip, zip3, zip4, etc., you could define a single zip function that works for all tuples.

Of course, you can work with nested 2-tuples, but it is ugly and there is no canonical way to perform the nesting (i.e. should we nest to the left or right?).

Mike Izbicki

2 Answers


The type (a,b,c,d) has a different performance profile from (a,(b,(c,(d,())))). In general, indexing into an n-tuple takes O(1) while indexing into an "hlist" of n nested tuples takes O(n).
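
As a rough illustration of the difference in access cost, here is a minimal sketch with two accessor functions. The names `fourthFlat` and `fourthNested` are made up for this example. The flat version reaches the field with one pattern match, while the nested version has to step through three pairs first.

```haskell
-- Flat 4-tuple: a single pattern match reaches any field directly.
fourthFlat :: (a, b, c, d) -> d
fourthFlat (_, _, _, d) = d

-- Right-nested encoding: reaching the fourth field means following
-- three `snd` hops before taking the final `fst`.
fourthNested :: (a, (b, (c, (d, ())))) -> d
fourthNested = fst . snd . snd . snd
```

The number of hops in the nested version grows with the position of the field, which is where the O(n) comes from, assuming the compiler does not flatten the structure.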

That said, you should check out Oleg's classic work on HLists. Using HLists requires extensive, and somewhat sketchy, use of type-level programming. Many people find this unacceptable, and it was not available in early Haskell. Probably the best way to represent an HList today is with GADTs and DataKinds:

{-# LANGUAGE DataKinds, GADTs, TypeOperators #-}

data HList ls where
  Nil  :: HList '[]
  Cons :: x -> HList xs -> HList (x ': xs)

This gives canonical nesting, and lets you write functions that work for all instances of this type. You could implement your multi-way zipWith using the same techniques as used in printf. A more interesting puzzle is to generate the appropriate lenses for this type (hint: use type-level naturals and type families for indexing in).
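
To make "functions that work for all instances of this type" concrete, here is a small sketch. The helper names `hLength`, `hHead`, `hTail` and the value `example` are invented for illustration, and the kind annotation on `HList` is an optional addition, not something the definition requires.

```haskell
{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}

import Data.Kind (Type)

-- Heterogeneous list indexed by the list of its element types.
data HList (ts :: [Type]) where
  Nil  :: HList '[]
  Cons :: x -> HList xs -> HList (x ': xs)

infixr 5 `Cons`

-- Works for every HList, whatever its element types are.
hLength :: HList ts -> Int
hLength Nil         = 0
hLength (Cons _ xs) = 1 + hLength xs

-- Total head and tail: the type index rules out the empty list,
-- so no Nil case is needed.
hHead :: HList (t ': ts) -> t
hHead (Cons x _) = x

hTail :: HList (t ': ts) -> HList ts
hTail (Cons _ xs) = xs

example :: HList '[Int, Char, Bool]
example = 1 `Cons` 'c' `Cons` True `Cons` Nil
```

With these definitions, `hLength example` evaluates to 3 and `hHead (hTail example)` to 'c', while `hHead Nil` is rejected at compile time.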

I have considered writing an HList-like library that uses arrays and unsafeCoerce under the hood to get tuple-like performance while sticking to a generic interface. I haven't done it, but it should not be overly difficult.

EDIT: the more I think about this the more inclined I am to hack something together when I have the time. The repeated copying problem Andreas Rossberg mentions can probably be eliminated using stream fusion or similar techniques.

Philip JF
  • I had seen Oleg's work, and that's what got me started on this basic idea. The syntax for his library (and all the variants I've seen) is just awful to use in practice though. Also, I didn't realize nested tuples take an O(n) performance hit. Couldn't the unnesting be done by the compiler to generate O(1)? – Mike Izbicki Feb 20 '13 at 07:26
  • I assumed you were talking about the performance at runtime, not compile time. It makes sense to me that compile time performance for nested tuples would not be as good, but that doesn't seem like a big deal. – Mike Izbicki Feb 20 '13 at 07:30
  • @MikeIzbicki I was talking about runtime performance. In principle it could probably be inlined into tuples much of the time, but we don't have that kind of compiler machinery currently available. Note that because templates in C++ can be used "unboxed", the C++ implementation of tuples can be fully generic in the way you would like. GHC does not allow you to unbox polymorphic fields, so we can't do this (hence the solution being a library using arrays and unsafeCoerce internally). – Philip JF Feb 20 '13 at 09:03
  • @MikeIzbicki do you find the dataKinds based types unacceptable? `a : b : c : []`? We could come up with better constructors for the value level if this is what you care about. – Philip JF Feb 20 '13 at 09:09
  • @PhilipJF The type syntax would have ticks: `a ': b ': c ': '[]`, but fortunately, this can be sugared as `'[a,b,c]`, which is pretty nice! – crockeea Feb 20 '13 at 13:58
  • @PhilipJF That's a convenient syntax at the type level, but it doesn't look so clean when it comes to the value level. I had been using my own library with syntax of: `a ::: b ::: c ::: ()`. I chose this syntax so it would be the same at the type and value level. I mostly wish there was a way to get convenient syntax---something like `'(a,b,c)`---at the value level – Mike Izbicki Feb 20 '13 at 22:59
  • @MikeIzbicki what about using a function (type class hackery time) and spaces, I could probably give you `(makeHArray 1 'c' () "hello") :: Num a => HArray '[a,Char,(),String]` which looks pretty good IMO. As to pattern matching, view patterns are about as good as makes sense semantically. – Philip JF Feb 20 '13 at 23:05
  • @PhilipJF I've decided to go ahead and make a tuple type using a vector as the base, like you were talking about. I solved the problem of copying the vector every time you do a tuple-cons by making the vector extra large and using a mutable vector. Or at least I think it works so far. It's pretty shoe-string right now, so I'll try to get it into a reasonable condition this coming week and put it on github. – Mike Izbicki Feb 24 '13 at 00:01
  • For anyone reading this in the future, here's a link to the github project: https://github.com/mikeizbicki/vector-heterogenous#hvector – Mike Izbicki Mar 05 '13 at 17:24

The main problem with this in Haskell would be that a nested tuple allows additional values, due to laziness. For example, the type (a,(b,())) is inhabited by all values of the form (x,_|_) or (x,(y,_|_)), which have no counterpart among flat tuples. The existence of these values is not only semantically inconvenient, it would also make tuples much more difficult to optimise.
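
The extra values can be observed directly. The following is a minimal sketch, with `partial`, `firstNested` and `secondNested` being names invented for this example: the nested pair has a first component but an undefined tail.

```haskell
-- A partially defined value of the nested type: the tail is bottom.
partial :: (Int, (Char, ()))
partial = (1, undefined)

-- Only inspects the outer pair, so it succeeds on `partial`.
firstNested :: (a, (b, ())) -> a
firstNested (x, _) = x

-- Forces the inner pair, so it diverges on `partial`.
secondNested :: (a, (b, ())) -> b
secondNested (_, (y, _)) = y
```

A flat (Int, Char) can hold bottom in a field, but once its constructor has been matched, every position is known to exist. With the nested encoding, the rest of the "tuple" after any position may itself be missing, so `secondNested partial` fails while `firstNested partial` returns 1.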

In a strict language, though, your suggestion is indeed a possibility. But it still introduces a performance pitfall: implementations would still want to flatten tuples. Consequently, in the cases where you actually construct or deconstruct them inductively, they would have to do a lot of repeated copying. When you use really large tuples, that might be a problem.

Andreas Rossberg
  • What if we make strictly nested tuples isomorphic to their flattened counterparts, i.e. something like: `(a,!(b,!(c,!()))) ~ (a,b,c)`? Also, couldn't all that copying be done at compile time instead of run time? – Mike Izbicki Feb 20 '13 at 07:41
  • @MikeIzbicki, AFAICT, there is no such thing as a type `(a,!(...))` in Haskell. You can only annotate the parameters of datatype constructors as strict. Regarding the copying, of course it's not necessary where you write down a tuple literal, but when you construct or deconstruct one *inductively*, i.e. by recursion over its length, then I see no way of avoiding it. – Andreas Rossberg Feb 20 '13 at 08:01
  • I wonder if there's some way of keeping the current representation of tuples but providing an interface similar to the nested tuples. Basically, have tuples work the same way but also make it easy to write generic code over tuples with things like recursive typeclass instances. – Tikhon Jelvis Feb 20 '13 at 16:30
  • @AndreasRossberg if the representation consists of, say, a pointer to an array to host a tuple's elements, and a number that is a tuple's known size (`<=` the actual array size, which might be quite a bit larger, or be grown with exponential resizing), the copying would only consist of copying these two fields. The actual array could then be shared, I think. – Will Ness Feb 20 '13 at 17:40
  • @WillNess, well, sure, but the cost of an extra indirection and extra allocation *everywhere* is far worse than the problem it would fix. – Andreas Rossberg Feb 20 '13 at 19:01
  • if the tuple is big, and the array is shared, it may be worth it still, allocation wise. Indirection, yes, can't be helped. – Will Ness Feb 20 '13 at 20:17
  • @AndreasRossberg Yeah, I think you're right. But there's no reason the compiler couldn't cheat and allow it for just this one type. – Mike Izbicki Feb 20 '13 at 23:02