Walk through a list split function in Haskell

Question

This is a follow up to my previous question.

I am trying to understand the list splitting example in Haskell from here:

foldr (\a ~(x,y) -> (a:y,x)) ([],[])

I can read Haskell and know what foldr is, but I don't understand this code. Could you walk me through it and explain it in more detail?

bradrn · Accepted Answer · 2019-11-24T11:46:38.107

4

Let’s try running this function on a sample input list, say [1,2,3,4,5]:

We start with foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [1,2,3,4,5]. Because this is a right fold, it is easiest to follow from the end of the list. There a is the last element, 5, and (x,y) starts out as ([],[]), so (a:y,x) returns ([5],[]).
Moving leftwards, the next element is a = 4, and (x,y) = ([5],[]), so (a:y,x) = ([4],[5]). Note that the order of the lists has swapped. Each step will swap the lists again; however, the new element is always added to the first list, which is how the splitting works.
The next element is a = 3, and (x,y) = ([4],[5]), so (a:y,x) = ([3,5],[4]).
The next element is a = 2, and (x,y) = ([3,5],[4]), so (a:y,x) = ([2,4],[3,5]).
The next element is a = 1, and (x,y) = ([2,4],[3,5]), so (a:y,x) = ([1,3,5],[2,4]).
There are no more elements left, so the return value is ([1,3,5],[2,4]).

As the walkthrough shows, the split function works by maintaining two lists, swapping them on each step, and prepending each element of the input to alternating lists.
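
To see the result directly, the fold can be given a name and loaded into GHCi. This is only a minimal sketch: the names `split` and `example` are labels chosen here for illustration, since the snippet in the question is anonymous.

```haskell
-- Split a list into the elements at even and odd positions (0-based).
-- The name `split` is an assumption made for this example.
split :: [a] -> ([a], [a])
split = foldr (\a ~(x, y) -> (a : y, x)) ([], [])

-- The sample input used in this answer.
example :: ([Int], [Int])
example = split [1, 2, 3, 4, 5]   -- ([1,3,5],[2,4])
```

The first component always begins with the first element of the input, because that element is handled by the outermost call of the lambda.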

edited Nov 24 '19 at 11:46

answered Nov 22 '19 at 12:11


3

could you also comment on why lazy pattern matching is used in this example? – GreenhouseVeg Nov 22 '19 at 12:16
https://wiki.haskell.org/Lazy_pattern_match discusses the difference between `~(a, b)` and `(a, b)` using `splitAt :: Int -> [a] -> ([a], [a])` as an example. I think the same (or at least a similar) argument will apply to `split`. – chepner Nov 22 '19 at 13:56
1

Basically, if the fold function uses a strict pattern match, it would have to recurse all the way to the end of the list to reach `([], [])` to ensure that there *is* a pair to unpack. The function with lazy pattern matching is equivalent to `(\a p -> (a:snd p, fst p))`, which doesn't care if `p` is a pair until later. – chepner Nov 22 '19 at 13:59
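
The equivalence between a lazy pattern and plain projections can be checked side by side. A rough sketch, with `splitLazy`, `splitProj` and `firstFew` being names invented here:

```haskell
-- Lazy-pattern version, as in the question.
splitLazy :: [a] -> ([a], [a])
splitLazy = foldr (\a ~(x, y) -> (a : y, x)) ([], [])

-- The same function written with fst/snd instead of a lazy pattern.
splitProj :: [a] -> ([a], [a])
splitProj = foldr (\a p -> (a : snd p, fst p)) ([], [])

-- Neither version inspects the pair before building its own result,
-- so both can be used on an infinite list if only a prefix is demanded.
firstFew :: ([Int], [Int])
firstFew = (take 3 (fst (splitLazy [1 ..])), take 3 (snd (splitProj [1 ..])))
-- ([1,3,5],[2,4,6])
```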

Willem Van Onsem · Answer 2 · 2019-11-22T12:36:29.680

We can take a look at an example. Say we have a list [1, 4, 2, 5]. If we process this list, then foldr will be calculated as:

foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [1,4,2,5]

So here a is at first the first item of the list, and the call will thus return something like:

(1:y, x)
    where (x, y) = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [4,2,5]

Notice that here the (x, y) tuple is swapped when we prepend a to the first item of the 2-tuple.

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [2,5]

and if we keep doing that, we thus obtain:

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:y'', x'')
          (x'', y'') = (5:y''', x''')
          (x''', y''') = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) []

Since we have reached the end of the list, foldr … ([], []) [] evaluates to the 2-tuple ([], []):

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:y'', x'')
          (x'', y'') = (5:y''', x''')
          (x''', y''') = ([],[])

So x''' = [] and y''' = [], and this resolves to:

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:y'', x'')
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so x'' = [5] and y'' = []:

(1:y, x)
    where (x, y) = (4:y', x')
          (x', y') = (2:[], [5])
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so x' = [2] and y' = [5]:

(1:y, x)
    where (x, y) = (4:[5], [2])
          (x', y') = (2:[], [5])
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so x = [4, 5] and y = [2], and eventually we obtain:

(1:[2], [4,5])
    where (x, y) = (4:[5], [2])
          (x', y') = (2:[], [5])
          (x'', y'') = (5:[], [])
          (x''', y''') = ([],[])

so the result is the expected ([1,2], [4,5]).
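
The intermediate tuples in this derivation can also be listed mechanically. As a small sketch, the Prelude's `scanr` applies the same lambda and keeps every intermediate accumulator; the name `steps` is made up here.

```haskell
-- Every intermediate result of the fold, outermost call first.
-- The entries correspond to the final result, then (x, y), (x', y'),
-- (x'', y'') and (x''', y''') from the derivation above.
steps :: [([Int], [Int])]
steps = scanr (\a ~(x, y) -> (a : y, x)) ([], []) [1, 4, 2, 5]
-- [([1,2],[4,5]),([4,5],[2]),([2],[5]),([5],[]),([],[])]
```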

Thanks. One more question: does this code work with `foldl` instead of `foldr`? Why? — Michael, Nov 22 '19 at 12:54
@Michael: the items will be in reversed order, and furthermore, depending on the length of the list, the first item of the list will end up in the first or second item of the 2-tuple. — Willem Van Onsem, Nov 22 '19 at 12:57
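
A quick way to try the `foldl` variant is sketched below. Note that `foldl` passes the accumulator first, so the lambda's arguments have to be flipped for the code to typecheck; `splitL`, `oddLength` and `evenLength` are names invented for this example.

```haskell
-- Left-fold variant: accumulator first, element second.
-- The name `splitL` is an assumption made for this example.
splitL :: [a] -> ([a], [a])
splitL = foldl (\(x, y) a -> (a : y, x)) ([], [])

oddLength, evenLength :: ([Int], [Int])
oddLength  = splitL [1, 2, 3, 4, 5]   -- ([5,3,1],[4,2])
evenLength = splitL [1, 2, 3, 4]      -- ([4,2],[3,1])
```

With an odd-length input the first item lands in the first component, with an even-length input in the second, and both components come out reversed. Unlike the `foldr` version, this one cannot produce anything for an infinite list, since `foldl` has to reach the end of its input first.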

Will Ness · Answer 3 · 2019-11-24T12:32:07.757

Approximately,

foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]
=
let g a ~(x,y) = (a:y,x) in
g a $ g b $ g c $ g d $ g e ([],[])
=
g a $ g b $ g c $ g d $ ([e],[])
=
g a $ g b $ g c $ ([d],[e])
=
g a $ g b $ ([c,e],[d])
=
g a $ ([b,d],[c,e])
=
([a,c,e],[b,d])

But truly,

foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]
=
let g a ~(x,y) = (a:y,x) in
g a $ foldr g ([],[]) [b,c,d,e]
=
(a:y,x) where 
    (x,y) = foldr g ([],[]) [b,c,d,e]
=
(a:y,x) where 
    (x,y) = (b:y2,x2) where
                 (x2,y2) = foldr g ([],[]) [c,d,e]
=
(a:y,x) where 
    (x,y) = (b:y2,x2) where
                 (x2,y2) = (c:y3,x3) where
                                (x3,y3) = (d:y4,x4) where
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])

which is forced in the top-down manner by access (if and when), being progressively fleshed-out as, e.g.,

=
(a:x2,b:y2) where 
                 (x2,y2) = (c:y3,x3) where
                                (x3,y3) = (d:y4,x4) where
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])
=
(a:c:y3,b:x3) where 
                                (x3,y3) = (d:y4,x4) where
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])
=
(a:c:x4,b:d:y4) where 
                                               (x4,y4) = (e:y5,x5) where
                                                              (x5,y5) = ([],[])
=
(a:c:e:y5,b:d:x5) where 
                                                              (x5,y5) = ([],[])
=
(a:c:e:[],b:d:[])

but it could be that the forcing will be done in a different order, depending on how it is called, e.g.

print . (!!1) . snd $ foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]
print . (!!2) . fst $ foldr (\a ~(x,y) -> (a:y,x)) ([],[]) [a,b,c,d,e]

etc.
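
For something that can be pasted into a file, here is a concrete version of those two calls, sketched with a `String` in place of the symbolic list. The names `split`, `secondOfSnd`, `thirdOfFst` and `fromInfinite` are made up for illustration.

```haskell
split :: [a] -> ([a], [a])
split = foldr (\a ~(x, y) -> (a : y, x)) ([], [])

-- Demand the second element of the second half.
secondOfSnd :: Char
secondOfSnd = (!! 1) . snd $ split "abcde"   -- 'd'

-- Demand the third element of the first half.
thirdOfFst :: Char
thirdOfFst = (!! 2) . fst $ split "abcde"    -- 'e'

-- Only the demanded part is ever computed, so an infinite input works too.
fromInfinite :: Int
fromInfinite = (!! 1) . snd $ split [1 ..]   -- 4
```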

edit: to address the questions about the lazy pattern, it is done for proper laziness of the resulting function:

foldr with a combining function that is strict in its second argument encodes recursion, which is bottom-up. The result of recursively processing the rest of the list is constructed first, and the head portion of the result is combined with it afterwards.
foldr with a combining function that is lazy in its second argument encodes corecursion, which is top-down. The head portion of the resulting value is constructed first, and the rest is filled in later. It is very reminiscent of tail recursion modulo cons (TRMC), in Prolog and elsewhere. Lazy evaluation as a concept came from "CONS should not evaluate its arguments"; TRMC does not evaluate the second argument to the constructor until later, which is what really matters.

score 2 · Answer 4 · answered Nov 22 '19 at 17:53

Let's translate the fold away.

splatter :: [a] -> ([a], [a])
splatter = foldr (\a ~(x,y) -> (a:y,x)) ([],[])

What does this mean? foldr for lists is defined as

foldr :: (a -> r -> r) -> r -> [a] -> r
foldr k z = go
  where
    go [] = z
    go (p : ps) = p `k` go ps

Let's inline it and simplify:

splatter = go
  where
    go [] = ([], [])
    go (p : ps) =
      (\a ~(x,y) -> (a:y,x)) p (go ps)

splatter = go
  where
    go [] = ([], [])
    go (p : ps) =
      (\ ~(x,y) -> (p:y,x)) (go ps)

splatter = go
  where
    go [] = ([], [])
    go (p : ps) =
      let (x, y) = go ps
      in (p : y, x)

The lazy-by-default pattern match in the let means that we don't actually make the recursive call until someone forces x or y.

The key thing to notice is that x and y swap places on each recursive call. This leads to the alternating pattern.
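
To make the role of the `let` visible, here is a hedged comparison with a `case` version, which forces the recursive result to a pair before building anything (the same behaviour as the lambda without `~`). Both function names are invented for this sketch.

```haskell
-- `let` pattern bindings are lazy: the recursive call is not evaluated
-- until x or y is actually needed.
splatterLet :: [a] -> ([a], [a])
splatterLet [] = ([], [])
splatterLet (p : ps) =
  let (x, y) = splatterLet ps
  in (p : y, x)

-- `case` evaluates its scrutinee to a pair first, so the recursion runs
-- to the end of the list before any result is returned.
splatterCase :: [a] -> ([a], [a])
splatterCase [] = ([], [])
splatterCase (p : ps) =
  case splatterCase ps of
    (x, y) -> (p : y, x)
```

On finite lists the two agree. Only `splatterLet` can answer something like `take 3 (fst (splatterLet [1 ..]))`; the `case` version would keep recursing on an infinite input.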

Redu · Answer 5 · 2019-11-22T17:29:39.280

So everything happens in the \a ~(x,y) -> (a:y,x) function, where in the first turn a is the last item of the provided list and (x,y) is an alternating tuple accumulator that starts with ([],[]). The current element gets prepended to y by a:y, but then the x and y lists in the tuple get swapped.

However, it's worth mentioning that every new element is prepended on the first side of the returned tuple, which guarantees that the first side eventually starts with the first item of the list, since that item is prepended last.

So for the list [1,2,3,4,5,6] the steps are as follows:

a          (x   ,   y)      return
----------------------------------
6       ([]     , []     ) (6:y, x)
5       ([6]    , []     ) (5:y, x)
4       ([5]    , [6]    ) (4:y, x)
3       ([4,6]  , [5]    ) (3:y, x)
2       ([3,5]  , [4,6]  ) (2:y, x)
1       ([2,4,6], [3,5]  ) (1:y, x)
[]      ([1,3,5], [2,4,6]) no return

Regarding the tilde ~, which marks a lazy pattern, it is best described in the Haskell/Laziness topic of the Haskell guide as follows:

Prepending a pattern with a tilde sign delays the evaluation of the value until the component parts are actually used. But you run the risk that the value might not match the pattern — you're telling the compiler 'Trust me, I know it'll work out'. (If it turns out it doesn't match the pattern, you get a runtime error.) To illustrate the difference:

Prelude> let f (x,y) = 1
Prelude> f undefined
*** Exception: Prelude.undefined

Prelude> let f ~(x,y) = 1
Prelude> f undefined
1

In the first example, the value is evaluated because it has to match the tuple pattern. You evaluate undefined and get undefined, which stops the proceedings. In the latter example, you don't bother evaluating the parameter until it's needed, which turns out to be never, so it doesn't matter you passed it undefined.

This doesn't explain why the code in question uses a lazy pattern, though. It's not trying to protect against a pattern-match failure. — chepner, Nov 22 '19 at 16:44
@chepner Your comment under one of the answers explains it well, so people should not overlook that. Still, I'm not sure whether this has an effect on performance, since either way you have to process the list all the way, right? — Redu, Nov 22 '19 at 16:52
no, we don't have to process all the list, all the way. if we call `take 2 . fst $ foldr ...`, only the first two elements of the first split will be returned, i.e. only the first *three* positions in the input list's spine will be accessed. but without the `~`, yes, the *whole* list would be accessed, regardless. — Will Ness, Nov 22 '19 at 17:38
@WillNess `take 2 . fst` is a good point, but I'm not sure, since the workflow is right to left. Doesn't it have to process the whole list in order to find what eventually ends up at the first two positions of the first list in the tuple? — Redu, Nov 22 '19 at 23:43
no. see my answer, under "But truly". e.g. `take 1 . fst` causes just one reduction: `(a:y,x)` is already known after it (with `y`, `x` still unknown) and it is all we need for `take 1` to complete. — Will Ness, Nov 23 '19 at 09:48
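
The claim about how much of the input is touched can be tested with a list whose tail is `undefined` after three cells. A minimal sketch; `split`, `threeCells` and `firstTwo` are names chosen here.

```haskell
split :: [a] -> ([a], [a])
split = foldr (\a ~(x, y) -> (a : y, x)) ([], [])

-- Only three cons cells exist; inspecting a fourth would raise an exception.
threeCells :: [Int]
threeCells = 1 : 2 : 3 : undefined

-- Succeeds, so no more than three positions of the spine were accessed.
firstTwo :: [Int]
firstTwo = take 2 . fst $ split threeCells   -- [1,3]
```

Without the `~`, the same expression would hit the `undefined` tail, because every pair down to the end of the list has to be matched first.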

chepner · Answer 6 · 2019-11-22T18:19:08.447

Effectively, the fold function alternates which list the next item from the input list is added to. A similar function in a language like Python would be

def split(xs):
    a0 = a = []
    b0 = b = []
    for x in xs:
        a.append(x)
        a, b = b, a
    return a0, b0

A lazy pattern is used for two reasons:

To allow consuming the resulting lists immediately, without waiting for foldr to consume all the input
To allow splitting of infinite lists.

Consider this example:

let (odds, evens) = foldr (\a ~(x,y) -> (a:y,x)) ([],[]) $ [1..]
in take 5 odds

The result is [1,3,5,7,9].

If you dropped the lazy pattern and used

let (odds, evens) = foldr (\a (x,y) -> (a:y,x)) ([],[]) $ [1..]
in take 5 odds

the code would never terminate, because take couldn't get the first element (let alone the first five) without first computing the entire list of odd values.

Why is that? Consider the definition of Data.List.foldr:

foldr k z = go
  where
    go [] = z
    go (y:ys) = y `k` go ys

If k = \a (x,y) -> (a:y, x) is strict in its second argument (as it is here, because it pattern matches on the pair), then the evaluation of y `k` go ys cannot produce a result until the base case of go is reached.

Using a lazy pattern, the function is equivalent to

\a p -> (a:snd p, fst p)

meaning we never have to match on p until fst or snd does so; the function is now lazy in its second argument. That means that

go (y:ys) = y `k` go ys
          = (\a p -> (a:snd p, fst p)) y (go ys)
          = let p = go ys in (y:snd p, fst p)

returns immediately without further evaluating go. Only when something beyond the head of the first list is demanded do we need to call go again, and then only far enough to produce the element asked for.
