Why don't the `foldr`, `foldr1`, `scanr` and `scanr1` functions have a performance problem when they are applied to big lists?

Question

I am reading an old Russian translation of the book Learn You a Haskell for Great Good! The current English version (online) is newer, so I consult it from time to time as well.

The quote:

When you put together two lists (even if you append a singleton list to a list, for instance: [1,2,3] ++ [4]), internally, Haskell has to walk through the whole list on the left side of ++. That's not a problem when dealing with lists that aren't too big. But putting something at the end of a list that's fifty million entries long is going to take a while. However, putting something at the beginning of a list using the : operator (also called the cons operator) is instantaneous.

I assumed that, for the foldr, foldr1, scanr and scanr1 functions, Haskell has to walk through the whole list to reach its last item. I also assumed that Haskell would do the same to reach the previous element, and so on for each item.

But I see I was mistaken:

UPD

I tried this code and saw similar processing times for both cases:

data' = [1 .. 10000000]
sum'r = foldr1 (\x acc -> x + acc ) data' 
sum'l = foldl1 (\acc x -> x + acc ) data'

Is every Haskell list bidirectional? I assume that, to get the last item of a list, Haskell first has to iterate over every item and remember the items it needs (the last one, for example), so that it can later get the previous item as in a bidirectional list (for lazy computation). Am I right?

Foldr etc. have *nothing* to do with appending to the end of the list, so no they don't have that problem. — Bakuriu, Oct 11 '16 at 07:29

score 7 · Accepted Answer · edited May 23 '17 at 12:19

It's tricky since Haskell is lazy.

Evaluating head ([1..1000000]++[1..1000000]) will return immediately, with 1. The lists will never be fully created in memory: only the first element of the first list will be.

If you instead demand the full list [1..1000000]++[1..1000000] then ++ will indeed have to create a two-million long list.
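A minimal sketch of these two cases, which can be checked with GHC. The names xs, firstElem and lastElem are only for illustration, and the undefined right operand is there just to show that head never looks at it:

```haskell
-- Illustrative names only; they do not come from the question.
xs :: [Int]
xs = [1 .. 1000000]

-- head forces only the first cell of the left operand.
-- The right operand is never inspected, so even undefined is harmless here.
firstElem :: Int
firstElem = head (xs ++ undefined)

-- last has to walk through every cell of the appended list.
lastElem :: Int
lastElem = last (xs ++ xs)

main :: IO ()
main = do
  print firstElem  -- 1
  print lastElem   -- 1000000
```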

foldr may or may not evaluate the full list. It depends on whether the function we use is lazy. For example, here's map f xs written using foldr:

foldr (\y ys -> f y : ys) [] xs

This is as efficient as map f xs: list cells are produced on demand, in a streaming fashion. If we need only the first ten elements of the resulting list, then we indeed create only the first ten cells -- foldr will not be applied to the rest of the list. If we need the full resulting list, then foldr will be run over the full list.
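One way to observe the streaming behaviour is to run the foldr-based map over an infinite list. This is only a sketch; mapViaFoldr is a name made up for the example:

```haskell
-- map written with foldr, as in the expression above.
-- mapViaFoldr is a hypothetical name used only for this example.
mapViaFoldr :: (a -> b) -> [a] -> [b]
mapViaFoldr f = foldr (\y ys -> f y : ys) []

main :: IO ()
main =
  -- The input list is infinite, yet only ten cells are ever produced,
  -- because (:) does not force its second argument.
  print (take 10 (mapViaFoldr (* 2) [1 ..]))
  -- [2,4,6,8,10,12,14,16,18,20]
```

If foldr had to reach the end of the list before producing anything, this program would never terminate.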

Also note that xs++ys can be defined similarly in terms of foldr:

foldr (:) ys xs

and has similar performance properties.
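As a small sketch (appendViaFoldr is an illustrative name, not a library function), the foldr-based append can be checked against ++ and shown to be just as lazy:

```haskell
-- xs ++ ys expressed with foldr; the name is hypothetical.
appendViaFoldr :: [a] -> [a] -> [a]
appendViaFoldr xs ys = foldr (:) ys xs

main :: IO ()
main = do
  print (appendViaFoldr [1, 2, 3] [4 :: Int])        -- [1,2,3,4]
  -- The left list is infinite, but taking three elements still terminates.
  print (take 3 (appendViaFoldr [1 ..] [0 :: Int]))  -- [1,2,3]
```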

By comparison, foldl always runs over the whole list.
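To see why, it may help to look at textbook definitions of the two folds. The sketch below uses the hypothetical names myFoldr and myFoldl; they mirror the definitions given in the Haskell Report, while the actual GHC library code differs in detail:

```haskell
-- Textbook-style definitions under hypothetical names.

-- myFoldr hands the head of the list to f straight away. The recursive
-- call is just an argument of f, so a lazy f may never evaluate it.
-- No backward traversal of the list is involved.
myFoldr :: (a -> b -> b) -> b -> [a] -> b
myFoldr _ z []       = z
myFoldr f z (x : xs) = f x (myFoldr f z xs)

-- myFoldl calls itself directly on the tail, so it cannot return
-- anything before it has reached the end of the list.
myFoldl :: (b -> a -> b) -> b -> [a] -> b
myFoldl _ z []       = z
myFoldl f z (x : xs) = myFoldl f (f z x) xs

main :: IO ()
main = do
  -- (||) ignores its right argument once the left one is True,
  -- so this stops at 11 even though the list is infinite.
  print (myFoldr (\x rest -> x > 10 || rest) False [1 ..])  -- True
  -- On a finite list both folds give the same sum.
  print (myFoldr (+) 0 [1, 2, 3], myFoldl (+) 0 [1, 2, 3])  -- (6,6)
```

The same search written with myFoldl would never return on an infinite list.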

In the example you mention we have longList ++ [something], appending to the end of the list. This only costs constant time if all we demand is the first element of the resulting list. But if we really need the last element we added, then appending will need to run over the whole list. This is why appending at the end is considered O(n) instead of O(1).

In the last update, the question speaks about computing the sum with foldr vs foldl, using the (+) operator. In that case, since (+) is strict (it needs both arguments to compute its result), both folds will need to scan the whole list. The performance in such cases can be comparable. Indeed, they would compute, respectively

1 + (2 + (3 + (4 + .....        -- foldr
(...(((1 + 2) + 3) + 4) + ....  -- foldl

By comparison foldl' would be more memory efficient, since it starts reducing the above sum before building the above giant expression. That is, it would compute 1+2 first (3), then 3+3 (6), then 6 + 4 (10),... keeping in memory only the last result (a single integer) while the list is being scanned.
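A minimal sketch of the strict version follows. The names sumStrict and runningSums are made up for the example, and scanl1 is used only to display the intermediate accumulator values described above:

```haskell
import Data.List (foldl')

-- Strict left fold: the accumulator is evaluated at every step,
-- so no chain of pending additions is built up.
sumStrict :: [Int] -> Int
sumStrict = foldl' (+) 0

-- The successive accumulator values for a short list.
runningSums :: [Int]
runningSums = scanl1 (+) [1, 2, 3, 4]

main :: IO ()
main = do
  print runningSums                   -- [1,3,6,10]
  print (sumStrict [1 .. 10000000])   -- 50000005000000
```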

To the OP: the topic of laziness is not easy to grasp the first time. It is quite vast -- you have just met a ton of different examples which have subtle but significant performance differences. It's hard to explain everything succinctly -- it's just too broad. I'd recommend focusing on small examples and digesting those first.

Thank you, but I still don't understand. I added the **UPD** section to my question. Please take a look. — Andrey Bushman, Oct 11 '16 at 08:01
