Understanding non-strictness in Haskell with a recursive example

Question

What is the difference between this two, in terms of evaluation?

Why this "obeys" (how to say?) non-strictness

recFilter :: (a -> Bool) -> [a] -> [a]
recFilter _ [] = []
recFilter p (h:tl) = if (p h) 
  then h : recFilter p tl
  else recFilter p tl

while this doesn't?

recFilter :: (a -> Bool) -> [a] -> Int -> [a]
recFilter _ xs 0 = xs
recFilter p (h:tl) len
  | p(h)  = recFilter p (tl ++ [h]) (len-1)
  | otherwise = recFilter p tl (len-1)

Is it possible to write a tail-recursive function non-strictly?

To be honest I also don't understand the call stack of the first example, because I can't see where that h: goes. Is there a way to see this in ghci?

Uhm, doesn't the second snippet lack a base case? The function always recurses, even when `len` becomes zero or negative. — chi, Oct 16 '21 at 09:03
It returns `h : recFilter p tl`, so the `recFilter p tl` part will *not* be evaluated unless it is needed by the result. This on the other hand with your second recursive function that will keep making recursive calls, and does not even eventually returns an object. If it had returned the accumulator in case the end of the list is reached, then it would thus have to evaluate until that point. — Willem Van Onsem, Oct 16 '21 at 09:03
@chi: it is actually even worse, it manipulates the accumulator by adding an item to that list, and thus even if there is a base case, if there is one item for which the predicate matches, it gets stuck in an infinite loop. — Willem Van Onsem, Oct 16 '21 at 09:06
@chi you're right, I added a correction. It was present in the original code listing. — LowFieldTheory, Oct 16 '21 at 09:06
@WillemVanOnsem I noticed that, but I guess we want to recurse on `len`, not on the list. Indeed the OP just did that. — chi, Oct 16 '21 at 09:09
A terminology note: the first `recFilter` is not non-strict. (Eliminating the double-negative: the first `recFilter` is strict.) However, it does satisfy the definition of being *productive*, and also the definition of being *guarded recursion* (one way to guarantee productivity). — Daniel Wagner, Oct 16 '21 at 15:50

score 4 · Answer 1 · answered Oct 16 '21 at 09:25

The non-tail recursive function roughly consumes a portion of the input (the first element) to produce a portion of the output (well, if it's not filtered out at least). Then recursion handles the next portion of the input, and so on.

Your tail recursive function will recurse until len becomes zero, and only at that point it will output the whole result.

Consider this pseudocode:

def rec1(p,xs):
   case xs:
      []     -> []
      (y:ys) -> if p(y): print y
                rec1(p,ys)

and compare it with this accumulator-based variant. I'm not using len since I use a separate accumulator argument, which I assume to be initially empty.

def rec2(p,xs,acc):
   case xs:
      []     -> print acc
      (y:ys) -> if p(y): 
                   rec2(p,ys,acc++[y])
                else:
                   rec2(p,ys,acc)

rec1 prints before recursing: it does not need to inspect the whole input list to start printing its output. It works in a "steraming" fashion, in a sense. Instead, rec2 will only start to print at the very end, after the input list was completely scanned.

In your Haskell code there are no prints, of course, but you can thing of returning x : function call as "printing x", since x is made available to the caller of our function before function call is actually made. (Well, to be pedantic this depends on how the caller will consume the output list, but I'll neglect this.)

Hence the non-tail recursive code can also work on infinite lists. Even on finite inputs, performance is improved: if we call head (rec1 p xs), we only evaluate xs until the first non-discarded element. By contrast head (rec2 p xs) would fully filter the whole list xs, even we don't need that.

Willem Van Onsem · Accepted Answer · 2021-10-16T09:36:51.713

The second implementation does not make much sense: a variable named len will not contain the length of the list. You thus need to pass this, for infinite lists, this would not work, since there is no length at all.

You likely want to produce something like:

recFilter :: (a -> Bool) -> [a] -> [a]
recFilter p = go []
    where go ys [] = ys  -- (1)
          go ys (x:xs) | p x = go (ys ++ [x]) xs
                       | otherwise = go ys xs

where we thus have an accumulator to which we append the items in the list, and then eventually return the accumulator.

The problem with the second approach is that as long as the accumulator is not returned, Haskell will need to keep recursing until at least we reach weak head normal form (WHNF). This means that if we pattern match the result with [] or (_:_), we will need at least have to recurse until case one, since the other cases only produce a new expression, and it will thus not yield a data constructor on which we can pattern match.

This in contrast to the first filter where if we pattern match on [] or (_:_) it is sufficient to stop at the first case (1), or the third case 93) where the expression produces an object with a list data constructor. Only if we require extra elements to pattern match, for example (_:_:_), it will require to evaluate the recFilter p tl in case (2) of the first implementation:

recFilter :: (a -> Bool) -> [a] -> [a]
recFilter _ [] = []  -- (1)
recFilter p (h:tl) = if (p h) 
  then h : recFilter p tl  -- (2)
  else recFilter p tl

For more information, see the Laziness section of the Wikibook on Haskell that describes how laziness works with thunks.

Understanding non-strictness in Haskell with a recursive example

2 Answers2