How to break out from a fold function in haskell when the accumulator met a certain condition?

Question

I'm calculating the sum of a list after applying someFunction to every element of it like so:

sum (map someFunction myList)

someFunction is very resource heavy so to optimise it I want to stop calculating the sum if it goes above a certain threshold.

It seems like I need to use fold but I don't know how to break out if it if the accumulator reaches the threshold. My guess is to somehow compose fold and takeWhile but I'm not exactly sure how.

If the sum goes above a threshold or if the outcome of your resource intensive function goes above a threshold? — Elmex80s, Aug 08 '18 at 07:59

trevor cook · Answer 1 · 2021-10-28T18:32:15.063

Another technique is to use a foldM with Either to capture the early termination effect. Left signals early termination.

import Control.Monad(foldM)

sumSome :: (Num n,Ord n) => n -> [n] -> Either n n
sumSome thresh = foldM f 0
  where
    f a n 
      | a >= thresh = Left a
      | otherwise   = Right (a+n)

To ignore the exit status, just compose with either id id.

sumSome' :: (Num n,Ord n) => n -> [n] -> n
sumSome' n = either id id . sumSome n

score 18 · Accepted Answer · answered Aug 07 '18 at 13:08

One of the options would be using scanl function, which returns a list of intermediate calculations of foldl.

Thus, scanl1 (+) (map someFunction myList) will return the intermediate sums of your calculations. And since Haskell is a lazy language it won't calculate all the values of myList until you need it. For example:

take 5 $ scanl1 (+) (map someFunction myList)

will calculate someFunction 5 times and return the list of these 5 results.

After that you can use either takeWhile or dropWhile and stop the calculation, when a certain condition is True. For example:

head $ dropWhile (< 1000) $ scanl1 (+) [1..1000000000]

will stop the calculation, when sum of the numbers reaches 1000 and returns 1035.

Perhaps a case to take into consideration is that it is possible that the threshold is never reached, in which case `head` will error. — Willem Van Onsem, Aug 07 '18 at 14:05
maybe i'd use `find` instead of `head` and `dropWhile`: `find (>= 1000) (scanl1 (+) [1..])` — Justin L., Dec 08 '20 at 04:21

chepner · Answer 3 · 2018-08-07T17:37:18.440

3

Use a bounded addition operator instead of (+) with foldl.

foldl (\b a -> b + if b > someThreshold then 0 else a) 0 (map someFunction myList)

Because Haskell is non-strict, only calls to someFunction that are necessary to evaluate the if-then-else are themselves evaluated. fold still traverses the entire list.

> foldl (\b a -> b + if b > 10 then 0 else a) 0 (map (trace "foo") [1..20])
foo
foo
foo
foo
foo
15

sum [1..5] > 10, and you can see that trace "foo" only executes 5 times, not 20.

Instead of foldl, though, you should use the strict version foldl' from Data.Foldable.

edited Aug 07 '18 at 17:37

answered Aug 07 '18 at 13:19

chepner

497,756
71
530
681

2

This is cool, and seems like witchcraft, but applied to an infinite list it does hang. I think the trace shows that `a` just isn't forced, although the whole list is traversed. – trevor cook Aug 07 '18 at 16:15
Ah, right. I couldn't quite put my finger on why this seemed to work anyway. – chepner Aug 07 '18 at 17:34
I have been thinking that this lazyness trick is a canonical way to "break out" from folds. – arrowd Aug 08 '18 at 09:22
@arrowd this does _not_ break out from the fold, it keeps adding 0s instead. it doesn't break out from the `map`'s "loop" either, it keeps looping on the input, just stops calling the mapped function. (just commenting for the sake of future visitors here) – Will Ness Sep 01 '21 at 12:43
@WillNess if you're saying that `map f [1,2,3]` generates `f x` thunks for each element in the list, you're probably wrong. But to be honest I'm not 100% sure too. – arrowd Sep 01 '21 at 14:32
1

IIUC, `map f [1,2,3]` generates a single thunk. When you pattern match, e.g. `(t:ts) = map f [1,2,3]`, you'll get two thunks, one for `t = f 1` and one for `ts = map f [2, 3]`. – chepner Sep 01 '21 at 14:38
@arrowd we're not talking in general about `map`s here. I'm referring to the specific `map` call in the answer as driven by the `foldl` call. this `foldl` will demand every element `a` in its input (as the answer indeed is saying). possibly ignoring its value. – Will Ness Sep 01 '21 at 14:43
@WillNess With this I agree. I also wrote `fold still traverses the entire list` in my answer. – arrowd Sep 01 '21 at 16:11
@arrowd I wasn't wrong in what I wrote in that comment though, when I said that "it keeps looping on the input". I wasn't talking about anything else. – Will Ness Sep 01 '21 at 18:10

score 3 · Answer 4 · answered Aug 07 '18 at 14:05

This will do what you ask about without building the intermediate list as scanl' would (and scanl would even cause a thunks build-up on top of that):

foldl'Breaking break reduced reducer acc list = 
    foldr cons (\acc -> acc) list acc 
          where 
          cons x r acc | break acc x = reduced acc x 
                       | otherwise   = r $! reducer acc x

cf. related wiki page.

Kartik Sabharwal · Answer 5 · 2018-08-07T13:08:26.047

You could try making your own sum function, maybe call it boundedSum that takes

an Integer upper bound
an [Integer] to sum over
a "sum up until this point" value to be compared with the upper bound

and returns the sum of the list.

    boundedSum :: Integer -> [Integer] -> Integer -> Integer
    boundedSum upperBound (x : xs) prevSum =
        let currentSum = prevSum + x
            in
        if   currentSum > upperBound
        then upperBound
        else boundedSum upperBound xs currentSum
    boundedSum upperBound [] prevSum =
        prevSum

I think this way you won't "eat up" more of the list if the sum up until the current element exceeds upperBound.

EDIT: The answers to this question suggest better techniques than mine and the question itself looks rather similar to yours.

score 2 · Answer 6 · answered Aug 07 '18 at 13:10

This is a possible solution:

last . takeWhile (<=100) . scanl (+) 0 . map (^2) $ [1..]

Dissected:

take your starting list ([1..] in the example)
map your expensive function ((^2))
compute partial sums scanl (+) 0
stop after the partial sums become too large (keep those (<=100))
take the last one

If performance matters, also try scanl', which might improve it.

Elmex80s · Answer 7 · 2018-08-18T11:46:27.670

Something like this using until :: (a -> Bool) -> (a -> a) -> a -> a from the Prelude

sumUntil :: Real a => a -> [a] -> a
sumUntil threshold u = result

    where

    (_, result) = until stopCondition next (u, 0)

    next :: Real a => ([a], a) -> ([a], a)
    next ((x:xs), y) = (xs, x + y)

    stopCondition :: Real a => ([a], a) -> Bool
    stopCondition (ls, x) = null ls || x > threshold

Then apply

sumUntil 10 (map someFunction myList)

score 0 · Answer 8 · edited Sep 01 '21 at 13:11

This post is already a bit older but I'd like to mention a way to generalize the nice code of @trevor-cook above to break fold with the additional possibility to return not only a default value or the accumulator but also the index and element of the list where the breaking condition was satisfied:

import Control.Monad (foldM)

breakFold step initialValue list exitCondition exitFunction = 
  either id (exitFunction (length  list) (last list)) 
      (foldM f initialValue (zip [0..] list))
    where f acc (index,x)
            | exitCondition index x acc 
                        = Left (exitFunction index x acc)
            | otherwise = Right (step index x acc)

It also only requires to import foldM. Examples for the usage are:

mysum thresh list = breakFold (\i x acc -> x + acc) 0 list 
                         (\i x acc -> x + acc > thresh) 
                         (\i x acc -> acc)
myprod thresh list = breakFold (\i x acc -> x * acc) 1 list 
                          (\i x acc -> acc == thresh) 
                          (\i x acc -> (i,x,acc))

returning

*myFile> mysum 42 [1,1..]
42
*myFile> myprod 0 ([1..5]++[0,0..])
(6,0,0)
*myFile> myprod 0 (map (\n->1/n) [1..])
(178,5.58659217877095e-3,0.0)

In this way, one can use the index and the last evaluated list value as input for further functions.

recalculating the already known values with `length` and `last` seems to indicate an abstraction leakage. — Will Ness, Sep 01 '21 at 13:13
@Will Ness thanks for reading the code. How would you do it to avoid that? — exchange, Sep 01 '21 at 16:33
You just need to massage it a little bit to pass on the index and the element. — Will Ness, Sep 01 '21 at 18:12
When I see a top level function with no type signature, my brain turns off. — dfeuer, Oct 28 '21 at 18:55

Sledge · Answer 9 · 2021-09-30T18:40:39.077

0

Despite the age of this post, I'll add a possible solution. I like continuations because I find them very useful in terms of flow control.

breakableFoldl
  :: (b -> a -> (b -> r) -> (b -> r) -> r)
  -> b
  -> [a]
  -> (b -> r)
  -> r
breakableFoldl f b (x : xs) = \ exit ->
    f b x exit $ \ acc ->
        breakableFoldl f acc xs exit
breakableFoldl _ b _ = ($ b)

breakableFoldr
  :: (a -> b -> (b -> r) -> (b -> r) -> r)
  -> b
  -> [a]
  -> (b -> r)
  -> r
breakableFoldr f b l = \ exit ->
  fix (\ fold acc xs next ->
    case xs of
      x : xs' -> fold acc xs' (\ acc' -> f x acc' exit next)
      _ -> next acc) b l exit

exampleL = breakableFoldl (\ acc x exit next ->
    ( if acc > 15
      then exit
      else next . (x +)
    ) acc
  ) 0 [1..9] print

exampleR = breakableFoldr (\ x acc exit next ->
    ( if acc > 15
      then exit
      else next . (x +)
    ) acc
  ) 0 [1..9] print

edited Sep 30 '21 at 18:40

answered Sep 29 '21 at 00:24

Sledge

178
13

your left fold folds from the right, and your right fold from the left. consider `breakableFoldl (\acc x exit next -> if x < 5 then exit acc else next ((x:).acc)) id [1..10] ($[100]) ==> [5,6,7,8,9,10,11]` and `breakableFoldr (\x acc exit next -> if x > 5 then exit acc else next (acc.(x:))) id [1..10] ($[100]) ==> [1,2,3,4,5,11]`. – Will Ness Sep 30 '21 at 07:36
1

That's ok. I'm confident that there are many like me who can't tell left from right. Admittedly most of them are probably under the age of 5, but still... – Sledge Sep 30 '21 at 14:35

How to break out from a fold function in haskell when the accumulator met a certain condition?

9 Answers9