19

I'm calculating the sum of a list after applying someFunction to every element of it like so:

sum (map someFunction myList)

someFunction is very resource heavy so to optimise it I want to stop calculating the sum if it goes above a certain threshold.

It seems like I need to use fold but I don't know how to break out if it if the accumulator reaches the threshold. My guess is to somehow compose fold and takeWhile but I'm not exactly sure how.

Balázs Sáros
  • 495
  • 6
  • 12

9 Answers9

19

Another technique is to use a foldM with Either to capture the early termination effect. Left signals early termination.

import Control.Monad(foldM)

sumSome :: (Num n,Ord n) => n -> [n] -> Either n n
sumSome thresh = foldM f 0
  where
    f a n 
      | a >= thresh = Left a
      | otherwise   = Right (a+n)

To ignore the exit status, just compose with either id id.

sumSome' :: (Num n,Ord n) => n -> [n] -> n
sumSome' n = either id id . sumSome n
trevor cook
  • 1,531
  • 8
  • 22
18

One of the options would be using scanl function, which returns a list of intermediate calculations of foldl.

Thus, scanl1 (+) (map someFunction myList) will return the intermediate sums of your calculations. And since Haskell is a lazy language it won't calculate all the values of myList until you need it. For example:

take 5 $ scanl1 (+) (map someFunction myList)

will calculate someFunction 5 times and return the list of these 5 results.

After that you can use either takeWhile or dropWhile and stop the calculation, when a certain condition is True. For example:

head $ dropWhile (< 1000) $ scanl1 (+) [1..1000000000]

will stop the calculation, when sum of the numbers reaches 1000 and returns 1035.

Igor Drozdov
  • 14,690
  • 5
  • 37
  • 53
3

Use a bounded addition operator instead of (+) with foldl.

foldl (\b a -> b + if b > someThreshold then 0 else a) 0 (map someFunction myList)

Because Haskell is non-strict, only calls to someFunction that are necessary to evaluate the if-then-else are themselves evaluated. fold still traverses the entire list.

> foldl (\b a -> b + if b > 10 then 0 else a) 0 (map (trace "foo") [1..20])
foo
foo
foo
foo
foo
15

sum [1..5] > 10, and you can see that trace "foo" only executes 5 times, not 20.

Instead of foldl, though, you should use the strict version foldl' from Data.Foldable.

chepner
  • 497,756
  • 71
  • 530
  • 681
  • 2
    This is cool, and seems like witchcraft, but applied to an infinite list it does hang. I think the trace shows that `a` just isn't forced, although the whole list is traversed. – trevor cook Aug 07 '18 at 16:15
  • Ah, right. I couldn't quite put my finger on why this seemed to work anyway. – chepner Aug 07 '18 at 17:34
  • I have been thinking that this lazyness trick is a canonical way to "break out" from folds. – arrowd Aug 08 '18 at 09:22
  • @arrowd this does _not_ break out from the fold, it keeps adding 0s instead. it doesn't break out from the `map`'s "loop" either, it keeps looping on the input, just stops calling the mapped function. (just commenting for the sake of future visitors here) – Will Ness Sep 01 '21 at 12:43
  • @WillNess if you're saying that `map f [1,2,3]` generates `f x` thunks for each element in the list, you're probably wrong. But to be honest I'm not 100% sure too. – arrowd Sep 01 '21 at 14:32
  • 1
    IIUC, `map f [1,2,3]` generates a single thunk. When you pattern match, e.g. `(t:ts) = map f [1,2,3]`, you'll get two thunks, one for `t = f 1` and one for `ts = map f [2, 3]`. – chepner Sep 01 '21 at 14:38
  • @arrowd we're not talking in general about `map`s here. I'm referring to the specific `map` call in the answer as driven by the `foldl` call. this `foldl` will demand every element `a` in its input (as the answer indeed is saying). possibly ignoring its value. – Will Ness Sep 01 '21 at 14:43
  • @WillNess With this I agree. I also wrote `fold still traverses the entire list` in my answer. – arrowd Sep 01 '21 at 16:11
  • @arrowd I wasn't wrong in what I wrote in that comment though, when I said that "it keeps looping on the input". I wasn't talking about anything else. – Will Ness Sep 01 '21 at 18:10
3

This will do what you ask about without building the intermediate list as scanl' would (and scanl would even cause a thunks build-up on top of that):

foldl'Breaking break reduced reducer acc list = 
    foldr cons (\acc -> acc) list acc 
          where 
          cons x r acc | break acc x = reduced acc x 
                       | otherwise   = r $! reducer acc x

cf. related wiki page.

Will Ness
  • 70,110
  • 9
  • 98
  • 181
2

You could try making your own sum function, maybe call it boundedSum that takes

  1. an Integer upper bound

  2. an [Integer] to sum over

  3. a "sum up until this point" value to be compared with the upper bound

and returns the sum of the list.

    boundedSum :: Integer -> [Integer] -> Integer -> Integer
    boundedSum upperBound (x : xs) prevSum =
        let currentSum = prevSum + x
            in
        if   currentSum > upperBound
        then upperBound
        else boundedSum upperBound xs currentSum
    boundedSum upperBound [] prevSum =
        prevSum

I think this way you won't "eat up" more of the list if the sum up until the current element exceeds upperBound.

EDIT: The answers to this question suggest better techniques than mine and the question itself looks rather similar to yours.

2

This is a possible solution:

last . takeWhile (<=100) . scanl (+) 0 . map (^2) $ [1..]

Dissected:

  • take your starting list ([1..] in the example)
  • map your expensive function ((^2))
  • compute partial sums scanl (+) 0
  • stop after the partial sums become too large (keep those (<=100))
  • take the last one

If performance matters, also try scanl', which might improve it.

chi
  • 111,837
  • 3
  • 133
  • 218
1

Something like this using until :: (a -> Bool) -> (a -> a) -> a -> a from the Prelude

sumUntil :: Real a => a -> [a] -> a
sumUntil threshold u = result

    where

    (_, result) = until stopCondition next (u, 0)

    next :: Real a => ([a], a) -> ([a], a)
    next ((x:xs), y) = (xs, x + y)

    stopCondition :: Real a => ([a], a) -> Bool
    stopCondition (ls, x) = null ls || x > threshold

Then apply

sumUntil 10 (map someFunction myList)
Elmex80s
  • 3,428
  • 1
  • 15
  • 23
0

This post is already a bit older but I'd like to mention a way to generalize the nice code of @trevor-cook above to break fold with the additional possibility to return not only a default value or the accumulator but also the index and element of the list where the breaking condition was satisfied:

import Control.Monad (foldM)

breakFold step initialValue list exitCondition exitFunction = 
  either id (exitFunction (length  list) (last list)) 
      (foldM f initialValue (zip [0..] list))
    where f acc (index,x)
            | exitCondition index x acc 
                        = Left (exitFunction index x acc)
            | otherwise = Right (step index x acc)

It also only requires to import foldM. Examples for the usage are:

mysum thresh list = breakFold (\i x acc -> x + acc) 0 list 
                         (\i x acc -> x + acc > thresh) 
                         (\i x acc -> acc)
myprod thresh list = breakFold (\i x acc -> x * acc) 1 list 
                          (\i x acc -> acc == thresh) 
                          (\i x acc -> (i,x,acc))

returning

*myFile> mysum 42 [1,1..]
42
*myFile> myprod 0 ([1..5]++[0,0..])
(6,0,0)
*myFile> myprod 0 (map (\n->1/n) [1..])
(178,5.58659217877095e-3,0.0)

In this way, one can use the index and the last evaluated list value as input for further functions.

Will Ness
  • 70,110
  • 9
  • 98
  • 181
exchange
  • 363
  • 4
  • 9
0

Despite the age of this post, I'll add a possible solution. I like continuations because I find them very useful in terms of flow control.

breakableFoldl
  :: (b -> a -> (b -> r) -> (b -> r) -> r)
  -> b
  -> [a]
  -> (b -> r)
  -> r
breakableFoldl f b (x : xs) = \ exit ->
    f b x exit $ \ acc ->
        breakableFoldl f acc xs exit
breakableFoldl _ b _ = ($ b)

breakableFoldr
  :: (a -> b -> (b -> r) -> (b -> r) -> r)
  -> b
  -> [a]
  -> (b -> r)
  -> r
breakableFoldr f b l = \ exit ->
  fix (\ fold acc xs next ->
    case xs of
      x : xs' -> fold acc xs' (\ acc' -> f x acc' exit next)
      _ -> next acc) b l exit

exampleL = breakableFoldl (\ acc x exit next ->
    ( if acc > 15
      then exit
      else next . (x +)
    ) acc
  ) 0 [1..9] print

exampleR = breakableFoldr (\ x acc exit next ->
    ( if acc > 15
      then exit
      else next . (x +)
    ) acc
  ) 0 [1..9] print
Sledge
  • 178
  • 13
  • your left fold folds from the right, and your right fold from the left. consider `breakableFoldl (\acc x exit next -> if x < 5 then exit acc else next ((x:).acc)) id [1..10] ($[100]) ==> [5,6,7,8,9,10,11]` and `breakableFoldr (\x acc exit next -> if x > 5 then exit acc else next (acc.(x:))) id [1..10] ($[100]) ==> [1,2,3,4,5,11]`. – Will Ness Sep 30 '21 at 07:36
  • 1
    That's ok. I'm confident that there are many like me who can't tell left from right. Admittedly most of them are probably under the age of 5, but still... – Sledge Sep 30 '21 at 14:35