20

UPDATE: Okay this question becomes potentially very straightforward.

q <- mapM return [1..]

Why does this never return?


Does mapM not lazily deal with infinite lists?

The code below hangs. However, if I replace line A with line B, it doesn't hang anymore. Alternatively, if I precede line A with a "splitRandom $", it also doesn't hang.

Q1 is: Is mapM not lazy? Otherwise, why does replacing line A with line B "fix" this code?

Q2 is: Why does preceding line A with splitRandom "solve" the problem?

import Control.Monad.Random
import Control.Applicative

f :: (RandomGen g) => Rand g (Double, [Double])
f = do
    b <- splitRandom $ sequence $ repeat $ getRandom
    c <- mapM return b -- A
    -- let c = map id b -- B
    a <- getRandom
    return (a, c)

splitRandom :: (RandomGen g) => Rand g a -> Rand g a
splitRandom code = evalRand code <$> getSplit

t0 = do
    (a, b) <- evalRand f <$> newStdGen
    print  a
    print (take 3 b)

The code generates an infinite list of random numbers lazily. Then it generates a single random number. By using splitRandom, I can evaluate this latter random number first before the infinite list. This can be demonstrated if I return b instead of c in the function.

However, if I apply mapM to the list, the program now hangs. To prevent this hanging, I have to apply splitRandom again before the mapM. I was under the impression that mapM can lazily deal with infinite lists.
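
For concreteness, here is the variant just described, reusing the definitions from the code above (a sketch): returning b instead of c means only splitRandom and getRandom are sequenced into the function's chain of binds, and the program runs fine.

-- Variant of f that returns the un-traversed list b (does not hang):
-- the infinite list lives inside its own split-off computation, and
-- nothing in this bind chain depends on traversing it.
fB :: (RandomGen g) => Rand g (Double, [Double])
fB = do
    b <- splitRandom $ sequence $ repeat $ getRandom
    a <- getRandom
    return (a, b)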

Don Stewart
qrest

5 Answers

32

Well, there's lazy, and then there's lazy. mapM is indeed lazy in that it doesn't do more work than it has to. However, look at the type signature:

mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]

Think about what this means: You give it a function a -> m b and a bunch of as. A regular map can turn those into a bunch of m bs, but not an m [b]. The only way to combine the bs into a single [b] without the monad getting in the way is to use >>= to sequence the m bs together to construct the list.

In fact, mapM f is precisely equivalent to sequence . map f.

In general, for any monadic expression, if the value is used at all, the entire chain of >>=s leading to the expression must be forced, so applying sequence to an infinite list can't ever finish.
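
To make that chain of binds concrete, here is roughly how mapM and sequence are defined (a sketch close to the Haskell Report versions, with primed names to avoid clashing with the Prelude):

mapM' :: Monad m => (a -> m b) -> [a] -> m [b]
mapM' f = sequence' . map f

sequence' :: Monad m => [m a] -> m [a]
sequence' = foldr step (return [])
  where
    -- each element is chained onto the rest of the list with >>=
    step mx rest = mx >>= \x -> rest >>= \xs -> return (x : xs)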

If you want to work with an unbounded monadic sequence, you'll either need explicit flow control--e.g., a loop termination condition baked into the chain of binds somehow, which simple recursive functions like mapM and sequence don't provide--or a step-by-step sequence, something like this:

data Stream m a = Nil | Stream a (m (Stream m a))

...so that you only force as many monad layers as necessary.
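
As a sketch of what that buys you (the helper names below are made up for illustration, reusing the Rand monad from the question): each cell of the stream hides the rest of the computation behind one more monadic layer, so you only run as many steps as you ask for.

-- An unbounded stream of random numbers; hypothetical example code.
randomStream :: (RandomGen g) => Rand g (Stream (Rand g) Double)
randomStream = do
    x <- getRandom
    return (Stream x randomStream)

-- Run only the first n layers and collect their values.
takeStream :: Monad m => Int -> Stream m a -> m [a]
takeStream n _ | n <= 0   = return []
takeStream _ Nil          = return []
takeStream n (Stream x k) = do
    rest <- k
    xs   <- takeStream (n - 1) rest
    return (x : xs)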

Edit: Regarding splitRandom, what's going on there is that you're passing it a Rand computation, evaluating that with the seed splitRandom gets, then returning the result. Without the splitRandom, the seed used by the single getRandom has to come from the final result of sequencing the infinite list, hence it hangs. With the extra splitRandom, the seed used only needs to thread through the two splitRandom calls, so it works. The final list of random numbers works because you've left the Rand monad at that point and nothing depends on its final state.
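
It may help to desugar f's do-block by hand (a sketch of what the do-notation expands to, nothing library-specific): every action to the right of a <- is chained in with >>=, so the generator state has to thread through each one in turn.

-- Desugared form of f from the question:
fDesugared :: (RandomGen g) => Rand g (Double, [Double])
fDesugared =
    splitRandom (sequence (repeat getRandom)) >>= \b ->
    mapM return b                             >>= \c ->  -- line A: must traverse all of b
    getRandom                                 >>= \a ->
    return (a, c)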

C. A. McCann
  • But this doesn't explain why preceding the line with splitRandom solves the problem. Also, note that splitRandom actually splits the random seed into two and uses one of them to generate the infinite list, so there shouldn't be a dependency there. – qrest Jul 17 '10 at 04:53
  • @qrest: Added an extra note--does that help? – C. A. McCann Jul 17 '10 at 06:25
  • @camccann: No, it doesn't. Here's my reasoning. I need to consume one seed to generate the infinite list. Hence, I split the seed and used it in "splitRandom $ sequence $ repeat $ getRandom". NOTE: This is a monadic expression and it did not evaluate fully! Note also I purposefully got a random number for "a" AFTER the infinite list, proving that the seeds are not dependent. And the Rand monad does not use the seed unless a random number is generated so no seed dependency should exist. So is "return" the one actually forcing the evaluation? – qrest Jul 17 '10 at 16:09
  • @qrest: `Rand` is just a specialized `State` monad, so even if no random number is generated it has to thread the RNG seed through every function chained together with `(>>=)`. Whether or not you actually *use* the state doesn't matter. The *value* of the seed can be evaluated lazily, but the monadic expressions themselves are strictly sequenced, which is why `mapM` has to traverse the entire list. `splitRandom` changes the behavior because it runs a separate monadic computation instead of sequencing it in directly. – C. A. McCann Jul 17 '10 at 18:16
  • I think you understand the situation, but I don't yet. Why is it if I remove `c` altogether and just return `(a, b)`, it is able to function properly? – qrest Jul 17 '10 at 19:42
  • @qrest: Anything on the right of a `<-` in a `do` block gets sequenced in as part of the monadic expression. So `b` works fine because the only thing sequenced in is `splitRandom`, while `c` doesn't because it tries to sequence in a traversal of an infinite list. I'm not sure how else to explain it--the `do` notation is nice, but it's *really* obscuring the control flow in this case. If you're still puzzled, I'll write up an explicit demonstration later this evening showing the underlying logic behind the monad and the difference between `b` and `c`. – C. A. McCann Jul 17 '10 at 20:14
  • 5
    "In general, for any monadic expression, if the value is used at all, the entire chain of >>=s leading to the expression must be forced, so applying sequence to an infinite list can't ever finish." I don't think this is true in general for all monads. I just showed an example in an answer below with the identity monad, where all the values put into the monad are still there, and can be used lazily. – newacct Jan 06 '13 at 01:21
  • 1
    @newacct I agree, that part of the answer is wrong. Some monads are lazy enough that applying `sequence` to an infinite list does finish. – John L Jan 06 '13 at 02:44
  • @JohnL: Yeah, you're correct. I've been meaning to go back and edit some of my older answers because of mistakes like that. :[ – C. A. McCann Jan 06 '13 at 02:54
8

Okay this question becomes potentially very straightforward.

q <- mapM return [1..]

Why does this never return?

It's not necessarily true. It depends on the monad you're in.

For example, with the identity monad, you can use the result lazily and it terminates fine:

newtype Identity a = Identity a

-- Functor and Applicative instances, required since the
-- Applicative-Monad Proposal (GHC 7.10):
instance Functor Identity where
  fmap f (Identity x) = Identity (f x)

instance Applicative Identity where
  pure = Identity
  Identity f <*> Identity x = Identity (f x)

instance Monad Identity where
  Identity x >>= k = k x
  return = Identity

-- "foo" is the infinite list of all the positive integers
foo :: [Integer]
Identity foo = do
  q <- mapM return [1..]
  return q

main :: IO ()
main = print $ take 20 foo -- [1 .. 20]
newacct
7

Here's an attempt at a proof that mapM return [1..] doesn't terminate. Let's assume for the moment that we're in the Identity monad (the argument will apply to any other monad just as well):

mapM return [1..] -- initial expression
sequence (map return [1 ..]) -- unfold mapM
let k m m' = m >>= \x ->
             m' >>= \xs ->
             return (x : xs)
in foldr k (return []) (map return [1..]) -- unfold sequence

So far so good...

-- unfold foldr
let k m m' = m >>= \x ->
             m' >>= \xs ->
             return (x : xs)
    go [] = return []
    go (y:ys) = k y (go ys)
in go (map return [1..])

-- unfold map so we have enough of a list to pattern-match go:
go (return 1 : map return [2..])
-- unfold go:
k (return 1) (go (map return [2..]))
-- unfold k:
(return 1) >>= \x -> go (map return [2..]) >>= \xs -> return (x:xs)

Recall that return a = Identity a in the Identity monad, and (Identity a) >>= f = f a in the Identity monad. Continuing:

-- unfold >>= :
(\x -> go (map return [2..]) >>= \xs -> return (x:xs)) 1
-- apply 1 to \x -> ... :
go (map return [2..]) >>= \xs -> return (1:xs)
-- unfold >>= :
(\xs -> return (1:xs)) (go (map return [2..]))

Note that at this point we'd love to apply to \xs, but we can't yet! We have to instead continue unfolding until we have a value to apply:

-- unfold map for go:
(\xs -> return (1:xs)) (go (return 2 : map return [3..]))
-- unfold go:
(\xs -> return (1:xs)) (k (return 2) (go (map return [3..])))
-- unfold k:
(\xs -> return (1:xs)) ((return 2) >>= \x2 ->
                         (go (map return [3..])) >>= \xs2 ->
                         return (x2:xs2))
-- unfold >>= :
(\xs -> return (1:xs)) ((\x2 -> (go (map return [3..])) >>= \xs2 ->
                        return (x2:xs2)) 2)

At this point, we still can't apply to \xs, but we can apply to \x2. Continuing:

-- apply 2 to \x2 :
(\xs -> return (1:xs)) ((go (map return [3..])) >>= \xs2 ->
                         return (2:xs2))
-- unfold >>= :
(\xs -> return (1:xs)) ((\xs2 -> return (2:xs2)) (go (map return [3..])))

Now we've gotten to a point where neither \xs nor \xs2 can be reduced yet! Our only choice is:

-- unfold map for go, and so on...
(\xs -> return (1:xs))
  ((\xs2 -> return (2:xs2))
    (go ((return 3) : (map return [4..]))))

So you can see that, because of foldr, we're building up a series of functions to apply, starting from the end of the list and working our way back up. Because at each step the input list is infinite, this unfolding will never terminate and we will never get an answer.

This makes sense if you look at this example (borrowed from another StackOverflow thread, I can't find which one at the moment). In the following list of monadic values:

mebs = [Just 3, Just 4, Nothing]

we would expect sequence to catch the Nothing and return a failure for the whole thing:

sequence mebs = Nothing

However, for this list:

mebs2 = [Just 3, Just 4]

we would expect sequence to give us:

sequence mebs2 = Just [3, 4]

In other words, sequence has to see the whole list of monadic computations, string them together, and run them all in order to come up with the right answer. There's no way sequence can give an answer without seeing the whole list.
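
A quick, runnable check of those two claims (a small sketch using the same lists):

mebs, mebs2 :: [Maybe Int]
mebs  = [Just 3, Just 4, Nothing]
mebs2 = [Just 3, Just 4]

main :: IO ()
main = do
    print (sequence mebs)   -- Nothing
    print (sequence mebs2)  -- Just [3,4]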

Note: The previous version of this answer asserted that foldr computes starting from the back of the list, and wouldn't work at all on infinite lists, but that's incorrect! If the operator you pass to foldr is lazy on its second argument and produces output with a lazy data constructor like a list, foldr will happily work with an infinite list. See foldr (\x xs -> (replicate x x) ++ xs) [] [1..] for an example. But that's not the case with our operator k.

Owen S.
  • 1
    Saying that `foldr` works "from the back of the list" is a bit confusing. Both `foldl` and `foldr` process the list left-to-right. The difference is the *bracketing*, not the processing direction. – nh2 Jan 03 '13 at 02:02
  • 1
    I would contend that your description is the one that's more confusing. The Prelude documentation itself says that `foldr` "reduces the list using the binary operator, from right to left" -- the bracketing is the whole point! The initial left-right pass through the list (required anyway to traverse Haskell's list structure) establishes the stack of recursion, but does no computation on the elements of the list. – Owen S. Jan 05 '13 at 01:45
  • I disagree: There is no actual "initial left right pass" until the end. If there was, how could `foldr` work on infinite lists? `foldr (:) [] [1..]` runs perfectly fine and in constant memory - this could not be if the list were processed from right to left, because there is no right end. – nh2 Jan 05 '13 at 11:46
  • 1
    Aha, so I wasn't confusing, I was just flat-out wrong! Thanks for the correction. See if the updated answer looks correct to you. – Owen S. Jan 05 '13 at 21:11
  • The initial claim in this answer is still wrong, as @newacct's answer shows. – Ørjan Johansen Jul 11 '15 at 03:25
4

This question shows very well the difference between the IO monad and other monads. Behind the scenes, mapM builds an expression with a bind operation (>>=) between all the list elements, in order to turn the list of monadic expressions into a monadic expression of a list. What is special about the IO monad is that Haskell's execution model runs the actions during the bind; that is what finally forces (in a purely lazy world) anything to be executed at all.

So the IO monad is special in a way: it uses the sequencing of bind to actually enforce execution of each step, and that is what makes our program execute anything at all in the end. Other monads are different: they give the bind operator other meanings, depending on the monad. IO is the monad that executes things in the bind, and this is why IO values are "actions".

The following example shows that other monads, the Maybe monad for example, do not enforce execution in this way. The upshot is that mapM in the IO monad returns an expression which, when executed, runs the action for every single element before the final value is available.

There are nice papers about this; start here, or search for denotational semantics and monads: "Tackling the awkward squad": http://research.microsoft.com/en-us/um/people/simonpj/papers/marktoberdorf/mark.pdf

Example with Maybe Monad:

module Main where

-- mapM in Maybe short-circuits at the first Nothing, so even on the
-- infinite list [1..] this terminates (the element 3 produces Nothing).
fstMaybe :: [Int] -> Maybe [Int]
fstMaybe = mapM (\x -> if x == 3 then Nothing else Just x)

main :: IO ()
main = print (fstMaybe [1..])  -- prints Nothing

1

Let's talk about this in a more generic context.

As the other answers said, mapM is just a combination of sequence and map. So the problem is really why sequence is strict in certain monads. However, this is not restricted to monads: it applies to Applicatives as well, since we have sequenceA, which shares the same implementation as sequence in most cases.

Now look at the (specialized for lists) type signature of sequenceA :

sequenceA :: Applicative f => [f a] -> f [a]

How would you do this? You were given a list, so you would like to use foldr on this list.

sequenceA = foldr f b where ...
  --f :: f a -> f [a] -> f [a]
  --b :: f [a]

Since f is an Applicative, you know what b could be - pure []. But what is f? Obviously it is a lifted version of (:):

(:) :: a -> [a] -> [a]

So now we know how sequenceA works:

sequenceA = foldr f b where
  f a b = (:) <$> a <*> b
  b = pure []

or

sequenceA = foldr ((<*>) . fmap (:)) (pure [])

Assume you were given a lazy list (x:_|_). The above definition of sequenceA gives

sequenceA (x:_|_) === (:) <$> x <*> foldr ((<*>) . fmap (:)) (pure []) _|_
                  === (:) <$> x <*> _|_

So now we see the problem reduces to whether f <*> _|_ is _|_ or not. Obviously, if <*> is strict in its second argument this is _|_; to allow evaluation to stop, we need <*> itself to be non-strict in its second argument.

So the criterion for an applicative functor to have a sequenceA that can stop on an infinite list is that its <*> operator be non-strict in its second argument. A simple test would be

const a <$> _|_ === _|_      ====> strict sequenceA
-- remember f <$> a === pure f <*> a

If we are talking about Monads, the criterion is

_|_ >> return a === _|_      ===> strict sequence
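
For instance (a sketch, not part of the original answer): the function applicative ((->) r) has a <*> that just builds another function without forcing its second argument, so its sequenceA can cope with an infinite list, while Maybe's <*> pattern-matches its second argument and therefore cannot.

import Data.Traversable (sequenceA)

-- ((->) r): f <*> g = \r -> f r (g r), which never forces g itself,
-- so the result list can be consumed lazily.
lazyOK :: [Integer]
lazyOK = take 5 (sequenceA (map const [1..]) ())   -- [1,2,3,4,5]

-- Maybe: (<*>) must inspect its second argument (Just or Nothing),
-- so sequenceA (repeat (Just 1)) never finishes.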
Earth Engine