
I don't understand why the following code behaves the way it does:

myand :: Bool -> Bool -> Bool
myand True True = True
myand _ _ = False

containsAandB :: String -> IO Bool
containsAandB s = do
  containsA <- do
    putStrLn $ "Check if 'a' is in " ++ s
    return $ 'a' `elem` s
  containsB <- do
    putStrLn $ "Check if 'b' is in " ++ s
    return $ 'b' `elem` s
  return $ containsA `myand` containsB

This is the output when I test the function:

*Main> containsAandB "def"
Check if 'a' is in def
Check if 'b' is in def
False

Note that (&&) behaves just like 'myand'; I just wrote a custom function to better visualize what's happening. I'm surprised by the "Check if 'b' is in def" line, since containsA is already False, so 'myand' can be evaluated regardless of containsB.

Question 1:
Is there a particular reason why containsB has to be evaluated too? My understanding was that the containsB <- do ... statement is not evaluated unless it's required but my guess is that this might behave differently since it's IO and therefore not free of side effects?

Question 2:
What's the best practice approach to get the desired behavior (if containsA is false, containsB is not checked) without dealing with nested if-else clauses?

ryan91
  • Well you wrap it in an `IO` monad, so due to the monad. – Willem Van Onsem Jun 10 '18 at 11:20
  • Okay, so my assumption was correct that it behaves this way because it's not side-effect free? I'm so sorry but this comment is a bit too vague for me to understand. – ryan91 Jun 10 '18 at 11:30
  • 3
    Well imagine that your two elements `containsA` and `containsB` read from a file. Then they advance the cursor. Now that means that, depending on the value of `containsA`, the cursor will (or will not) advance? That is behavior one typically wants to avoid. So the `IO` actions are guaranteed to take place if they are in the monad. – Willem Van Onsem Jun 10 '18 at 11:31

2 Answers

7

Question 1: Is there a particular reason why containsB has to be evaluated too? My understanding was that the containsB <- do ... statement is not evaluated unless it's required but my guess is that this might behave differently since it's IO and therefore not free of side effects?

Your experiment is misleading because it performs IO. One of the important aspects of IO is that the order of IO statements is respected. So even if, due to laziness, we do not need a certain value, the IO part is still executed.

This is logical in the world of IO: imagine that we read a file that has three parts. We read the first two parts, and then we read the third one. But now imagine that, due to laziness, the second IO command is never executed. Then the third read would actually return the second part of the file.

So in short, because of IO, the statements are executed. But only the IO statements. The value wrapped inside the return is not evaluated unless you need it. The check 'b' `elem` s only happens when we need it.
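A minimal sketch of that distinction, assuming a hypothetical helper checkB and driver demo (neither is from the question): the putStrLn inside checkB always runs, while the value it returns is an error thunk that only fires if something actually inspects it.

```haskell
myand :: Bool -> Bool -> Bool
myand True True = True
myand _ _ = False

-- Hypothetical helper for illustration: the action always prints,
-- but the value it returns is a thunk that raises an error if forced.
checkB :: String -> IO Bool
checkB s = do
  putStrLn $ "Check if 'b' is in " ++ s
  return (error "the 'b' check was forced")

-- Hypothetical driver: containsA is passed in directly to keep it short.
demo :: Bool -> IO Bool
demo containsA = do
  containsB <- checkB "def"   -- the message is printed here, every time
  return $ containsA `myand` containsB
```

In GHCi, demo False prints the message and returns False, because myand never looks at containsB. demo True prints the message and then raises the error, because myand has to inspect containsB to decide.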

There are, however, ways to "trick" your way out of this. For example, trace (from the Debug.Trace module) performs "unsafe IO": it prints the given message when the value it wraps is evaluated. If we write:

Prelude> import Debug.Trace
Prelude Debug.Trace> myand (trace "a" False) (trace "b" False)

we get:

Prelude Debug.Trace> myand (trace "a" False) (trace "b" False)
a
False

Question 2: What's the best practice approach to get the desired behavior (if containsA is false, containsB is not checked) without dealing with nested if-else clauses?

Well, as said before, the normal behavior is that containsB is not evaluated. But if you perform IO actions, those have to be performed before you actually do the checking. This is basically one of the aspects that the (>>=) operator for IO (which you use implicitly in a do block) handles.

Willem Van Onsem
  • Gotcha. Thank you! – ryan91 Jun 10 '18 at 11:35
  • The terminology is all wrong. There is no "an `IO` monad". The `IO` type itself is a monad (`IO` is also a functor). When you have a `do` block, that's not a monad; that's just a value of type `IO something`. – melpomene Jun 10 '18 at 11:39
  • 1
    @melpomene: well not really, a `do` block can lead to any `Monad m => m something`. It is only the type checking that converts it to a specific `m`, if at all. For example `liftM2 h f g = do {x <- f; y <- g; return (h x y)}` has type `Monad m => (a -> b -> c) -> m a -> m b -> m c`. – Willem Van Onsem Jun 10 '18 at 11:44
  • Yes, I was talking about `do` blocks doing I/O specifically. But even in general, the `do` block itself is not "a monad"; the "monad" part is specifically the `m` in the type signature. – melpomene Jun 10 '18 at 11:47
  • @melpomene: yes, I know, but it makes it a little bit "chaotic" to talk about monads. So typically in discussions with friends, colleagues, etc. I use "shortcuts". Although those are understood by colleagues, I guess it can be confusing :). – Willem Van Onsem Jun 10 '18 at 11:49
  • I think these shortcuts might actually make it harder to explain why the versions in my answer _do_ work :) – Alexey Romanov Jun 10 '18 at 14:52
  • @AlexeyRomanov: well they should have defenestrated the person who introduced things like `State` and `IO`, since actually those things are not `State` or `IO`, they describe *changes* in states and IO. I think I would probably explain it as "*defining two `IO`s, and one is guarded by* encoding *the function into an outer `IO`*". So basically guarding the IO. – Willem Van Onsem Jun 10 '18 at 14:57
3

do blocks get translated into calls to >>= and >>. In particular, your code becomes (unless I missed some parentheses)

containsAandB s =
  (putStrLn ("Check if 'a' is in " ++ s) >>
   return ('a' `elem` s)) >>= (\containsA ->
  (putStrLn ("Check if 'b' is in " ++ s) >>
   return ('b' `elem` s)) >>= (\containsB ->
  return (containsA `myand` containsB)))

So containsB <- do ... isn't really a statement; it makes the do ... part the first argument to a >>= call. And >>= (and >>) for IO is defined so it always runs its first argument. So to get to the last return $ ..., both putStrLn calls already must have run.
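As a rough illustration of "always runs its first argument", here is a sketch that counts how many checks are executed. The names countedCheck and bothRun are hypothetical, and the IORef counter merely stands in for the putStrLn calls so the effect can be observed as a number.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)

-- Hypothetical stand-in for a "check": bumps a counter when it is
-- executed, then returns the given result.
countedCheck :: IORef Int -> Bool -> IO Bool
countedCheck counter result = do
  modifyIORef counter (+ 1)
  return result

-- Both checks are chained with >>=, so both are executed, even though
-- the first result (False) already determines the final answer.
bothRun :: IO (Bool, Int)
bothRun = do
  counter <- newIORef 0
  r <- countedCheck counter False >>= \a ->
       countedCheck counter True  >>= \b ->
       return (a && b)
  n <- readIORef counter
  return (r, n)
```

Evaluating bothRun in GHCi gives (False,2): the result is False, yet the counter shows that both actions were executed.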

This behavior isn't limited to the IO monad; e.g. see Difference between Haskell's Lazy and Strict monads (or transformers).

What's the best practice approach to get the desired behavior (if containsA is false, containsB is not checked) without dealing with nested if-else clauses?

You can deal with them once and for all:

andM :: (Monad m) => m Bool -> m Bool -> m Bool
andM m1 m2 = do
  x <- m1
  case x of
    True -> m2
    False -> return False

containsAandB s = andM
  (do
    putStrLn $ "Check if 'a' is in " ++ s
    return $ 'a' `elem` s)
  (do
    putStrLn $ "Check if 'b' is in " ++ s
    return $ 'b' `elem` s)

or

containsAandB :: String -> IO Bool
containsAandB s = do
  let containsA = do
        putStrLn $ "Check if 'a' is in " ++ s
        return $ 'a' `elem` s
  let containsB = do
        putStrLn $ "Check if 'b' is in " ++ s
        return $ 'b' `elem` s
  containsA `andM` containsB

(andM is in http://hackage.haskell.org/package/extra-1.6.8/docs/Control-Monad-Extra.html as (&&^), along with other similar functions).
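The reason both versions work is that an IO action passed as an argument, or bound with let, is only a description of what to do; nothing happens until it is sequenced. A small sketch under that reading, using hypothetical names countedCheck and shortCircuit and an IORef counter in place of the putStrLn calls:

```haskell
import Data.IORef (IORef, newIORef, modifyIORef, readIORef)

-- Same shape as the andM above: run m2 only if m1 returned True.
andM :: Monad m => m Bool -> m Bool -> m Bool
andM m1 m2 = do
  x <- m1
  if x then m2 else return False

-- Hypothetical stand-in for a "check": bumps a counter when it is
-- executed, then returns the given result.
countedCheck :: IORef Int -> Bool -> IO Bool
countedCheck counter result = do
  modifyIORef counter (+ 1)
  return result

shortCircuit :: IO (Bool, Int)
shortCircuit = do
  counter <- newIORef 0
  -- let only names the actions; neither has been executed yet.
  let checkA = countedCheck counter False
      checkB = countedCheck counter True
  r <- checkA `andM` checkB   -- checkB is never sequenced here
  n <- readIORef counter
  return (r, n)
```

Evaluating shortCircuit in GHCi gives (False,1): only the first check was executed. Compare this with binding both results via <-, which executes both actions before myand ever sees the values.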

Alexey Romanov