The do-notation in Haskell, the >> and the >>= operator and their impact on the final Monad

Question

I am trying to understand Monads and have read and played around with concept from LYAH especially the 'walk the line section': http://learnyouahaskell.com/a-fistful-of-monads#walk-the-line in order to understand how the >>-operator and discarding work.

There does still seem to be some things with regards to the >>= vs. >> that I haven't grasped. At least in the code/MVE below I get different results depending on which of the runRegModuleOne to runRegModuleThree that I use. And I am not entirely sure as to why they behave as they do in all cases, as I point out below.

import qualified Data.Set as S

import Control.Monad

type CharSet = S.Set Char

data RE =
    RClass Bool CharSet
  | RSeq [RE]
  | RAlt RE RE

newtype RegModule d a =
  RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]}

instance Monad (RegModule d) where
  return a = RegModule (\_s _i d -> return (a, 0, d))
  m >>= f =
    RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
                            (b, j', d'') <- runRegModule (f a) s (i + j) d'
                            return (b, j + j', d''))

instance Functor (RegModule d) where fmap = liftM
instance Applicative (RegModule d) where pure = return; (<*>) = ap

scanChar :: RegModule d Char
scanChar = RegModule (\s i d ->
  case drop i s of
    (c:cs) -> return (c, 1, d)
    [] -> []
  )

--For getData we simply rreturn the current state without modifying it
getData :: RegModule d d
getData = RegModule (\_s _i d -> [(d, 0, d)])

--The function just returns and empty state and puts that d in data.
putData :: d -> RegModule d ()
putData d = RegModule (\_s _i _d -> [((), 0, d)]) 

--The function returns an eempty list as failure a bit like pfail in readP.
regfail :: RegModule d a
regfail = RegModule (\_s _i d -> []
                )

choice :: [RegModule d a] -> RegModule d a
choice [] = regfail
choice (m:ms) = RegModule (\s i d -> 
                      case (runRegModule m s i d) of
                        []   -> (runRegModule (choice ms) s i d)
                        succ -> succ
                      )


regEX :: RE -> RegModule [String] ()
regEX (RClass b cs) = do
  next <- scanChar   
  if (S.member next cs) && not b
    then do         
        d <- getData
        putData (d ++ [[next]])        
    else regfail

regEX (RAlt r1 r2) =
  do    
    let m1 = regEX r1
        m2 = regEX r2
    choice [m1, m2]

regEX (RSeq []) = regfail
regEX (RSeq (r:rs)) = do
  case length (rs) of
    0 -> regEX r
    _tail ->  do
              regEX r
              regEX (RSeq rs)

runRegModuleOne :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleOne matcher input startPos state =
  let (result1, pos1, newState1) = head $ runRegModule matcher input startPos state
      (result2, pos2, newState2) = head $ runRegModule matcher input pos1 (newState1)
      (result3, pos3, newState3) = head $ runRegModule matcher input pos2 (newState2)
      (result4, pos4, newState4) = head $ runRegModule matcher input pos3 (newState3)
  in [(result1, pos1, newState1), (result2, pos2, newState2), (result3, pos3, newState3), (result4, pos4, newState4)]

runRegModuleTwo :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleTwo matcher input startPos state = do
  (result1, pos1, newState1) <- runRegModule matcher input startPos state
  (result2, pos2, newState2) <- runRegModule matcher input pos1 newState1
  (result3, pos3, newState3) <- runRegModule matcher input pos2 newState2
  (result4, pos4, newState4) <- runRegModule matcher input pos3 newState3
  return (result4, pos4, newState4)


runRegModuleThree :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThree matcher = runRegModule (matcher >> matcher >> matcher >> matcher)

--helper functions for ease of using ghci

runRegModuleThree_helper = runRegModuleThree (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b')),(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []

OUTPUT for using RSeq with the three:

ghci> runRegModuleOne (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []
[((),2,["a","b"]),((),2,["a","b","a","b"]),((),2,["a","b","a","b","a","b"]),((),2,["a","b","a","b","a","b","a","b"])]

ghci> runRegModuleTwo (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []
[((),2,["a","b","a","b","a","b","a","b"])]

ghci> runRegModuleThree (regEX (RSeq [(RClass False (S.singleton 'a')), (RClass False (S.singleton 'b'))])) "abab" 0 []  
[]

ghci> runRegModuleThree_helper 
[]

Note that "One" and "Two" seems to be successful after a fashion since they are expected to store the letters they match when they find a sequence of letters, it might have been even more correct had they had only 4 letters saved rather than 8, alghough I am unsure of that part, but suspect it is because 4 runs are being made of the RSeq. Further more "One" saves the history of all the runs.

Why does runRegModuleThree seem to fail though, even when I leave out the regEX (RSeq []) = regfail ?

In comparison runRegModuleThree seems to be the only correct version when using RAlt:

ghci> runRegModuleOne (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abab" 0 []
[((),1,["a"]),((),1,["a","b"]),((),1,["a","b","b"]),((),1,["a","b","b","b"])]

ghci> runRegModuleTwo (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abab" 0 []    
[((),1,["a","b","b","b"])]

ghci> runRegModuleThree (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abab" 0 []  
[((),4,["a","b","a","b"])]

--Other parcularities for runRegModuleOne and Two

ghci> runRegModuleOne (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abaa" 0 []   
[((),1,["a"]),((),1,["a","b"]),((),1,["a","b","b"]),((),1,["a","b","b","b"])]

ghci> runRegModuleTwo (regEX (RAlt (RClass False (S.singleton 'a')) (RClass False (S.singleton 'b')))) "abaa" 0 []     
[((),1,["a","b","b","b"])]

Here it seems that "One" and "Two" stores a 'wrong' letter after t he first, whereas this isn't the case for "Three". For "One and "Two" this is also the case for the input "abaa".

But why doesn't this happen for "Three"? As I understand it the 'discard' doesn't discard the values within the Monad but just corresponds to m >> n = m >>= \_ -> n such that no function is applied to the value within n. In this sense the data stored in the monad due to putData and putData (d ++ [[next]]) are not discarded, since it is a side-effect.

How can one make runRegModuleThree work correctly for both cases of the regEx patternmatch depicted here?

don't want to sound rude but It seems you want to learn monads a lot but you are using a very rare path to learn them. First your `Functor` and `Applicative` instances are implemented using the `Monad` one. For the sake of learning It would be much much better to implement those by yourself. Also, It would help to use a much simpler example; from your code It looks like you are willing to use some sort of `State + Writer + []` monad, leading to a very complex and verbose implementation, you also share a lot of code which apparently has nothing to do with monads `runRegModuleTwo`, etc... — lsmor, Jul 17 '23 at 10:00
@Ismor , thank you for pointing this out. Your comment is much appreciated and not considered rude at all, and someone pointed something similar out in another post. As you state, I do want to learn monads a lot. In order to learn them I have simply followed the two chapters in LYAH about it and then tried to solve the exercises in 'You could have invented Monads'. These are about the only exercises I know besidesthe implementation from the above code that I have come across, and I have scouted the Haskell wiki for more exercises but have so far not found any that only considers monads. — Piskator, Jul 22 '23 at 04:31
@Ismor, So if you have any ideas about where I can find any exercises related specifically to monads I am much interested. — Piskator, Jul 22 '23 at 04:52
I have written a [monad excersise](https://github.com/lsmor/snake-fury) myself. It is about implementing the nokia-snake game in haskell. You might find it usefull. (Little bit of self promotion :P). — lsmor, Jul 22 '23 at 07:26

The do-notation in Haskell, the >> and the >>= operator and their impact on the final Monad

0 Answers0

Linked