0

Hi for the following code I have tried to run the RegModule monad such that one state will effect the next, as seen in the output at the end of the code (MVE) below the state of the monad doesn't seem to be updated as the increment only works for the first call to the monad rather than increment on each of the runRegModule calls. How can I ensure that its state get updated. And is there a way to do it that would be less manual such that the Monad would be called until it fails using forM or the like.

import qualified Data.Set as S

import Control.Monad

type CharSet = S.Set Char

data RE =
    RClass Bool CharSet

newtype RegModule d a =
  RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]}

instance Monad (RegModule d) where
  return a = RegModule (\_s _i d -> return (a, 0, d))
  m >>= f =
    RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
                            (b, j', d'') <- runRegModule (f a) s (i + j) d'
                            return (b, j + j', d''))

instance Functor (RegModule d) where fmap = liftM
instance Applicative (RegModule d) where pure = return; (<*>) = ap

scanChar :: RegModule d Char
scanChar = RegModule (\s i d ->
  case drop i s of
    (c:cs) -> return (c, 1, d)
    [] -> []
  )

regfail :: RegModule d a
regfail = RegModule (\_s _i d -> []
                )
regEX :: RE -> RegModule [String] ()
regEX (RClass b cs) = do
  next <- scanChar  
  if (S.member next cs)
    then return ()
    else regfail
 
runRegModuleThrice :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice matcher input startPos state =
  let (result1, pos1, newState1) = head $ runRegModule matcher input startPos state
      (result2, pos2, newState2) = head $ runRegModule matcher input pos1 newState1
      (result3, pos3, newState3) = head $ runRegModule matcher input pos2 newState2
  in [(result1, pos1, newState1), (result2, pos2, newState2), (result3, pos3, newState3)]

OUTPUT:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

Expected result:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),2,[]),((),3,[])]

EDIT in order to answer comments

There are several things about the suggestiosn in the comments that confuses me. Therefore I find it necessary to edit the post rather than just using the comment section to respond.

Furthermore I shall here try to directionalize the problem-solving in order to respond to especially the comments made with regards to the similarity of the current problem to the post How to use `runAccumT` , this also to open up for the possibility of the correct answer/solution being some answer/solution that might not be how I initially thought it to be, and hence why I shall also be open to that this answer/solution might not produce the output that I orginially expected.

As I understand the comments (and this might be a falacious understanding since I am still quite new at Haskell), they highlight and claim that my line newtype RegModule d a = RegModule {runRegModule :: String -> Int -> d -> [(a, Int, d)]} is the cause of the error, since it doesn't take into account an accumulated value, but just a single Int, when running the Monad RegModule. Therefore if the code above is to succeed I am to change it into something like newtype RegModule d a = RegModule {runRegModule :: String -> Int -> d -> [(a, Sum Int, d)]} instead since the input value isn't the Sum Int but rather just an Int which cannot accumulate values because it isn't a Monoidically value.

Some concerns about the comments at stake:

The problem I posted above including the Monad and the runner monad are code that I have from an assignment text. It should be correct. However it is supposed to be code that should make a basis for several code parts. That is, a regex parser and a Matcher (Interpreter). This code is not supposed to be changed. I am however not that skilled at Haskell so I thought it would be relevant to have some 'printable' or visible code that would show some output while I was building the remaining code. That is why I suggested the runRegModuleThrice to manually get a feeling for how values accumulated (e. g. while the parser ran over an input string).

It can be the case that when trying to make it possible to print the output, that I am somehow forcing the coding in a way that is deviating from what was inteded with the original assignment code for how a parser and matcher is supposed to be combined.

I am however not sure what to expect for a parser and whether or not the regEX module provided along with the other parts are supposed to give some list akin to the above mentioned expected outputs or perhaps something more similar to [((),1,['a']),((),2,['a']),((),3,['a'])], or not. Although I suspect they are.

Some follow up questions:

  1. If it is the case that I interpreted it correct with regards to the above about the lack of a Monoidically value, isn't much of the idea behind Monads lost including the ones that were depicted in the LYAH: http://learnyouahaskell.com/for-a-few-monads-more about the Writer Monad.

  2. Despite creating changes to the runner monad are there ways to show an accumulated value or 'appended text' as the output?

  3. Isn't it likely that there might be some way to archieve the outcome [((),1,['a']),((),2,['a']),((),3,['a'])] (or an even more 'correct' variant), by solely changing the regEx, scanChar, ``regfailand/orrunRegModuleThrice? Th is could perhaps be archieved by usingdo-notation for the runRegModuleThrice`, as I understand it the do notation is used for unpacking the value of a Monad and possibly using it in the next. However this doesn't seem to work.

  4. In general from examples I have encountered in LYAH it hasn't been stressed that the runner should be changed. I am a bit confused to why that is, and if it is something in a later release of Haskell.

Piskator
  • 605
  • 1
  • 9
  • 2
    Are you sure this is a monad? I don't think your instance obeys the monad laws. Also you don't use anything monadic in `runRegModuleThrice` (though it certainly looks like this could be done better with a state monad). Show the result you would have liked to get. – leftaroundabout Jul 01 '23 at 14:17
  • I am not sure it is a monad considering your points about the monad laws. I am still not skilled enough to know those by heart, so I can try to look them up an give a qualified guess. The reason why I made the output this manually as a first step is to just come up with some code I could understand. I will edit the post to show you what I expected of the `runRegModuleThrice` but didn't get. – Piskator Jul 01 '23 at 14:24
  • 1
    @leftaroundabout I think the monad instance is basically the one you get for `ReaderT String (AccumT (Sum Int) (StateT d []))`, so it should obey the laws. – Daniel Wagner Jul 01 '23 at 16:01
  • 3
    @Piskator You might like this question that doesn't seem related at first glance: [How to use `runAccumT`](https://stackoverflow.com/q/76591284/791604). The issue in that question is essentially the same as your issue in the end, I believe: you need a `runRegModule` wrapper that adds the incoming `Int` to the output `Int`. – Daniel Wagner Jul 01 '23 at 16:07
  • @DanielWagner but I don't understand ... is all of that extra parts really necessary in my cases. Isn't mine closer to the writer type monad depicted here: http://learnyouahaskell.com/for-a-few-monads-more : `instance (Monoid w) => Monad (Writer w) where` ... it seems they also change the state of one part of the resulting tupple. – Piskator Jul 03 '23 at 14:09
  • @Piskator Yes, `Writer` and `Accum` are closely related. You will need to decide which you want to use. I talked about `Accum` because it is the thing you are already using; you take an `Int` argument and use it to make the accumulated value visible to the user. If you don't need that visibility, then you can simplify your solution. – Daniel Wagner Jul 03 '23 at 14:59
  • @DanielWagner hmm.. I am trying to understand 1) why my own code poses the problem considering that the monad is defined with `j+j'` and that any `let a = 2; let b = 3` evaluates to `a+b =5` why it is expected that the `let`-statments in the code must save the variable 2) how the https://stackoverflow.com/q/76591284/791604 relates to my own problem, since this seems to be related to the definition of `look` and `AccumT ` (both predefined Monads) and not a custom made one as in my case. and 3) how to use the `Writer`-Monad in my impl. or some iterative that will work \w `RunRegModuleThrice`. – Piskator Jul 04 '23 at 13:32
  • @Piskator 1) Your monad is defined with `j+j'`, but your monad *runner* (`runRegModule`) isn't. 2) If you look at the implementation of `AccumT` and its `Monad` instance, you will see it is identical to your custom code, and the fix proposed by K. A. Buhr -- a wrapper around the runner -- identically applies here. 3) If you fix your runner, your existing `runRegModuleThrice` will work fine without modification. – Daniel Wagner Jul 04 '23 at 15:15
  • @DanielWagner , I have tried to edit my original post in order to respond to your comments. There are parts of your commnets that I might not understand fully, or that confuses me, and still so after looking into the Hoogle for the `AccumT` definitions etc. – Piskator Jul 05 '23 at 10:29

2 Answers2

1

The issue seems to be that the definition of scanChar is explicitly ignoring the input argument Int:

scanChar = RegModule (\s i d ->
  case drop i s of
    (c:cs) -> return (c, 1, d)
                      // ^ here we return 1, not i or any other expression which uses i
    [] -> []
  )

You are passing each newState as input to the next usage of runRegModule, however, this makes no difference, since the first thing you do in each call to regEX is ignore that newState.

Let's break down the execution of a function created by regEX, in particular focusing on the "index" value (I'm really not sure what this is supposed to represent, so maybe "index" is a bad name).

I'll rewrite it a bit for clarity. Start with this:

regEXMatchRClass b cs next = 
 if (S.member next cs)
    then return ()
    else regfail

regEX (RClass b cs) = scanChar >>= regEXMatchRClass b cs

and note that we can write these components in an equivalent way (with irrelevant parts omitted):

scanChar = RegModule (\s i d -> map (\x -> (x, 1, d)) ...)
regEXMatchRClass b cs next = RegModule (\s i d -> map (\x -> (x, i, d)) ...)

This form may help to convince you that this function regEX simply ignores the original input i which is fed to it.

Try instead something like this:

scanChar = RegModule (\s i d ->
  case drop i s of
    (c:cs) -> return (c, 1 + i, d)
    [] -> []
  )
user2407038
  • 14,400
  • 3
  • 29
  • 42
  • I am not sure what you mean with these two lines: `scanChar = RegModule (\s i d -> map (\x -> (x, 1, d)) ...)` and `regEXMatchRClass b cs next = RegModule (\s i d -> map (\x -> (x, i, d)) ...)`. Are you in a sense trying to work around a monad by including a `map`-function instead of the Monad instantiation in the beginning? – Piskator Jul 06 '23 at 06:08
  • As I understand your overall point you have rewritten the monadic functions by removing the syntatic sugar (`do`-notation). And by doing so you want to highlight how the `i` is ignored in the chaining between `scanChar` and `regEXMatchClass` since the former doesn't include `i` in the applied function/operation. So the latter only takes the 'fixed' output of the former as an input. – Piskator Jul 06 '23 at 06:09
  • When I use the `i+1` in `scanChar` instead it does produce the expected result however. – Piskator Jul 06 '23 at 06:14
1

I'll respectfully disagree with the other answer here. The underlying problem is that you've misinterpreted the output Int in the tuple (a, Int, d). For this monad, the input Int is the offset into the string, but the output Int is the number of characters scanned, not the new offset. This means that -- using the original code posted in the question -- your actual output is correct:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

because those output Ints are the number of characters scanned, not the new offsets. Since the matcher always matches a single character, if it matches anything at all, each of the three runs of the matcher should return 1, as in 1 character scanned.

However, your code is still buggy, because if you try a test case with the input string "aa", it still matches:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

The problem is that, in runRegModuleThrice, you are reimplementing the bind (>>=) operation for the monad but your implementation is incompatible with the actually "meaning" of the input and output Ints in the monad.

As I noted above, the definition for >>= in the monad instance is set up to treat the input Int as an offset into the string but the output Int as the number of characters scanned, and when the bind chains the left-hand-side to the right-hand-side, it adds this output Int (number of characters scanned) to the original input Int (starting offset) to get the new input Int (next offset). It also returns the sum of the two output Ints -- the number of characters scanned by the LHS plus the number of characters scanned by the RHS. If the output Ints were offsets, it would just return the final offset j' directly, not try to add it to another offset:

m >>= f =
  RegModule (\s i d -> do (a, j, d') <- runRegModule m s i d
  --            ^             ^                          ^
  --            |             |                          |
  --            |   output count of characters scanned   |
  --            |                                        |
  --            `-- starting offset as input ------------'

                          (b, j', d'') <- runRegModule (f a) s (i + j) d'
  -- new offset is *sum* of starting offset and number scanned -^^^^^

                          return (b, j + j', d''))
  --                                 ^^^^^^
  --        returned value is total number of characters scanned, j from
  --        the first, and j' from the second, **NOT** the new offset

Assuming the monad instance is correct, your original scanChar implementation was also correct. When it succeeds, it scans a single character, so it should always return 1 on success.

The bug is in your implementation of runRegModuleThrice which is misinterpreting the output Int as an offset. If you modify it to treat the input Ints as offsets and the output Ints as counts of characters scanned:

runRegModuleThrice :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice matcher input startPos state =
  let (result1, cnt1, newState1) = head $ runRegModule matcher input startPos state
      (result2, cnt2, newState2) = head $ runRegModule matcher input (startPos + cnt1) newState1
      (result3, cnt3, newState3) = head $ runRegModule matcher input (startPos + cnt1 + cnt2) newState2
  in [(result1, cnt1, newState1), (result2, cnt2, newState2), (result3, cnt3, newState3)]

then it should work correctly. Note that it STILL will return the output you claimed was wrong:

ghci> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),1,[]),((),1,[]),((),1,[])]

but this output is actually correct, as explained above.

It'll crash on an input of "aa":

λ> runRegModuleThrice (regEX (RClass False (S.singleton 'a'))) "aa" 0 []
[((),1,[]),((),1,[]),(*** Exception: Prelude.head: empty list
CallStack (from HasCallStack):
  error, called at libraries/base/GHC/List.hs:1646:3 in base:GHC.List
  errorEmptyList, called at libraries/base/GHC/List.hs:85:11 in base:GHC.List
  badHead, called at libraries/base/GHC/List.hs:81:28 in base:GHC.List
  head, called at Collect.hs:54:36 in main:Collect

but this is correct, too. Your runRegModuleThrice is written to assume that each run of the matcher succeeds and returns at least one match. When a run fails (like the third match failing here), it tries to take the head of an empty list and crashes, as designed.

I think what you were SUPPOSED to do here was implement runRegModuleThrice using the monad's implementation of >>=, not reimplement it from scratch, so maybe something more like:

runRegModuleThrice' :: RegModule d a -> String -> Int -> d -> [(a, Int, d)]
runRegModuleThrice' matcher = runRegModule (matcher >> matcher >> matcher)

which uses the monad bind (i.e., the definition of >>=) via the >> operator.

This does not return a "history" of the three matches. Instead, it returns the result of matching on the composite monadic match. This should return the a value of the final match, which is just () for your test case. It should also return the total number of character matched by the three matchers that are part of the composite match, which is the number 3, i.e., three matches of 1 character each.

And, in fact, it does just that:

λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "aaa" 0 []
[((),3,[])]

It also works correctly on strings that don't match without crashing:

λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "aa" 0 []
[]
λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "a" 0 []
[]
λ> runRegModuleThrice' (regEX (RClass False (S.singleton 'a'))) "bbb" 0 []
[]
K. A. Buhr
  • 45,621
  • 3
  • 45
  • 71
  • Thank you for your very thourough response. It is much appreciated. I think you are correct in your assessments and interpretations. – Piskator Jul 14 '23 at 05:49
  • One question. How come your `runRegModuleThrice'` takes 4 arguments like mine in the type signature, but in the actual function is only supplied with 1 argument? Is this currying? And if so, I thought currying would imply that you wouldn't need all the 4 arguments in the type signature, but just use whatever arguments the `runRegModule` would take within the function, based on currying. – Piskator Jul 14 '23 at 05:51
  • Yes, it boils down to currying. One way of looking at it is that the type `a -> b -> c -> d -> e` is the same as the type `a -> (b -> c -> d -> e)`. In words, a four argument function is the same as a one argument function that returns a three argument function. **Without any change to the type signature,** my *definition* for this function can either take four arguments and return a final `e`, or it can take one argument and return a three argument function (that returns `e`). In my definition, I used `runRegModule` in the body to create the three argument function I wanted. – K. A. Buhr Jul 14 '23 at 15:07
  • I have actually posted a new thread with some pacularities about your `runRegModuleThrice'` that eludes me with regards to the discard function: https://stackoverflow.com/questions/76694011/the-do-notation-in-haskell-the-and-the-operator-and-their-impact-on-the – Piskator Jul 15 '23 at 13:43