Haskell parser combinator - do notation

Question

I was reading a tutorial on building a parser combinator library and came across a function I don't quite understand.

newtype Parser a = Parser {parse :: String -> [(a,String)]}

chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op a = (p `chainl1` op) <|> return a

chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainl1` op = do {a <- p; rest a}
  where rest a = (do f <- op
                     b <- p
                     rest (f a b))
                 <|> return a

bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s
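The snippet above omits the instances that make `do`, `return` and `<|>` work for `Parser`. Here is a minimal self-contained sketch that fills them in so that chainl1 can actually be run. The Functor/Applicative/Monad/Alternative instances, the left-biased `<|>`, and the helpers `item`, `satisfy`, `char`, `digit` and `addop` are assumptions of this sketch and may not match what the tutorial itself defines.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (digitToInt, isDigit)

newtype Parser a = Parser {parse :: String -> [(a, String)]}

bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s

instance Functor Parser where
  fmap g p = Parser $ \s -> [(g a, s') | (a, s') <- parse p s]

instance Applicative Parser where
  pure a = Parser $ \s -> [(a, s)]
  pf <*> pa = pf `bind` \g -> fmap g pa

instance Monad Parser where
  (>>=) = bind

-- Assumed left-biased choice: run q only if p produced no results.
instance Alternative Parser where
  empty = Parser (const [])
  p <|> q = Parser $ \s -> case parse p s of
    [] -> parse q s
    rs -> rs

-- Hypothetical primitive: consume exactly one character.
item :: Parser Char
item = Parser $ \s -> case s of
  [] -> []
  (c : cs) -> [(c, cs)]

satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = item >>= \c -> if ok c then pure c else empty

char :: Char -> Parser Char
char c = satisfy (== c)

digit :: Parser Int
digit = digitToInt <$> satisfy isDigit

-- A parser whose *result* is a function: (+) for '+', (-) for '-'.
addop :: Parser (Int -> Int -> Int)
addop = (char '+' >> pure (+)) <|> (char '-' >> pure (-))

-- chainl and chainl1 exactly as in the question.
chainl :: Parser a -> Parser (a -> a -> a) -> a -> Parser a
chainl p op a = (p `chainl1` op) <|> return a

chainl1 :: Parser a -> Parser (a -> a -> a) -> Parser a
p `chainl1` op = do {a <- p; rest a}
  where rest a = (do f <- op
                     b <- p
                     rest (f a b))
                 <|> return a
```

Loaded into GHCi, `parse (digit `chainl1` addop) "1-2-3"` gives `[(-4,"")]`, i.e. `(1 - 2) - 3`: each recursive call to `rest` folds the function produced by `op` into the accumulator from the left. With the left-biased choice assumed here, `parse (chainl digit addop 0) "x"` gives `[(0,"x")]`, because `chainl1` fails and the fallback `return 0` succeeds without consuming input.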

Here, bind is the implementation of the (>>=) operator. I don't quite get how the chainl1 function works. From what I can see, you extract f from op, apply it as f a b, and recurse; however, I do not get how you extract a function from the parser when it should return a list of tuples.

Look at the type for `chainl`. The second argument, `op`, is a `Parser (a -> a -> a)`, which *is* a parser that produces a function. — Alexis King, Aug 16 '16 at 23:36
You do not 'extract' the function; the `do` notation is sugar for uses of `>>=`, through which you can access the result of a `Parser` within another `Parser`. The only way to actually get something of type `a` out of a `Parser a` is to apply the wrapped function to a string (and hope the list is non-empty), but you do not need to (and should not) do this to manipulate parsers. — user2407038, Aug 17 '16 at 03:05
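To make the "do is sugar for >>=" point concrete, here is a sketch of chainl1 written twice: once with do-notation as in the question, and once with every `x <- m` rewritten as ``m `bind` \x -> ...``. The minimal instances, the left-biased `<|>`, and the helpers `digit` and `addop` are assumptions for illustration. In the explicit version, `f` is simply the parameter of a lambda that `bind` calls with each parsed result; nothing is pulled out of a list by hand.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (digitToInt, isDigit)

newtype Parser a = Parser {parse :: String -> [(a, String)]}

bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s

instance Functor Parser where
  fmap g p = Parser $ \s -> [(g a, s') | (a, s') <- parse p s]

instance Applicative Parser where
  pure a = Parser $ \s -> [(a, s)]
  pf <*> pa = pf `bind` \g -> fmap g pa

instance Monad Parser where
  (>>=) = bind

-- Assumed left-biased choice.
instance Alternative Parser where
  empty = Parser (const [])
  p <|> q = Parser $ \s -> case parse p s of
    [] -> parse q s
    rs -> rs

-- Hypothetical helpers for the example.
digit :: Parser Int
digit = Parser $ \s -> case s of
  (c : cs) | isDigit c -> [(digitToInt c, cs)]
  _ -> []

addop :: Parser (Int -> Int -> Int)
addop = Parser $ \s -> case s of
  ('+' : cs) -> [((+), cs)]
  ('-' : cs) -> [((-), cs)]
  _ -> []

-- With do-notation, as in the question.
chainl1Do :: Parser a -> Parser (a -> a -> a) -> Parser a
chainl1Do p op = do { a <- p; rest a }
  where
    rest a = (do f <- op
                 b <- p
                 rest (f a b))
             <|> return a

-- The same parser with the do-block desugared into nested binds.
chainl1Bind :: Parser a -> Parser (a -> a -> a) -> Parser a
chainl1Bind p op = p `bind` rest
  where
    rest a = (op `bind` \f ->
              p `bind` \b ->
              rest (f a b))
             <|> return a
```

Both versions give `[(-4,"")]` for the input `"1-2-3"`.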

liminalisht · Answer 1 · 2016-08-17T03:53:09.673

Start by looking at the definition of Parser:

newtype Parser a = Parser {parse :: String -> [(a,String)]}

A Parser a is really just a wrapper around a function (which we can run later with parse) that takes a String and returns a list of pairs, where each pair contains a value of type a recognized while processing the string, along with the rest of the string that remains to be processed.
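A small illustration of the three shapes that result list can take (no result, one result, several results). Both parsers below are hypothetical examples, not part of the tutorial's code, and no Monad instance is needed just to run them with `parse`.

```haskell
newtype Parser a = Parser {parse :: String -> [(a, String)]}

-- Hypothetical primitive: consume one character.
item :: Parser Char
item = Parser $ \s -> case s of
  [] -> []                -- failure: an empty list of results
  (c : cs) -> [(c, cs)]   -- success: one value plus the unconsumed rest

-- Hypothetical ambiguous parser: every way of splitting off a prefix.
prefixes :: Parser String
prefixes = Parser $ \s -> [splitAt n s | n <- [0 .. length s]]
```

Here `parse item "abc"` is `[('a',"bc")]`, `parse item ""` is `[]`, and `parse prefixes "ab"` is `[("","ab"),("a","b"),("ab","")]`.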

Now look at the part of the code in chainl1 that's confusing you: the part where you extract f from op:

f <- op

You remarked: "I do not get how you extract a function from the parser when it should return a list of tuples."

It's true that when we run a Parser a with a string (using parse), we get a list of type [(a,String)] as a result. But this code does not say parse op s. Rather, we are using bind here (with the do-notation syntactic sugar). The problem is that you're thinking about the definition of the Parser datatype, but you're not thinking much about what bind specifically does.
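For contrast, this is what you get if you *do* call `parse` on an operator parser directly. The first component of each pair is a function, which cannot be printed, so the sketch inspects it by applying it to two numbers. `addop` is a hypothetical stand-in for `op`, and `results` is a helper written only for this illustration.

```haskell
newtype Parser a = Parser {parse :: String -> [(a, String)]}

-- Hypothetical stand-in for `op`: its results are functions.
addop :: Parser (Int -> Int -> Int)
addop = Parser $ \s -> case s of
  ('+' : cs) -> [((+), cs)]
  ('-' : cs) -> [((-), cs)]
  _ -> []

-- parse addop s is a list of (function, remaining input) pairs.
-- Apply each function to 10 and 3 to see which one was recognized.
results :: String -> [(Int, String)]
results s = [(f 10 3, rest) | (f, rest) <- parse addop s]
```

`results "-7"` is `[(7,"7")]` and `results "*7"` is `[]`. chainl1 never does this by hand: `bind` walks the list internally and passes each function to the rest of the do-block.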

Let's look at what bind is doing in the Parser monad a bit more carefully.

bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s

What does p >>= f do? It returns a Parser that, when given a string s, does the following. First, it runs parser p on s. This, as you correctly noted, returns a list of type [(a, String)]: each value of type a that p recognized, paired with the string that remained after recognizing it. Then it transforms each (a, s') pair in that list by (1) applying f to the parsed value a (f a returns a new parser), and then (2) running this new parser on the remaining string s'.

This per-pair step is a function from a tuple to a list of tuples, (a, s') -> [(b, s'')], and since we map it over every tuple in the original list returned by parse p s, we end up with a list of lists of tuples, [[(b, s'')]]. So we concatenate (or join) it into a single list [(b, s'')]; concatMap does the mapping and the concatenation in one step. All in all, then, we have a function from s to [(b, s'')], which we wrap in the Parser newtype.
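The steps above can be traced by hand. The sketch below spells out the three stages of `bind` for one input, using two hypothetical parsers: `oneOrTwo`, which has two possible results, and `thenOne`, a continuation that consumes one more character. Note how the branch that has no input left produces an empty inner list, which simply disappears when the lists are concatenated.

```haskell
newtype Parser a = Parser {parse :: String -> [(a, String)]}

bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s

-- Hypothetical parser with two results: take one or two characters.
oneOrTwo :: Parser String
oneOrTwo = Parser $ \s -> [splitAt n s | n <- [1, 2], n <= length s]

-- Hypothetical continuation: consume one more character and record it.
thenOne :: String -> Parser String
thenOne a = Parser $ \s' -> case s' of
  (c : cs) -> [(a ++ "+" ++ [c], cs)]
  [] -> []

-- Stage 1: run p on s.
stage1 :: [(String, String)]
stage1 = parse oneOrTwo "ab"

-- Stage 2: for each (a, s'), run the parser f a on s'.
stage2 :: [[(String, String)]]
stage2 = map (\(a, s') -> parse (thenOne a) s') stage1

-- Stage 3: flatten the list of lists.
stage3 :: [(String, String)]
stage3 = concat stage2
```

Here `stage1` is `[("a","b"),("ab","")]`, `stage2` is `[[("a+b","")],[]]`, and `stage3` is `[("a+b","")]`, the same as `parse (bind oneOrTwo thenOne) "ab"`.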

The crucial point is that when we write f <- op, or op >>= \f -> ..., that assigns the name f to the values parsed by op, but f is not a list of tuples, because it is not the result of running parse op s.
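A sketch showing that inside a do-block `f` really is an ordinary `Int -> Int -> Int`. It also shows that when `op` has several results, the rest of the block runs once for each of them. The instances and the helpers `digit`, `addop` and `bothOps` are assumptions made for the example.

```haskell
import Data.Char (digitToInt, isDigit)

newtype Parser a = Parser {parse :: String -> [(a, String)]}

bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s

instance Functor Parser where
  fmap g p = Parser $ \s -> [(g a, s') | (a, s') <- parse p s]

instance Applicative Parser where
  pure a = Parser $ \s -> [(a, s)]
  pf <*> pa = pf `bind` \g -> fmap g pa

instance Monad Parser where
  (>>=) = bind

digit :: Parser Int
digit = Parser $ \s -> case s of
  (c : cs) | isDigit c -> [(digitToInt c, cs)]
  _ -> []

addop :: Parser (Int -> Int -> Int)
addop = Parser $ \s -> case s of
  ('+' : cs) -> [((+), cs)]
  ('-' : cs) -> [((-), cs)]
  _ -> []

-- Hypothetical operator parser with two results that consumes nothing.
bothOps :: Parser (Int -> Int -> Int)
bothOps = Parser $ \s -> [((+), s), ((-), s)]

-- f is a plain function here, so it can be applied directly.
applyOp :: Parser Int
applyOp = do
  x <- digit
  f <- addop
  y <- digit
  return (f x y)

-- bind runs the continuation once per value bothOps produced.
tryBoth :: Parser Int
tryBoth = do
  f <- bothOps
  return (f 10 3)
```

`parse applyOp "7-2"` is `[(5,"")]`, and `parse tryBoth "rest"` is `[(13,"rest"),(7,"rest")]`.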

In general, you'll see a lot of Haskell code that defines some datatype SomeMonad a, along with a bind method that hides many of the dirty details for you and lets you get at the a values you care about using do-notation, like so: a <- ma. It may be instructive to look at the State s monad to see how bind passes state around behind the scenes. Similarly, here, when combining parsers you care most about the values the parser is supposed to recognize; bind hides all the dirty work involving the strings that remain after recognizing a value of type a.
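Following the suggestion to look at State, here is a minimal hand-rolled State monad (not the one from the mtl package) whose `>>=` threads the state the same way the parser's `bind` threads the remaining input. Compare the wrapped types: `State s a` wraps `s -> (a, s)`, while `Parser a` wraps `String -> [(a, String)]`, i.e. a String state plus a list of possible results. The helpers `get`, `put`, `fresh` and `label` are defined here purely for illustration.

```haskell
newtype State s a = State {runState :: s -> (a, s)}

instance Functor (State s) where
  fmap g (State h) = State $ \s -> let (a, s') = h s in (g a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State hf <*> State ha = State $ \s ->
    let (g, s1) = hf s
        (a, s2) = ha s1
     in (g a, s2)

-- bind: run the first step, then feed its result and the new state onward.
instance Monad (State s) where
  State h >>= f = State $ \s ->
    let (a, s') = h s
     in runState (f a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Hypothetical example: hand out consecutive numbers.
fresh :: State Int Int
fresh = do
  n <- get
  put (n + 1)
  return n

-- The counter is threaded through mapM by >>=; we never pass it by hand.
label :: [a] -> State Int [(Int, a)]
label = mapM (\x -> do { n <- fresh; return (n, x) })
```

`runState (label "abc") 0` is `([(0,'a'),(1,'b'),(2,'c')],3)`.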

"that assigns the name f to the values parsed by op": what do you mean by this? Is this not the list of tuples? I am slightly confused. — Yusuf, Aug 17 '16 at 11:29
No, it's not the list of tuples. Since `op` is of type `Parser (a -> a -> a)`, `f` is a function (recognized by the parser) of type `a -> a -> a`. If a parser `p` is of type `Parser Int`, then whenever I write `x <- p`, `x` will refer to some integer recognized by `p`. This is what we want it to refer to, right? This way we can easily combine multiple parsers into new parsers in arbitrary ways that depend on the `x` value recognized by `p`. `bind` is hiding some complexity so that you don't have to deal with it or think about it. — liminalisht, Aug 17 '16 at 14:16
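As an example of a parser that depends on the `x` value recognized by an earlier one, here is a hypothetical length-prefixed field: read one digit `n`, then read exactly `n` characters. The instances and the helpers `item`, `digit` and `field` are assumptions of this sketch; `replicateM` is the standard function from `Control.Monad`.

```haskell
import Control.Monad (replicateM)
import Data.Char (digitToInt, isDigit)

newtype Parser a = Parser {parse :: String -> [(a, String)]}

bind :: Parser a -> (a -> Parser b) -> Parser b
bind p f = Parser $ \s -> concatMap (\(a, s') -> parse (f a) s') $ parse p s

instance Functor Parser where
  fmap g p = Parser $ \s -> [(g a, s') | (a, s') <- parse p s]

instance Applicative Parser where
  pure a = Parser $ \s -> [(a, s)]
  pf <*> pa = pf `bind` \g -> fmap g pa

instance Monad Parser where
  (>>=) = bind

item :: Parser Char
item = Parser $ \s -> case s of
  [] -> []
  (c : cs) -> [(c, cs)]

digit :: Parser Int
digit = Parser $ \s -> case s of
  (c : cs) | isDigit c -> [(digitToInt c, cs)]
  _ -> []

-- The second parser is chosen using the Int n that digit recognized.
field :: Parser String
field = do
  n <- digit
  replicateM n item
```

`parse field "3abcde"` is `[("abc","de")]`, while `parse field "5ab"` is `[]` because fewer than five characters follow the digit.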
