I am reading Programming in Haskell. In chapter 8, the author gives an example of writing parsers (full source: http://www.cs.nott.ac.uk/~gmh/Parsing.lhs). I can't understand the following part: many permits zero or more applications of p, whereas many1 requires at least one successful application:

many        ::    Parser a -> Parser [a]
many p      =     many1 p +++ return []
many1       ::    Parser a -> Parser [a]
many1 p     = do v  <- p
                 vs <- many p
                 return (v : vs)

How does the recursive call happen at

vs <- many p

vs is the result value of many p, but many p calls many1 p, and all many1 has in its definition is a do block that again produces the result values v and vs. When does the recursive call return? Why does the following snippet return [("123","abc")]?

> parse (many digit) "123abc"
[("123", "abc")]
Sawyer

3 Answers

6

The recursion stops at the v <- p line. When p can no longer parse the input, the Parser monad's >>= simply propagates the failure [] to the end of the computation.

p >>= f =  P (\inp -> case parse p inp of
                        []        -> [] -- this line here does not call f
                        [(v,out)] -> parse (f v) out)

The second function is written in do-notation, which is just a nice syntax for the following:

many1 p = p >>= (\v -> many p >>= (\vs -> return (v : vs)))

If parsing p produces the empty list [], the function \v -> many p >>= (\vs -> return (v : vs)) is never called, which stops the recursion.
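To make the short-circuit concrete, here is a minimal, self-contained sketch of the parser in the spirit of Parsing.lhs. It is an approximation rather than the book's exact code: the Functor and Applicative instances are added only because current GHC versions require them, the helper failure is spelled out here, and the patterns use ((v, out) : _) instead of [(v, out)] so that they are total.

```haskell
import Data.Char (isDigit)

-- A parser maps input to a list of (result, remaining input) pairs.
-- The empty list signals failure.
newtype Parser a = P (String -> [(a, String)])

parse :: Parser a -> String -> [(a, String)]
parse (P p) inp = p inp

-- Assumption: these two instances are not in the book's version;
-- modern GHC needs them before a Monad instance is accepted.
instance Functor Parser where
  fmap g p = P (\inp -> [(g v, out) | (v, out) <- parse p inp])

instance Applicative Parser where
  pure v    = P (\inp -> [(v, inp)])
  pf <*> px = pf >>= \g -> fmap g px

instance Monad Parser where
  p >>= f = P (\inp -> case parse p inp of
                         []             -> []              -- f is never called
                         ((v, out) : _) -> parse (f v) out)

-- Always fails.
failure :: Parser a
failure = P (const [])

-- Consumes one character, failing on empty input.
item :: Parser Char
item = P (\inp -> case inp of
                    []       -> []
                    (x : xs) -> [(x, xs)])

-- Consumes one character that satisfies the predicate.
sat :: (Char -> Bool) -> Parser Char
sat ok = do x <- item
            if ok x then return x else failure

digit :: Parser Char
digit = sat isDigit

-- Choice: try p, and only if it fails, try q on the same input.
infixr 5 +++
(+++) :: Parser a -> Parser a -> Parser a
p +++ q = P (\inp -> case parse p inp of
                       []      -> parse q inp
                       (r : _) -> [r])

many :: Parser a -> Parser [a]
many p = many1 p +++ return []

many1 :: Parser a -> Parser [a]
many1 p = do v  <- p
             vs <- many p
             return (v : vs)
```

With these definitions, parse (many digit) "123abc" gives [("123","abc")]. Once the input reaches "abc", digit fails, so many1 digit yields [] without making a further recursive call, and +++ falls back to return [], which supplies the empty list that ends the chain of (v : vs).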

R. Martinho Fernandes
  • I am sorry, I've only just read the 8th chapter of this book and don't know what monadic behavior is yet. Can you explain it in a way that is easier to understand? – Sawyer May 21 '11 at 10:10
  • @Sawyer: I made an effort to explain what's happening without delving much into what "monadic" means, which seems to be a confusing topic to many, and I'm sure your book will do a better job of explaining it when the time comes. In case it turns out it doesn't, bookmark [this link](http://blog.sigfpe.com/2006/08/you-could-have-invented-monads-and.html). – R. Martinho Fernandes May 21 '11 at 10:12
  • In a nutshell, the pattern matching that stops the recursion is in the "choice" function `+++` defined on page 78. As `many` calls `+++`, it's there that the recursion stops. – stephen tetley May 21 '11 at 10:41
  • This is an old question, however I found myself stuck at the same problem. Understanding many and many1... I thought I would try to follow the program using small input like "a" and finding out at what part the empty string would be produced when calling many and why it wouldn't be produced on calling many1? – Marin Nov 14 '14 at 10:07
2

For the last question:

> parse (many digit) "123abc"
[("123", "abc")]

This means that parsing succeeded, since at least one result was returned in the answer list. Hutton parsers always return a list; the empty list means parsing failed.

The result ("123", "abc") means that parsing has found three digits "123" and stopped at 'a' which is not a digit - so the "rest of the input" is "abc".

Note that many means "as many as possible", not "one or more". If it meant "one or more" in the sense of returning every possible match, you'd get this result instead:

[("1", "23abc"), ("12", "3abc"), ("123", "abc")]

This behaviour wouldn't be very good for deterministic parsing, though it might sometimes be needed for natural language parsing.
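For comparison, here is a hedged sketch of such a nondeterministic variant. None of these names (NParser, runN, orBoth, digitN, manyN, many1N) come from the book; they are hypothetical and exist only to show what changes when >>= continues from every result and choice keeps both alternatives.

```haskell
import Data.Char (isDigit)

-- Hypothetical nondeterministic parser: all alternatives are kept.
newtype NParser a = NP (String -> [(a, String)])

runN :: NParser a -> String -> [(a, String)]
runN (NP p) = p

instance Functor NParser where
  fmap g p = NP (\inp -> [(g v, out) | (v, out) <- runN p inp])

instance Applicative NParser where
  pure v    = NP (\inp -> [(v, inp)])
  pf <*> px = pf >>= \g -> fmap g px

instance Monad NParser where
  -- Continue from every result of p, not just the first one.
  p >>= f = NP (\inp -> concat [runN (f v) out | (v, out) <- runN p inp])

-- Choice that keeps the results of both alternatives.
orBoth :: NParser a -> NParser a -> NParser a
orBoth p q = NP (\inp -> runN p inp ++ runN q inp)

-- Consumes a single digit, failing otherwise.
digitN :: NParser Char
digitN = NP (\inp -> case inp of
                       (x : xs) | isDigit x -> [(x, xs)]
                       _                    -> [])

manyN :: NParser a -> NParser [a]
manyN p = many1N p `orBoth` return []

many1N :: NParser a -> NParser [a]
many1N p = do v  <- p
              vs <- manyN p
              return (v : vs)
```

In this sketch, runN (many1N digitN) "123abc" yields the same three results as the list above, ordered longest match first: [("123","abc"),("12","3abc"),("1","23abc")]. The deterministic +++ differs only in that it discards the second alternative whenever the first one succeeds.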

stephen tetley
  • The result is [("123","abc")], from the return statement. I guess that in the end `v` is "123" and `vs` is "abc", so that `v:vs` could yield [("123","abc")], but there aren't any manipulations of v or vs, so where do the "123" and "abc" come from? – Sawyer May 21 '11 at 12:40
  • The return statement in the definition of `many1` manipulates the `v` and the `vs` - consing them to form a list `(v : vs)` - you should really edit your question so the return statement is aligned within the do-block. As you have three digits in the input and `many` provides the empty list, you get `('1': ('2': ('3' : [])))` which is the string "123". "abc" is the rest of input that hasn't been parsed. – stephen tetley May 21 '11 at 14:42
  • Note also that the `item` parser on page 76 is the function that actually pulls apart the input string. The function `digit` on page 79 is built with the function `sat` (satisfies) on page 78, which is the one that actually calls `item`. This is the "combinator" style of programming, where functions are built with functions that take other functions as arguments. Admittedly, it can be difficult to track the control flow of programs in this style until you have gained a fair amount of experience. – stephen tetley May 21 '11 at 14:54
1

Let me strip this down to the barest bones to make absolutely clear why do-blocks can be misunderstood if they're read simply as imperative code. Consider this snippet:

doStuff :: Maybe Int
doStuff = do
    a <- Nothing
    doStuff

It looks like doStuff will recurse forever; after all, it's defined to do a sequence of things ending with doStuff. But the sequence of lines in a do-block is not simply a sequence of operations performed in order. At any point in a do-block, the way the rest of the block is handled is determined by the definition of >>=. In my example, the second argument to >>= is only used if the first argument isn't Nothing. So the recursion never happens.

Something similar can happen in many different monads. Your example is just a little more complex: when there are no more ways to parse something, the stuff after the >>= is ignored.
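As a small illustration of that point, the sketch below desugars doStuff by hand. The names bindMaybe and doStuffDesugared are made up for this example; bindMaybe is written to behave like the standard >>= for Maybe.

```haskell
-- A hand-written equivalent of (>>=) for Maybe, spelled out for illustration.
bindMaybe :: Maybe a -> (a -> Maybe b) -> Maybe b
bindMaybe Nothing  _ = Nothing   -- the continuation is never called
bindMaybe (Just x) f = f x

-- doStuff with its do-block desugared by hand.
-- The recursive reference sits inside the lambda, which is never applied.
doStuffDesugared :: Maybe Int
doStuffDesugared = Nothing `bindMaybe` \_ -> doStuffDesugared
```

Evaluating doStuffDesugared matches the first equation of bindMaybe and returns Nothing immediately, because the lambda holding the recursive reference is discarded without being applied.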

sigfpe