9

How can I use parsec to parse all matched input in a string and discard the rest?

Example: I have a simple number parser, and I can find all the numbers if I know what separates them:

num :: Parser Int
num = read <$> many1 digit

parse (num `sepBy` space) "" "111 4 22"

But what if I don't know what is between the numbers?

"I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."

many anyChar doesn't work as a separator, because it consumes everything.

So how can I get things that match an arbitrary parser surrounded by things I want to ignore?


EDIT: Note that in the real problem, my parser is more complicated:

optionTag :: Parser Fragment
optionTag = do
    string "<option"
    manyTill anyChar (string "value=")
    n <- many1 digit
    manyTill anyChar (char '>')
    chapterPrefix
    text <- many1 (noneOf "<>")
    return $ Option (read n) text
  where
    chapterPrefix = many digit >> char '.' >> many space
Sean Clark Hess
  • By the way, do you really need to choose Parsec for this task? Could you use a simple regular expression like `^$`? – Yuuri Apr 10 '15 at 08:22

4 Answers

8

For an arbitrary parser myParser, it's quite easy:

solution = many (let one = myParser <|> (anyChar >> one) in one)

It might be clearer to write it this way:

solution = many loop
    where 
        loop = myParser <|> (anyChar >> loop)

Essentially, this defines a recursive parser (called loop) that skips characters one at a time until it reaches the first thing myParser can parse. many then repeats that search until it fails, i.e. at EOF.
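A minimal runnable sketch of this idea with Parsec, using the question's number parser as myParser. The names findNext and findAll are made up for illustration. It departs from the code above in two ways, both assumptions about what is wanted: each attempt is wrapped in try so that a failed search gives back the input it consumed, and whatever follows the last match is discarded with many anyChar.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The question's number parser; many1 so it never succeeds on empty input.
num :: Parser Int
num = read <$> many1 digit

-- Try p at the current position; if it fails, drop one character and retry.
-- Fails (having consumed input) if p never matches before end of input.
findNext :: Parser a -> Parser a
findNext p = try p <|> (anyChar *> findNext p)

-- Collect every match of p, ignoring everything between and after matches.
-- The outer try rewinds the last, unsuccessful search so that many can stop
-- cleanly; many anyChar then throws away the unmatched tail.
findAll :: Parser a -> Parser [a]
findAll p = many (try (findNext p)) <* many anyChar
```

In GHCi, parse (findAll num) "" "I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22." should give Right [111,4,22]. The parser passed to findAll has to consume input whenever it succeeds, otherwise Parsec's many raises an error.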

AJF
  • This answer is close, but you have to backtrack `myParser` if it consumes some input and fails, as @neil-smith notes in his answer. Also you have to consider if `myParser` succeeds without consuming any input, because then this will loop forever. [`sepCap`](https://hackage.haskell.org/package/replace-megaparsec/docs/Replace-Megaparsec.html#v:sepCap) will handle these and other subtleties. – James Brock Aug 28 '19 at 01:01
2

You can use

 many ( noneOf "0123456789")

I'm not sure whether the types of "noneOf" and "digit" line up, but you could also give this a try:

many $ noneOf digit
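
When the items can be recognised by their first character, as digits can, the first suggestion can be turned into a working parser by treating every run of non-digits as a separator. A small sketch, where junk and numbers are illustrative names and num is the question's parser written with many1:

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The question's number parser; many1 so it never succeeds on empty input.
num :: Parser Int
num = read <$> many1 digit

-- Anything that cannot start a number: a (possibly empty) run of non-digits.
junk :: Parser ()
junk = skipMany (noneOf "0123456789")

-- Skip leading junk, then read numbers, each followed by more junk.
numbers :: Parser [Int]
numbers = junk *> many (num <* junk)
```

In GHCi, parse numbers "" "I work out 4 days a week starting at 22." should give Right [4,22]. This only works because a single character is enough to tell junk from the start of an item; it does not carry over to an arbitrary parser.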
Gabriel Ciubotaru
  • I guess you need to do it after you extract the – Gabriel Ciubotaru Apr 10 '15 at 09:28
  • That's very useful information. I wasn't sure if Parsec was the right tool for this. Thanks for your help! – Sean Clark Hess Apr 10 '15 at 13:43
  • This is fine and dandy, but working for arbitrary parsers is more important in my eyes. – AJF Apr 10 '15 at 17:39
2

To find the item in the string, note that the item is either at the start of the string, or you can consume one character and look for the item in the now-shorter string. If the item isn't right at the start of the string, the item parser may have consumed some characters before failing; those have to be un-consumed, so you'll need to wrap it in try.

hasItem = prefixItem <* (many anyChar)
prefixItem = (try item) <|> (anyChar >> prefixItem)
item = <parser for your item here>

This code looks for just one occurrence of item in the string.
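
To see why the try matters, here is a self-contained sketch with a concrete item. The item parser is a hypothetical, cut-down version of the question's optionTag, and noTry is included only for comparison. It assumes Parsec 3's behaviour for string, where a partial match (such as "<" in "<b>") counts as consumed input.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical item: the numeric value of an <option> tag.
-- string can fail after consuming part of its input, e.g. on "<b>".
item :: Parser Int
item = string "<option value=" *> (read <$> many1 digit)

-- The item at the current position, or drop one character and retry.
prefixItem :: Parser Int
prefixItem = try item <|> (anyChar *> prefixItem)

-- First item in the input; everything after it is discarded.
hasItem :: Parser Int
hasItem = prefixItem <* many anyChar

-- The same search without try, kept only for comparison.
noTry :: Parser Int
noTry = item <|> (anyChar *> noTry)
```

In GHCi, parse hasItem "" "<b>old</b><option value=12>x" should give Right 12, while parse noTry "" on the same input gives a Left: item fails on "<b>" after consuming the "<", so <|> never reaches the anyChar branch.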

(AJFarmar almost has it.)

Neil Smith
1

The replace-megaparsec package allows you to split up a string into sections which match your pattern and sections which don't match by using the sepCap parser combinator.

import Data.Void (Void)
import Replace.Megaparsec
import Text.Megaparsec
import Text.Megaparsec.Char

let num :: Parsec Void String Int
    num = read <$> some digitChar
>>> parseTest (sepCap num) "I will live to be 111 years <b>old</b> if I work out 4 days a week starting at 22."
[Left "I will live to be "
,Right 111
,Left " years <b>old</b> if I work out "
,Right 4
,Left " days a week starting at "
,Right 22
,Left "."
]
James Brock