
I am trying to parse, with attoparsec, a limited set of valid strings that share a common prefix. However, my attempts result in either a Partial result or a premature Done:

{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative
import qualified Data.Attoparsec.Text as PT

data Thing = Foobar | Foobaz | Foobarz deriving Show

thingParser1 = PT.string "foobarz" *> return Foobarz
           <|> PT.string "foobaz" *> return Foobaz
           <|> PT.string "foobar" *> return Foobar

thingParser2 = PT.string "foobar" *> return Foobar
           <|> PT.string "foobaz" *> return Foobaz
           <|> PT.string "foobarz" *> return Foobarz

What I want is for "foobar" to result in Foobar, "foobarz" to result in Foobarz and "foobaz" to result in Foobaz. However,

PT.parse thingParser1 "foobar"

results in a PT.Partial and

PT.parse thingParser2 "foobarz"

results in a PT.Done "z" Foobar.

sjp
  • There are errors in your code: you have to import Control.Applicative and add a pure before the constructors in your parser (or use the <$ operator), and you probably want to derive Show for Thing. – Noughtmare Jul 22 '21 at 17:09
  • You're right. I was a bit overzealous in making the example minimal. I have edited the question. – sjp Jul 23 '21 at 07:37

1 Answer


As you see, the order of alternatives matters in the parsec family of parser combinator libraries: <|> first tries the parser on the left and only continues with the parser on the right if that one fails.
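A minimal sketch of that ordering effect, assuming the attoparsec and text packages are installed. The two parsers are the question's thingParser1 and thingParser2 rewritten with <$, and the `deriving (Show, Eq)` clause is an addition so that results can be printed and compared. The file can be loaded in GHCi:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import qualified Data.Attoparsec.Text as PT

-- Show and Eq are added (not in the question) so results can be printed and compared.
data Thing = Foobar | Foobaz | Foobarz
  deriving (Show, Eq)

-- Longest literal first: "foobarz" is tried before its prefix "foobar".
thingParser1 :: PT.Parser Thing
thingParser1 =  Foobarz <$ PT.string "foobarz"
            <|> Foobaz  <$ PT.string "foobaz"
            <|> Foobar  <$ PT.string "foobar"

-- Prefix first: on "foobarz" the first branch already succeeds,
-- so <|> never looks at the "foobarz" branch.
thingParser2 :: PT.Parser Thing
thingParser2 =  Foobar  <$ PT.string "foobar"
            <|> Foobaz  <$ PT.string "foobaz"
            <|> Foobarz <$ PT.string "foobarz"
```

Here `PT.parseOnly thingParser1 "foobarz"` gives `Right Foobarz`, while `PT.parseOnly thingParser2 "foobarz"` gives `Right Foobar`: the first successful branch wins, and the trailing "z" is simply left unconsumed.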

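Another thing to notice is that your parsers don't require that the input ends after parsing, and that parse runs them incrementally: it assumes more input may still arrive, so when string "foobarz" has only seen "foobar" it returns a Partial instead of failing. You can tell attoparsec that the input is complete by using parseOnly instead of parse to run the actual parser. Or you can feed an empty string to the Partial result (which signals end of input) and then use the maybeResult or eitherResult functions to convert the Result into a Maybe or Either respectively.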
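A sketch of these ways of running thingParser1 on "foobar", assuming the attoparsec and text packages. `isPartial` and `runComplete` are hypothetical helper names introduced only for this illustration; `PT.feed`, `PT.maybeResult` and `PT.parseOnly` are the library functions mentioned above:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Text (Text)
import qualified Data.Attoparsec.Text as PT

data Thing = Foobar | Foobaz | Foobarz
  deriving (Show, Eq)

thingParser1 :: PT.Parser Thing
thingParser1 =  Foobarz <$ PT.string "foobarz"
            <|> Foobaz  <$ PT.string "foobaz"
            <|> Foobar  <$ PT.string "foobar"

-- Hypothetical helper: True when attoparsec is still waiting for more input.
isPartial :: PT.Result a -> Bool
isPartial (PT.Partial _) = True
isPartial _              = False

-- Hypothetical helper: run incrementally, feed "" to signal end of input,
-- then convert the finished Result with maybeResult.
runComplete :: PT.Parser a -> Text -> Maybe a
runComplete p t = PT.maybeResult (PT.feed (PT.parse p t) "")
```

With this, `isPartial (PT.parse thingParser1 "foobar")` is True and `PT.maybeResult` of that unfinished result is `Nothing`. After feeding the empty string, `runComplete thingParser1 "foobar"` gives `Just Foobar`, matching `PT.parseOnly thingParser1 "foobar"`, which gives `Right Foobar`.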

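That solution will work for thingParser1, but thingParser2 will still not work: parseOnly does not reject leftover input, so on "foobarz" it returns Foobar and silently drops the "z". To fix that, each string parser must be followed by an endOfInput within the same alternative, so that a failing endOfInput makes <|> fall back to the next branch. This would work when run with parseOnly: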

thingParser3 = Foobar  <$ PT.string "foobar"  <* PT.endOfInput
           <|> Foobaz  <$ PT.string "foobaz"  <* PT.endOfInput
           <|> Foobarz <$ PT.string "foobarz" <* PT.endOfInput
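A self-contained version of thingParser3, assuming the attoparsec and text packages and an added `deriving (Show, Eq)`; `parseThing` is a hypothetical wrapper name. It is run with parseOnly because, under plain parse, endOfInput at the end of "foobar" would itself ask for more input and the result would again be a Partial:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Data.Text (Text)
import qualified Data.Attoparsec.Text as PT

data Thing = Foobar | Foobaz | Foobarz
  deriving (Show, Eq)

-- Each branch demands end of input itself, so a failing endOfInput
-- makes <|> fall through to the next branch.
thingParser3 :: PT.Parser Thing
thingParser3 =  Foobar  <$ PT.string "foobar"  <* PT.endOfInput
            <|> Foobaz  <$ PT.string "foobaz"  <* PT.endOfInput
            <|> Foobarz <$ PT.string "foobarz" <* PT.endOfInput

-- Hypothetical wrapper: parseOnly treats the text as the whole input,
-- so endOfInput can succeed instead of waiting for more.
parseThing :: Text -> Either String Thing
parseThing = PT.parseOnly thingParser3
```

All three valid strings map to their constructors, and input with trailing characters such as "foobarx" now yields a `Left` instead of a `Right Foobar`.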

A slightly better approach is to do a quick lookahead to check whether a z follows the foobar, like this:

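-- guard comes from Control.Monad: import Control.Monad (guard)
thingParser4 = Foobar  <$ (do
                 _ <- PT.string "foobar"
                 c <- PT.peekChar   -- look at the next character without consuming it
                 guard (maybe True (/= 'z') c))
           <|> Foobaz  <$ PT.string "foobaz"
           <|> Foobarz <$ PT.string "foobarz"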
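A runnable sketch of thingParser4 with the imports it needs, assuming the attoparsec and text packages; `foobarNotFollowedByZ` is a hypothetical name for the lookahead branch, and `deriving (Show, Eq)` is added for printing and comparison:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import Control.Monad (guard)
import qualified Data.Attoparsec.Text as PT

data Thing = Foobar | Foobaz | Foobarz
  deriving (Show, Eq)

thingParser4 :: PT.Parser Thing
thingParser4 =  Foobar  <$ foobarNotFollowedByZ
            <|> Foobaz  <$ PT.string "foobaz"
            <|> Foobarz <$ PT.string "foobarz"
  where
    -- Match "foobar", then peek (without consuming) at the next character;
    -- guard fails this branch if it is a 'z', so <|> tries the next ones.
    foobarNotFollowedByZ = do
      _ <- PT.string "foobar"
      c <- PT.peekChar
      guard (maybe True (/= 'z') c)
```

Unlike thingParser3, this version does not demand end of input, so `PT.parseOnly thingParser4 "foobarx"` still succeeds with `Right Foobar`.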

But this backtracking also degrades the performance, so I would stick with the thingParser1 implementation.

Noughtmare
  • Attoparsec does backtrack by default. – Benjamin Hodgson Jul 22 '21 at 20:32
  • @BenjaminHodgson thanks, it does indeed backtrack alternatives by default, however, it does not do full backtracking. I would expect to be able to factor out the three `endOfInput`s in `thingParser3` into a single sequenced operation to the right, but then the backtracking will not work properly anymore. – Noughtmare Jul 22 '21 at 21:25
  • `thingParser1` with `parseOnly` does indeed fulfill my needs. Thanks! – sjp Jul 23 '21 at 07:37
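To illustrate Noughtmare's comment about factoring out the `endOfInput`s, a sketch assuming the same packages; `factored` is a hypothetical name for the question's thingParser2 followed by a single endOfInput:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative ((<|>))
import qualified Data.Attoparsec.Text as PT

data Thing = Foobar | Foobaz | Foobarz
  deriving (Show, Eq)

thingParser2 :: PT.Parser Thing
thingParser2 =  Foobar  <$ PT.string "foobar"
            <|> Foobaz  <$ PT.string "foobaz"
            <|> Foobarz <$ PT.string "foobarz"

-- One endOfInput after the whole choice. Once the "foobar" branch has
-- succeeded, a later endOfInput failure is not retried with the
-- remaining branches of <|>; the whole parse just fails.
factored :: PT.Parser Thing
factored = thingParser2 <* PT.endOfInput
```

`PT.parseOnly factored "foobar"` gives `Right Foobar`, but `PT.parseOnly factored "foobarz"` is a `Left`, whereas thingParser3, with an endOfInput inside each branch, accepts "foobarz".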