
I want to write code that does something like C preprocessing. I looked for libraries and found two candidates: attoparsec and megaparsec.

I need error positions to be reported, and megaparsec already does that. But attoparsec would be preferable for performance.

If I add error-position tracking to attoparsec's Parser monad, do I have to wrap it in a StateT transformer and lift every library function I use? That seems tedious. Is there a better way?

EDIT

I will go with megaparsec, which suits this situation. But I would still like to know how to wrap attoparsec's Parser monad. Can anyone tell me whether the method I described above is the best one?

I only want to know about the monad-wrapping technique; in other words, whether lifting every function of the inner monad is the only solution.

jeiea
  • I actually thought that the lack of overhead related to error messages was a key component of the improved performance. If you are going to add that overhead back in, why not just use a library that already does it (and may be better optimized than your implementation)? – ryachza Jan 30 '17 at 16:49
  • Yeah, I think you're right. But I'm still curious about the question in the title. What if I can't use `megaparsec` for some reason, or run into a similar situation later? – jeiea Jan 30 '17 at 19:31
  • I'm not sure what wrapping `Parser` in `StateT` would gain you, since you would have to somehow maintain the state in between `attoparsec` calls? – ryachza Jan 30 '17 at 20:17
  • I thought I could embed the parsing position, when I think again it is not. Now it is just a parser with a state. – jeiea Jan 30 '17 at 20:29
  • Right so I think `Parser` throws away some information in the interest of performance (at the expense of usability). I think it would be quite tedious to try to recover that externally, you would probably be better off recreating the primitives. If there's any question about whether you need maximal speed, I would say use something more user friendly. – ryachza Jan 30 '17 at 20:37
  • I don't understand what you mean by external / recreating primitives. So I can't avoid the implementation complexity? – jeiea Jan 30 '17 at 21:15
  • @jeiea A little context that should hopefully help you make a decision: `attoparsec` is designed for high-performance low-level applications with a focus on byte-strings: networking protocols, binary file formats, that sort of thing. `attoparsec` trades some flexibility (and high-quality error-reporting) for speed. `megaparsec` is designed for user-facing stuff like compiler front-ends where very high performance is less of a concern. If you're writing a preprocessor I'd definitely go for `megaparsec`. – Benjamin Hodgson Jan 30 '17 at 23:17
  • @jeiea Per your edit, I think you could wrap `Parser` in `StateT`, but I think that would achieve something fundamentally different from what you're looking for. You would be able to manipulate the state in between and based on the result of parse calls, and use the state to determine what parses to try, but you wouldn't be able to recover position information unless you try to maybe discern how much was consumed based on the result of a parse? – ryachza Jan 31 '17 at 13:30
  • Yeah, that's what I meant when I wrote _when I think again it is not_. Can't I ask a question without a practical example? – jeiea Jan 31 '17 at 14:56
  • @jeiea As for the lifting, you might be able to create a `newtype` and maybe derive (or implement) `MonadState` and define things in terms of that type. I think asking without a practical example is fine, but mixing it in with a practical question became confusing. If you specifically want to know about using monad transformers without having to lift every operation, you could research that and ask any questions you may have, but I don't think `Parser` is unique since the whole point of a monad transformer is that it should work with *any* `Monad`. – ryachza Jan 31 '17 at 19:05

1 Answer

You can get the current parse position from attoparsec, without needing a transformer. But there is no exported function to do it; you have to define it yourself:

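import Control.Applicative ((<|>))  -- needed by reportOffsetOnError further down
import qualified Data.Attoparsec.Internal.Types as T

-- Consume nothing: call the success continuation with the current position as the result.
offset :: T.Parser i T.Pos
offset = T.Parser $ \t pos more _lose succ -> succ t pos more pos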

Example usage:

λ> parseOnly (many' (skipMany (word8 46) *> offset <* anyWord8)) ".a..a...a....a"
Right [Pos {fromPos = 1},Pos {fromPos = 4},Pos {fromPos = 8},Pos {fromPos = 13}]

This works as expected for incremental input, too. It only gives you the offset into the input, not (line, column), but the offset is sufficient for many applications.
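If you do need (line, column), one option is to convert the offset after the fact, given the input that was parsed. The following is only a minimal sketch under some assumptions: `lineCol` is a hypothetical helper (not part of attoparsec), the input is a strict `ByteString` with `'\n'` line endings, and the whole input up to the offset is still available (with incremental input you would have to keep the chunks around yourself).

```haskell
import qualified Data.ByteString.Char8 as B

-- | Hypothetical helper: turn a 0-based byte offset into a 1-based
-- (line, column) pair by scanning the input up to that offset.
-- Assumes '\n' terminates a line; columns count bytes, not characters.
lineCol :: B.ByteString -> Int -> (Int, Int)
lineCol input off = B.foldl' step (1, 1) (B.take off input)
  where
    step (l, _) '\n' = (l + 1, 1)  -- a newline starts the next line at column 1
    step (l, c) _    = (l, c + 1)  -- any other byte advances the column
```

Since this rescans the input on every call, it fits best on the error path, for example applied to `T.fromPos pos` once a parse has failed, rather than inside the parser's hot loop.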

Use fromPos to get the Int from a Pos:

λ> T.fromPos <$> parseOnly offset ""
Right 0

Now, we can use offset to create a parser that reports the current offset when it fails.

reportOffsetOnError :: T.Parser i a -> T.Parser i a
reportOffsetOnError p =
  p <|> (offset >>= \pos ->
    fail ("failed at offset: " ++ show (T.fromPos pos)))

Example usage:

λ> parseOnly (word8 46 *> word8 46 *> reportOffsetOnError (word8 97)) "..a"
Right 97
λ> parseOnly (word8 46 *> word8 46 *> reportOffsetOnError (word8 97)) "..b"
Left "Failed reading: failed at offset: 2"

A final note: Data.Attoparsec.Zepto does provide the ZeptoT transformer if you really need a transformer and want to stay with the attoparsec package, but this is a different parser type from the main parser in attoparsec.
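As for the general wrapping question: lifting every function by hand is not the only option. A common pattern is to hide the `StateT` stack behind a `newtype`, derive the instances, and confine the lifting to one helper. The sketch below assumes the `mtl` and `transformers` packages; `CountingParser`, `liftP`, `runCounting`, `line` and `numberedLines` are made-up names for illustration. Note that the state here is just a counter the parser maintains itself; it does not recover attoparsec's position.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE OverloadedStrings #-}

import Control.Applicative (Alternative, many)
import Control.Monad.State.Strict (MonadState, StateT, evalStateT, get, modify')
import Control.Monad.Trans.Class (lift)
import qualified Data.Attoparsec.ByteString.Char8 as A
import Data.ByteString (ByteString)

-- Hypothetical wrapper: an attoparsec parser that also carries an Int.
newtype CountingParser a = CountingParser (StateT Int A.Parser a)
  deriving (Functor, Applicative, Monad, Alternative, MonadState Int)

-- The only place where lifting is spelled out.
liftP :: A.Parser a -> CountingParser a
liftP = CountingParser . lift

-- Run the wrapped parser with the counter starting at 0.
runCounting :: CountingParser a -> ByteString -> Either String a
runCounting (CountingParser m) = A.parseOnly (evalStateT m 0)

-- Parse one newline-terminated line and tag it with the number of
-- lines parsed before it.
line :: CountingParser (Int, ByteString)
line = do
  n <- get
  txt <- liftP (A.takeWhile (/= '\n') <* A.endOfLine)
  modify' (+ 1)
  pure (n, txt)

-- 'many' comes from the derived Alternative instance; no lifting needed.
numberedLines :: CountingParser [(Int, ByteString)]
numberedLines = many line
```

Combinators that only need `Applicative`, `Monad` or `Alternative` work on the wrapper directly; only attoparsec's own primitives go through `liftP`.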

frasertweedale