0

For this project I'm parsing in two stages. The first stage handles include/ifdef/define directives and chunks the input up into [Span] items which define their start/end points in the original inputs along with the body text. This stream is then parsed by the second stage into my AST for subsequent processing.

Each element of the AST carries it's source position and any semantic error caught after parsing prints the correct error position regardless of include depth. This part is crucial since it comes after the stage that has the problem.

The problem is given a parse error in the second stage from an included file it reports a bogus error with a location at the top level rule in the input. A parse error in the initial file works fine. The presence of any directives will divide even the initial file into multiple chunks so it's not a 'single chunk' vs. 'multiple chunks' issue.

Given the fact that the AST is getting the locations correct I'm stumped as to how Megaparsec is reporting bad info when parse errors are encountered.

I'm included my stream instance and (set|get)(Position|Input) code since these seem like the relevant bits. i feel like there must be some bit of megaparsec housekeeping that I'm not doing or that my Stream instance is invalid for some reason.

data Span = Span
  { spanStart :: SourcePos
  , spanEnd   :: SourcePos
  , spanBody  :: T.Text
  } deriving (Eq, Ord, Show)

instance Stream [Span] where
  type Token  [Span] = Span
  type Tokens [Span] = [Span]
  tokenToChunk  Proxy = pure
  tokensToChunk Proxy = id
  chunkToTokens Proxy = id
  chunkLength   Proxy = foldl1 (+) . map (T.length . spanBody)
  chunkEmpty    Proxy = all ((== 0) . T.length . spanBody)

  positionAt1 Proxy pos (Span start _ _) = trace ("pos1" ++ show start) start
  positionAtN Proxy pos [] = pos
  positionAtN Proxy _ (Span start _ _:_) = trace ("posN" ++ show start) start

  advance1 Proxy _ _ (Span _ end _) = end
  advanceN Proxy _ pos [] = pos
  advanceN Proxy _ _ ts = let Span _ end _ = last ts in end

  take1_ []     = Nothing
  take1_ s      = case takeN_ 1 s of
                    Nothing -> Nothing
                    Just (sp, s') -> Just (head sp, s')

  takeN_ _ [] = Nothing
  takeN_ n s@(t:ts)
    | s == [] = Nothing
    | n <= 0 = Just ([t {spanEnd = spanStart t, spanBody = ""}], s)
    | n <  (T.length . spanBody) t = let (l, r) = T.splitAt n (spanBody t)
                                         sL = spanStart t
                                         eL = foldl (defaultAdvance1 (mkPos 3)) sL (T.unpack (T.tail l))
                                         sR = defaultAdvance1 (mkPos 3) eL (T.last l)
                                         eR = spanEnd t
                                         l' = [Span sL eL l]
                                         r' = (Span sR eR r):ts
                                     in Just (trace (show n) l', r')
    | n == (T.length . spanBody) t = Just ([t], ts)
    | otherwise = case takeN_ (n - T.length (spanBody t)) ts of
                     Nothing -> Just ([t], [])
                     Just (t', ts') -> Just (t:t', ts')


  takeWhile_ p s = fromJust $ takeN_ (go 0 s) s
    where go n s = case take1_ s of
                      Nothing -> n
                      Just (c, s') -> if p c
                                      then go (n + 1) s'
                                      else n

Find include and swap to it:

"include" -> do
     file <- between dquote dquote (many (alphaNumChar <|> char '.' <|> char '/' <|> char '_'))
     s    <- liftIO (Data.Text.IO.readFile file)
     p    <- getPosition
     i    <- getInput
     pushPosition p
     stack %= (:) (p, i)
     setPosition (initialPos file)
     setInput s

And if we reach the end of input pop stack and continue:

parseStream' :: StreamParser [Span]
parseStream' = concat <$> many p
   where p = do
          b <- tick <|> block
          end <- option False (True <$ hidden eof)
          h <- use stack
          when (end && (h /= [])) $ do
            popPosition
            setInput (h ^?! ix 0 . _2)
            stack %= tail
          return b
Brian Magnuson
  • 1,487
  • 1
  • 9
  • 15
  • I think you should better open issue on `megaparsec` github. To me it looks like better place to report errors. Also, `megaparsec` maintainer cares about his package, so you will receive decent answer there. – Shersh Feb 10 '18 at 20:50
  • @Shersh - Yes, I considered that. I thought it best to try here first though since I figured it more likely something I was doing wrong rather than a bug in megaparsec and didn't want to treat their github issue tracker as a support forum. – Brian Magnuson Feb 10 '18 at 20:59
  • It might not be exactly bug but lack of documentation. To me it's completely okay to ask for explanations and then enhance documentation. – Shersh Feb 10 '18 at 21:19

0 Answers0