For this project I'm parsing in two stages. The first stage handles include/ifdef/define directives and chunks the input up into [Span]
items which define their start/end points in the original inputs along with the body text. This stream is then parsed by the second stage into my AST for subsequent processing.
Each element of the AST carries it's source position and any semantic error caught after parsing prints the correct error position regardless of include depth. This part is crucial since it comes after the stage that has the problem.
The problem is given a parse error in the second stage from an included file it reports a bogus error with a location at the top level rule in the input. A parse error in the initial file works fine. The presence of any directives will divide even the initial file into multiple chunks so it's not a 'single chunk' vs. 'multiple chunks' issue.
Given the fact that the AST is getting the locations correct I'm stumped as to how Megaparsec is reporting bad info when parse errors are encountered.
I'm included my stream instance and (set|get)(Position|Input) code since these seem like the relevant bits. i feel like there must be some bit of megaparsec housekeeping that I'm not doing or that my Stream instance is invalid for some reason.
data Span = Span
{ spanStart :: SourcePos
, spanEnd :: SourcePos
, spanBody :: T.Text
} deriving (Eq, Ord, Show)
instance Stream [Span] where
type Token [Span] = Span
type Tokens [Span] = [Span]
tokenToChunk Proxy = pure
tokensToChunk Proxy = id
chunkToTokens Proxy = id
chunkLength Proxy = foldl1 (+) . map (T.length . spanBody)
chunkEmpty Proxy = all ((== 0) . T.length . spanBody)
positionAt1 Proxy pos (Span start _ _) = trace ("pos1" ++ show start) start
positionAtN Proxy pos [] = pos
positionAtN Proxy _ (Span start _ _:_) = trace ("posN" ++ show start) start
advance1 Proxy _ _ (Span _ end _) = end
advanceN Proxy _ pos [] = pos
advanceN Proxy _ _ ts = let Span _ end _ = last ts in end
take1_ [] = Nothing
take1_ s = case takeN_ 1 s of
Nothing -> Nothing
Just (sp, s') -> Just (head sp, s')
takeN_ _ [] = Nothing
takeN_ n s@(t:ts)
| s == [] = Nothing
| n <= 0 = Just ([t {spanEnd = spanStart t, spanBody = ""}], s)
| n < (T.length . spanBody) t = let (l, r) = T.splitAt n (spanBody t)
sL = spanStart t
eL = foldl (defaultAdvance1 (mkPos 3)) sL (T.unpack (T.tail l))
sR = defaultAdvance1 (mkPos 3) eL (T.last l)
eR = spanEnd t
l' = [Span sL eL l]
r' = (Span sR eR r):ts
in Just (trace (show n) l', r')
| n == (T.length . spanBody) t = Just ([t], ts)
| otherwise = case takeN_ (n - T.length (spanBody t)) ts of
Nothing -> Just ([t], [])
Just (t', ts') -> Just (t:t', ts')
takeWhile_ p s = fromJust $ takeN_ (go 0 s) s
where go n s = case take1_ s of
Nothing -> n
Just (c, s') -> if p c
then go (n + 1) s'
else n
Find include and swap to it:
"include" -> do
file <- between dquote dquote (many (alphaNumChar <|> char '.' <|> char '/' <|> char '_'))
s <- liftIO (Data.Text.IO.readFile file)
p <- getPosition
i <- getInput
pushPosition p
stack %= (:) (p, i)
setPosition (initialPos file)
setInput s
And if we reach the end of input pop stack and continue:
parseStream' :: StreamParser [Span]
parseStream' = concat <$> many p
where p = do
b <- tick <|> block
end <- option False (True <$ hidden eof)
h <- use stack
when (end && (h /= [])) $ do
popPosition
setInput (h ^?! ix 0 . _2)
stack %= tail
return b