Megaparsec: macro expansion during parsing

Question

In a small DSL, I'm parsing macro definitions similar to the C preprocessor's #define directives (a simplified example):

_def mymacro(a,b) = a + b / a

When the following call is encountered by the parser

c = mymacro(pow(10,2),3)

it is expanded to

c = pow(10,2) + 3 / pow(10,2)

My current approach is:

1. Wrap the parser in a State monad.
2. When parsing macro definitions, store them in the state with their body unparsed (the body is parsed as a plain string).
3. When parsing a macro call, look up the definition in the state, substitute the arguments into the body text, replace the call with the resulting body, and resume parsing.

Some code from the last step:

macrocallStmt = do
  -- Capture the input and position as they were before the macro call.
  oldInput <- getInput
  oldPos   <- getPosition
  -- Parse the call, which has the form: ret = name(arg1, arg2, ...)
  ret  <- identifier
  _    <- symbolCS "="
  i    <- identifier
  args <- parens $ commaSep anyExprStr
  -- Expand the macro call using the definitions stored in the state.
  us <- get
  let inlinedCall = replaceMacroArgs i args ret us
  -- Put the expanded call in front of the remaining input and rewind
  -- the position so that parsing resumes at the start of the expansion.
  remainder <- getInput
  let newInput = T.append inlinedCall (T.cons '\n' remainder)
  setPosition oldPos
  setInput newInput
  -- Record the expanded input so that error reporting can refer to it.
  modify (updateExpandedInput oldInput newInput)

anyExprStr = fmap praShow expression <|> fmap praShow algexpr
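
For readers trying to picture the substitution step: replaceMacroArgs is not shown above, so here is a minimal sketch of what a purely textual substitution could look like. Everything in it (Macro, substitute, expandCall) is a hypothetical name, not the actual helper, and it works on String with only base imports to stay self-contained. It replaces whole identifiers only, so a parameter a does not clobber ab or a1.

```haskell
import Data.Char (isAlpha, isAlphaNum, isDigit)
import Data.Maybe (fromMaybe)

-- | Hypothetical macro record: parameter names plus the unparsed body.
data Macro = Macro
  { macroParams :: [String]
  , macroBody   :: String
  }

-- | Replace every whole identifier found in the environment by its
-- argument text; everything else is copied through unchanged.
substitute :: [(String, String)] -> String -> String
substitute _ [] = []
substitute env s@(c : cs)
  | isIdentStart c =
      let (ident, rest) = span isIdentChar s
      in fromMaybe ident (lookup ident env) ++ substitute env rest
  | isDigit c =
      -- Copy numeric literals such as 1e5 whole, so that the 'e' is not
      -- mistaken for a parameter name.
      let (lit, rest) = span isIdentChar s
      in lit ++ substitute env rest
  | otherwise = c : substitute env cs
  where
    isIdentStart x = isAlpha x || x == '_'
    isIdentChar x = isAlphaNum x || x == '_'

-- | Pair parameters with argument texts and expand the body.
expandCall :: Macro -> [String] -> String
expandCall (Macro params body) args = substitute (zip params args) body
```

With the macro from the example at the top, expandCall (Macro ["a", "b"] "a + b / a") ["pow(10,2)", "3"] yields the text pow(10,2) + 3 / pow(10,2). Like #define, this does not add parentheses around arguments, so operator precedence in the expanded text is the caller's problem.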

This approach does the job decently. However, it has a number of drawbacks.

Parsing multiple times

Any valid DSL expression can be an argument of a macro call. Therefore, even though I only need the arguments' textual representation (to substitute into the macro body), I have to parse them and then convert them back to strings; simply looking for the next comma wouldn't work. The complete, customised macro body is then parsed. So in practice macro arguments get parsed twice (and also rendered back to text, which has its own cost). Moreover, each call requires re-parsing the (almost identical) body. The reason for keeping the body unparsed in memory is to allow maximum flexibility: in the body, even DSL keywords could be constructed out of the macro arguments.
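
One possible way to get the argument text without a full parse, offered only as a sketch: split the text between the call's parentheses at commas that sit at bracket depth zero. This assumes brackets in the DSL are always balanced and that there are no string literals or comments containing commas or brackets; if that does not hold, this is not safe. splitArgs and trim are hypothetical names.

```haskell
import Data.Char (isSpace)

-- | Split the text between a call's parentheses at top-level commas,
-- i.e. commas that are not nested inside (...) or [...].
-- Note that an empty input yields one empty argument.
splitArgs :: String -> [String]
splitArgs = map trim . go (0 :: Int) ""
  where
    -- 'depth' counts currently open brackets; 'acc' is the current
    -- argument, accumulated in reverse.
    go _ acc [] = [reverse acc]
    go depth acc (c : cs)
      | c == ',' && depth == 0 = reverse acc : go depth "" cs
      | c `elem` "([" = go (depth + 1) (c : acc) cs
      | c `elem` ")]" = go (depth - 1) (c : acc) cs
      | otherwise = go depth (c : acc) cs

-- | Strip leading and trailing whitespace.
trim :: String -> String
trim = dropEnd . dropWhile isSpace
  where
    dropEnd = reverse . dropWhile isSpace . reverse
```

For the call in the example, splitArgs "pow(10,2),3" gives the two arguments pow(10,2) and 3. The trade-off is that malformed arguments are only detected later, when the expanded body is parsed.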

Error handling

Because the expanded body is inserted in front of the unconsumed input (replacing the call), the initial and final input can be quite different. In the event of a parse error, the position where the error occurred in the expanded input is available. However, when processing the error, I only have the original, not expanded, input. So the error position won't match. That is why, in the code snippet above, I use the state to save the expanded input, so that it is available when the parser exits with an error. This works well, but I noticed that it becomes quite costly, with new Text arrays (the input stream is Text) being allocated for the whole stream at every expansion. Perhaps keeping the expanded input in the state as String, rather than Text, would be cheaper in this case, i.e. when a middle part needs to be replaced?
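
An alternative to storing a full copy of the expanded stream, sketched under some assumptions: record only where each expansion happened and how the lengths changed, then translate an error offset in the expanded input back to the original input. The sketch assumes positions are plain character offsets (not line/column pairs), that expansions are recorded in order of their start offset, and that they do not nest or overlap. Expansion and toOriginal are hypothetical names.

```haskell
-- | One macro expansion, recorded instead of copying the whole stream.
data Expansion = Expansion
  { expStart :: Int  -- ^ offset in the expanded input where the inserted text begins
  , expLen   :: Int  -- ^ length of the inserted (expanded) text
  , callLen  :: Int  -- ^ length of the original call text that was replaced
  } deriving Show

-- | Map an offset in the expanded input back to the original input.
-- Offsets that fall inside an expansion map to the start of the call.
-- The list must be sorted by 'expStart' and must not overlap.
toOriginal :: [Expansion] -> Int -> Int
toOriginal exps off = go 0 exps
  where
    -- 'shift' is how much longer the expanded input is than the
    -- original, counting only the expansions already passed.
    go shift [] = off - shift
    go shift (e : es)
      | off < expStart e = off - shift
      | off < expStart e + expLen e = expStart e - shift
      | otherwise = go (shift + expLen e - callLen e) es
```

An error inside an expansion is then reported at the macro call in the original script, which is usually where the user wants to look anyway, and no new Text has to be allocated per expansion beyond the parser's own input.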

The reasons for this question are:

I would appreciate suggestions or comments on the two issues described above.
Can anyone suggest a better approach altogether?

You could parse macro definitions as expression trees which preserve whitespace, then parse macro arguments as expressions as well, and then substitute expressions into expressions (a well explored problem). Things like building keywords out of macro arguments (i.e. the *semantics* of macros, as opposed to their syntax) could be implemented as a transformation on expression trees which is applied right after substitution. (This question could do with some more detail, like a minimal example of a parser described in the question) — user2407038, Sep 10 '16 at 14:34
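
To make the comment's suggestion concrete, here is a minimal sketch of substituting expressions into expressions. The Expr type is a made-up stand-in for the DSL's real AST (it ignores whitespace preservation), and substExpr is a hypothetical name. With this approach the macro body and each argument are parsed once, and every call only performs a tree traversal.

```haskell
import Data.Maybe (fromMaybe)

-- | Hypothetical, simplified expression tree for the DSL.
data Expr
  = Num Double
  | Var String
  | BinOp Char Expr Expr
  | Call String [Expr]
  deriving (Eq, Show)

-- | Replace macro parameters (variables found in the environment) by
-- the already-parsed argument expressions.
substExpr :: [(String, Expr)] -> Expr -> Expr
substExpr env expr = case expr of
  Num n        -> Num n
  Var v        -> fromMaybe (Var v) (lookup v env)
  BinOp op l r -> BinOp op (substExpr env l) (substExpr env r)
  Call f args  -> Call f (map (substExpr env) args)
```

For the macro body a + b / a, represented as BinOp '+' (Var "a") (BinOp '/' (Var "b") (Var "a")), binding a to Call "pow" [Num 10, Num 2] and b to Num 3 produces the tree for pow(10,2) + 3 / pow(10,2). Since there are no binders in this toy AST, variable capture does not arise; a DSL with local definitions would need more care.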
