I'm working with Boost Spirit. I've built a custom lexer (tested and working) using Lex, and am preparing a parser using Qi. My grammar is quite large: my lexer has approximately 120 patterns and my parser will have approximately 200 productions.
I'd like to preserve whitespace tokens in my token list but skip them in Qi. The reason is that I'd like to be able to take an input phrase, make modifications in the AST, and then produce an output that closely resembles the input, with whitespace preserved. As far as handling whitespace is concerned, I'm familiar with two options:
- Skip the whitespace in the lexer. This isn't going to work though, because I need to preserve it.
- Put 'tok.WS' everywhere, in all of the productions in my parser. This would work, but it would be exceptionally tedious and would obscure my grammar in a way that I'd like to avoid.
Ideally, my parser could 'ignore but accept' whitespace tokens wherever it sees them and add them to the AST (which I'll want to produce), similar to the second option but automatically. This seems like a remote possibility. The other option would be to have the lexer skip my whitespace tokens and then store a reference between the tokens and the productions, so that I can construct my output by examining the relationship between the tokens that were provided to the parser and the productions it identified. This could work well, but I have no idea where to begin implementing it.
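To make that last idea concrete, here is a rough sketch of what I have in mind, written in plain C++ rather than Spirit so that it stands on its own. Everything in it is hypothetical: `Token`, `tokenize` and `render` are names I made up for illustration, and the toy tokenizer (which just splits on whitespace) only stands in for my real Lex-based lexer. The point is that each token carries the whitespace that preceded it, so the parser never sees whitespace but the output can still be rebuilt from the token list.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token type: the whitespace that preceded the token is stored
// alongside it instead of being emitted as a token of its own.
struct Token {
    std::string leading_ws;  // whitespace skipped just before this token
    std::string text;        // the token's characters (empty for the end marker)
};

// Toy stand-in for the real lexer: tokens are maximal runs of non-whitespace.
// The last element is always an end marker with empty text, which holds any
// trailing whitespace so that nothing from the input is lost.
std::vector<Token> tokenize(const std::string& input) {
    std::vector<Token> tokens;
    const std::size_t n = input.size();
    std::size_t i = 0;
    for (;;) {
        Token tok;
        while (i < n && std::isspace(static_cast<unsigned char>(input[i])))
            tok.leading_ws += input[i++];
        while (i < n && !std::isspace(static_cast<unsigned char>(input[i])))
            tok.text += input[i++];
        const bool at_end = tok.text.empty();  // nothing but whitespace was left
        tokens.push_back(tok);
        if (at_end) break;
    }
    return tokens;
}

// Rebuild the phrase from the (possibly modified) tokens, whitespace included.
std::string render(const std::vector<Token>& tokens) {
    std::string out;
    for (const Token& tok : tokens) {
        out += tok.leading_ws;
        out += tok.text;
    }
    return out;
}
```

With this layout, `render(tokenize(s))` gives back `s` unchanged, and replacing the `text` of one token leaves the spacing around it intact. What I don't know is how to get the equivalent association between tokens and productions out of Lex and Qi.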
Considering all of the above, what would be the best way to preserve whitespace and remember where it occurred in my input phrase, so that I can use this information to construct my output phrase?