I'm working with Boost Spirit. I've built a custom lexer (tested and working) using Lex, and am preparing a parser using Qi. My grammar is quite large: my lexer has approximately 120 patterns and my parser will have approximately 200 productions.

I'd like to preserve whitespace in my list of tokens but skip it in Qi. The reason is that I'd like to take an input phrase, make modifications in the AST, and then produce an output that closely resembles the input, with whitespace preserved. As far as handling whitespace is concerned, I'm familiar with two options:

  1. Skip the whitespace in the lexer. This isn't going to work though, because I need to preserve it.
  2. Put 'tok.WS' everywhere, in all of my productions, in my parser. This would work but would be exceptionally tedious and would obscure my grammar in a way which I'd like to avoid.

Ideally I could have my parser 'ignore but accept' whitespace tokens wherever it sees them and add them to the AST (which I'll want to produce), similar to option (2) but automatically. This seems like a remote possibility. The other option would be to have the lexer ignore my whitespace tokens and then store a reference between the tokens and the productions, so that I can construct my output by examining the relationship between the tokens that were provided to the parser and the productions it identified. This could work well, but I have no idea where to begin implementing it.
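To make the 'ignore but accept' idea more concrete, here is a minimal sketch in plain standard C++ (deliberately not Spirit code) of one way it might look: a filtering layer between lexer and parser folds each run of whitespace into the next significant token, so the parser never sees whitespace but nothing is lost. The names `RawToken`, `Token`, `TokenStream`, `attach_whitespace` and `unparse` are hypothetical and only illustrate the shape of the data.

```cpp
#include <string>
#include <vector>

// Hypothetical token kinds; a real lexer would have many more.
enum class Kind { Word, Whitespace };

// What the lexer emits, whitespace included.
struct RawToken {
    Kind kind;
    std::string text;
};

// A significant token plus the whitespace that preceded it in the input.
struct Token {
    std::string leading_ws;
    std::string text;
};

struct TokenStream {
    std::vector<Token> tokens;  // what the parser would consume
    std::string trailing_ws;    // whitespace after the last significant token
};

// Fold whitespace tokens into the next significant token so the parser
// never sees them, while nothing from the input is discarded.
TokenStream attach_whitespace(const std::vector<RawToken>& raw) {
    TokenStream out;
    std::string pending;
    for (const RawToken& t : raw) {
        if (t.kind == Kind::Whitespace) {
            pending += t.text;
        } else {
            out.tokens.push_back(Token{pending, t.text});
            pending.clear();
        }
    }
    out.trailing_ws = pending;
    return out;
}

// Re-emit the token stream; unmodified tokens reproduce the input exactly.
std::string unparse(const TokenStream& ts) {
    std::string out;
    for (const Token& t : ts.tokens) {
        out += t.leading_ws;
        out += t.text;
    }
    out += ts.trailing_ws;
    return out;
}
```

With this layout, changing the `text` of one token and calling `unparse` leaves every other character of the input untouched. Whether the same effect can be wired into a Lex/Qi token type is a separate question that this sketch does not answer.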

Considering all of the above, what would be the best way to preserve whitespace and remember where it occurred in my input phrase, so that I can use this information to construct my output phrase?

Liam M
  • Are your input and output the same grammar? – BlamKiwi Dec 10 '14 at 01:23
  • @MorphingDragon yes it is. – Liam M Dec 10 '14 at 02:49
  • Have you considered having only a few tokens preserve whitespace? What is this whitespace trying to achieve in terms of the text? – BlamKiwi Dec 10 '14 at 03:11
  • @MorphingDragon The point of this little project is to give the user the same thing they submitted, with modifications, back to them. Think about a refactoring tool that scans your code for a given class name, replaces it, and then gives you your code back, but preserves your formatting. – Liam M Dec 10 '14 at 03:24
  • I don't think it is reasonable to preserve whitespace without also introducing whitespace as a token in your grammar. Have you thought about only having a few key productions preserve whitespace around them? Does rolling the lexer and parser into one recursive descent parser help you in any way? – BlamKiwi Dec 10 '14 at 04:01
  • It *might* be possible to do this if you associate each AST node with a range where the original actual text occurred. Then when you output the text you scan along the original document at the same time to output any whitespace already present. Kind of like how compilers store range information to tell you what line and column an error occurs, except you're interested in the stuff in between this information. – BlamKiwi Dec 10 '14 at 04:06
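
The range idea from the last comment can be sketched roughly as follows, again in plain standard C++ rather than Spirit. It assumes each AST node already records the half-open byte range `[begin, end)` it covered in the original input, and that edits do not overlap. `Range`, `Edit` and `apply_edits` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Half-open byte range [begin, end) into the original input.
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Replacement text for the span an AST node originally covered.
struct Edit {
    Range where;
    std::string replacement;
};

// Rebuild the output by walking the original text: everything outside an
// edited range, including all whitespace, is copied through verbatim.
// Assumption: the edits do not overlap.
std::string apply_edits(const std::string& source, std::vector<Edit> edits) {
    std::sort(edits.begin(), edits.end(),
              [](const Edit& a, const Edit& b) {
                  return a.where.begin < b.where.begin;
              });
    std::string out;
    std::size_t cursor = 0;
    for (const Edit& e : edits) {
        // Guard against overlapping or out-of-bounds ranges.
        assert(cursor <= e.where.begin && e.where.begin <= e.where.end &&
               e.where.end <= source.size());
        out.append(source, cursor, e.where.begin - cursor);  // untouched text
        out += e.replacement;
        cursor = e.where.end;
    }
    out.append(source, cursor, std::string::npos);  // tail after the last edit
    return out;
}
```

The design choice here is that whitespace never enters the grammar or the AST at all. The original buffer stays the source of truth, and only the spans of modified nodes are rewritten.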

0 Answers