I'm hard-coding a recursive descent parser, mostly for learning purposes, and I've run into some trouble.
I'll use this short excerpt from the CSS3 grammar as an example:
simple_selector = type_selector | universal;
type_selector = [ namespace_prefix ]? element_name;
namespace_prefix = [ IDENT | '*' ]? '|';
element_name = IDENT;
universal = [ namespace_prefix ]? '*';
First, I didn't realize that namespace_prefix was an optional part within both type_selector and universal. That led to type_selector always failing when fed input like *|* because it was blindly being considered for any input that matched the namespace_prefix production.
Recursive descent is straightforward enough, but my understanding is that I need to do a lot of (for lack of a better word) exploratory recursion before settling on a production. So I changed the signature of my productions to return Boolean values. This way I can easily tell whether a specific production succeeded or not.
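To make that concrete, here is a minimal sketch of Boolean-returning productions for the excerpt above. It is not my actual code: the Cursor class, the attempt helper, and the use of plain strings as tokens (with "IDENT" standing in for any identifier) are all assumptions made for illustration, and it uses a list index where my real parser uses a linked list.

```python
from typing import Callable, List


class Cursor:
    """Hypothetical token cursor; tokens are plain strings in this sketch."""

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def accept(self, kind: str) -> bool:
        # Consume the next token only if it matches `kind`.
        if self.pos < len(self.tokens) and self.tokens[self.pos] == kind:
            self.pos += 1
            return True
        return False


def attempt(cur: Cursor, production: Callable[[Cursor], bool]) -> bool:
    # Try a production; on failure, rewind to where we started.
    start = cur.pos
    if production(cur):
        return True
    cur.pos = start
    return False


def namespace_prefix(cur: Cursor) -> bool:
    # namespace_prefix = [ IDENT | '*' ]? '|';
    if not cur.accept("IDENT"):
        cur.accept("*")
    return cur.accept("|")


def element_name(cur: Cursor) -> bool:
    # element_name = IDENT;
    return cur.accept("IDENT")


def type_selector(cur: Cursor) -> bool:
    # type_selector = [ namespace_prefix ]? element_name;
    attempt(cur, namespace_prefix)  # optional, so the result is ignored
    return element_name(cur)


def universal(cur: Cursor) -> bool:
    # universal = [ namespace_prefix ]? '*';
    attempt(cur, namespace_prefix)  # optional, so the result is ignored
    return cur.accept("*")


def simple_selector(cur: Cursor) -> bool:
    # simple_selector = type_selector | universal;
    return attempt(cur, type_selector) or attempt(cur, universal)
```

With the input *|*, type_selector consumes the prefix, fails on element_name, and attempt rewinds the cursor so that universal can be tried from the same starting point. Rewinding the position is the easy part; the sketch deliberately builds no output at all.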
I use a linked list data structure to support arbitrary look-ahead, and can easily slice this list to attempt a production and then return to my starting point if the production doesn't succeed. However, while trying out a production, I'm passing along mutable state, trying to construct a document object model. This isn't really working out because I have no way of knowing whether the production will be successful or not. And if the production isn't successful, I need to somehow undo any changes made.
My question is this. Should I use an abstract syntax tree as an intermediate representation and then go from there? Is this something you would commonly do to work around this problem? Because the issue seems to be primarily with the document object model not being a suitable tree data structure for recursion.
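As a rough sketch of what I mean by an intermediate representation: each production could return an immutable node together with the next position, or None on failure, so a failed attempt leaves nothing to undo. The Selector class, the is_ident and optional_prefix helpers, and the string-based tokens are hypothetical names invented for this example, not part of any library or of my current code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Selector:
    """Hypothetical AST node for a simple selector."""
    kind: str                 # "type" or "universal"
    namespace: Optional[str]  # None if no prefix, "" for a bare '|'
    name: str


def is_ident(tok: str) -> bool:
    # Assumption: every token other than '*' and '|' is an IDENT.
    return tok not in ("*", "|")


def namespace_prefix(tokens: List[str], pos: int) -> Optional[Tuple[str, int]]:
    # namespace_prefix = [ IDENT | '*' ]? '|';
    ns = ""
    if pos < len(tokens) and (tokens[pos] == "*" or is_ident(tokens[pos])):
        ns = tokens[pos]
        pos += 1
    if pos < len(tokens) and tokens[pos] == "|":
        return ns, pos + 1
    return None


def optional_prefix(tokens: List[str], pos: int) -> Tuple[Optional[str], int]:
    # [ namespace_prefix ]? : fall back to the original position on failure.
    found = namespace_prefix(tokens, pos)
    return found if found is not None else (None, pos)


def type_selector(tokens: List[str], pos: int) -> Optional[Tuple[Selector, int]]:
    ns, pos = optional_prefix(tokens, pos)
    if pos < len(tokens) and is_ident(tokens[pos]):
        return Selector("type", ns, tokens[pos]), pos + 1
    return None


def universal(tokens: List[str], pos: int) -> Optional[Tuple[Selector, int]]:
    ns, pos = optional_prefix(tokens, pos)
    if pos < len(tokens) and tokens[pos] == "*":
        return Selector("universal", ns, "*"), pos + 1
    return None


def simple_selector(tokens: List[str], pos: int = 0) -> Optional[Tuple[Selector, int]]:
    # A successful result is a non-empty tuple, so `or` picks the first match.
    return type_selector(tokens, pos) or universal(tokens, pos)
```

Here simple_selector(["*", "|", "*"]) discards the failed type_selector attempt without any cleanup and returns a universal node with namespace "*". The document object model would then be built in a separate pass over the finished nodes, once the whole parse is known to have succeeded.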