I have implemented recursive descent and PEG-like parsers in the past, where you could do things like this:
Path -> Segment+
Segment -> Slash Name
Segment -> /
Name -> /\w+/
Slash -> /
where Segment+ means "match one or more Segment", and there's a plain old regular expression, \w+, for matching one or more word characters.
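To make the recursive descent version concrete, here is a minimal Python sketch of the grammar above. The function names parse_segment and parse_path are made up for illustration, and the sketch assumes PEG-style ordered choice: the "Slash Name" alternative is tried before the bare slash. The "one or more" part is just a loop, and Name is just a regular expression applied directly to the characters, with no separate lexer.

```python
import re

NAME = re.compile(r"\w+")  # Name -> /\w+/


def parse_segment(text, pos):
    """Segment -> Slash Name | Slash. Returns (name, new_pos) or None."""
    if pos >= len(text) or text[pos] != "/":  # Slash -> /
        return None
    pos += 1
    match = NAME.match(text, pos)
    if match:  # first alternative: Slash Name
        return match.group(), match.end()
    return "", pos  # second alternative: a bare slash, recorded as ""


def parse_path(text):
    """Path -> Segment+ : keep matching Segments until one fails."""
    segments, pos = [], 0
    while True:
        result = parse_segment(text, pos)
        if result is None:
            break
        name, pos = result
        segments.append(name)
    # "+" means at least one Segment, and the whole input must be consumed.
    if not segments or pos != len(text):
        raise ValueError(f"parse error at offset {pos}")
    return segments
```

With this sketch, parse_path("/usr/local/bin") gives ["usr", "local", "bin"], and an input that does not start with a slash raises ValueError.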
How do you typically accomplish this same sort of thing with LR grammars/parsers? All of the examples of LR parsers I have seen are very basic, such as parsing 1 + 2 * 3, or (())(), where the patterns are very simple and don't seem to involve "one or more" functionality (or zero or more with *, or optional with ?). How do you do that in an LR parser generally?
Or does LR parsing require a lexing phase first (i.e. does an LR parser need its terminals to be pre-scanned "tokens" rather than raw characters)? I'm hoping that there is a way to do LR parsing without two phases like that. The definition of an LR parser talks about "input characters" in the books/sites I've been reading, but then you casually, almost in passing, see a line like:
The grammar's terminal symbols are the multi-character symbols or 'tokens' found in the input stream by a lexical scanner.
And it's like: wait, where did the scanner come from?