1

There is so much information out there, but none of it really helps a beginner like me. I have read a lot of articles about context-free languages and pushdown automata, and now I am trying to understand how certain things might look in code.

Let's assume we have defined a language such as:

 L = {a^m b^n | m >= n} 

Giving us the following production rules:

 S -> B   | ^
 B -> aBb | A
 A -> aA  | a

How exactly would this look in pseudocode? I assume that all the production rules together form one state, defined as S1, or are they all separate states? Either way, I don't know, and it would be great if someone could help me understand how this works.

I know we analyze the characters of an input string, and depending on the input, one of the rules applies, pushing a symbol onto our PDA's stack.
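For concreteness, the stack discipline described here for L = {a^m b^n | m >= n} can be sketched as a direct PDA simulation (a hypothetical Python sketch; `accepts` is just an illustrative name):

```python
def accepts(s):
    """Simulate a PDA for L = {a^m b^n | m >= n}: push one symbol
    per 'a', pop one per 'b'; reject if a 'b' finds an empty stack
    or an 'a' appears after a 'b' has been read."""
    stack = []
    seen_b = False
    for ch in s:
        if ch == 'a':
            if seen_b:          # 'a' after 'b' is not allowed
                return False
            stack.append('A')   # push one stack symbol per 'a'
        elif ch == 'b':
            seen_b = True
            if not stack:       # more b's than a's: reject
                return False
            stack.pop()
        else:
            return False        # symbol outside the alphabet
    return True                 # leftover A's are fine, since m >= n
```

Leftover stack symbols at the end are exactly what makes m > n acceptable; an empty stack at the end corresponds to m = n.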

neuhaus
Asperger
    What, specifically, do you want your code to do? Be specific. CFGs describe languages. Do you want your code to output parse trees? Do you want your code to recognize strings in the language? Or generate them? If generate them, which ones? You don't have the time to generate them all. – Patrick87 Mar 24 '17 at 17:54
  • Your production rules only generate strings with m>n, the equality is impossible. As Patrick states, if you want an algorithm, you should specify for which problem exactly. – Peter Leupold Mar 30 '17 at 08:48
  • @PeterLeupold ok, I will update my question today. You are right, a lot of info is missing and I will edit my example. – Asperger Mar 30 '17 at 08:55

1 Answer

0

There are multiple ways to convert a CFG into a piece of code that does actual parsing, each with its strengths and weaknesses.

Some algorithms, like the CYK algorithm, Unger's algorithm, and (my personal favorite) Earley's algorithm can take as input an arbitrary CFG and a string, then use dynamic programming to determine a parse tree for that string if one exists. The operation of these algorithms doesn't resemble your typical pushdown automaton, since they work by filling in tables of values while processing characters one at a time.
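As a rough illustration of the table-filling idea, here is a minimal CYK recognizer sketch in Python (the grammar, given in Chomsky normal form, and all names are illustrative; Earley's algorithm works similarly but handles arbitrary CFGs directly):

```python
def cyk(word, terminal_rules, binary_rules, start):
    """CYK recognizer: table[i][j] holds the nonterminals that can
    derive the substring word[i:i+j+1]; filled bottom-up by span length."""
    n = len(word)
    if n == 0:
        return False
    table = [[set() for _ in range(n)] for _ in range(n)]
    # spans of length 1 come straight from the terminal rules
    for i, ch in enumerate(word):
        table[i][0] = {lhs for lhs, t in terminal_rules if t == ch}
    # longer spans combine two adjacent shorter spans
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, (b, c) in binary_rules:
                    if b in left and c in right:
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]

# Illustrative CNF grammar for {a^n b^n | n >= 1}:
#   S -> AB | AX,  X -> SB,  A -> a,  B -> b
terminals = [("A", "a"), ("B", "b")]
binaries = [("S", ("A", "B")), ("S", ("A", "X")), ("X", ("S", "B"))]
```

Note that the loop never touches a stack: the "state" of the parse lives entirely in the table, which is what distinguishes this family from PDA-style parsers.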

Some parsing algorithms, especially LR(1) and the general family of LR parsers, more directly maintain a parsing stack and use a finite-state control to drive the parser. LR(1) parsers can't handle all possible CFGs, though: they only handle grammars for deterministic context-free languages. Variations like GLR parsers, however, can handle all grammars by essentially running multiple stacks in parallel. The parser generator tools bison and yacc produce parsers in this family, and if you take a look at how their input files work, you'll get a sense of how CFGs are encoded in software.
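To give a flavor of the stack-plus-reduce idea without building the ACTION/GOTO tables that bison and yacc generate for you, here is a deliberately naive shift-reduce sketch in Python for the toy grammar S -> aSb | ab (all names illustrative; a real LR parser would consult a table instead of pattern-matching the stack top):

```python
def shift_reduce(word):
    """Naive shift-reduce sketch for S -> aSb | ab: shift input
    symbols onto a stack and reduce whenever a handle appears
    on top.  Real LR parsers decide shift vs. reduce by looking
    up (state, lookahead) in a precomputed table."""
    stack = []
    for ch in word:
        stack.append(ch)                 # shift
        while True:                      # reduce as long as a handle is on top
            if stack[-2:] == ["a", "b"]:
                stack[-2:] = ["S"]       # reduce by S -> ab
            elif stack[-3:] == ["a", "S", "b"]:
                stack[-3:] = ["S"]       # reduce by S -> aSb
            else:
                break
    return stack == ["S"]                # accept if everything reduced to S
```

Greedy reduction happens to be safe for this tiny grammar; in general the whole point of the LR construction is to resolve shift/reduce choices correctly.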

LL(1) parsers and simple backtracking parsers work top-down and typically use a stack (often, the runtime call stack) to parse input strings. They can't handle all grammars, though. The ANTLR parser generator produces parsers in this family.
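For the grammar in the question, a top-down recognizer might be sketched like this in Python, with one function per nonterminal so that the runtime call stack plays the role of the PDA stack. To cope with the choice points, each function returns the set of input positions it can reach; note that, as a commenter pointed out, this grammar generates ε plus only the strings with m > n (all names are illustrative):

```python
def parse_S(s, i):
    # S -> B | epsilon
    return parse_B(s, i) | {i}

def parse_B(s, i):
    # B -> a B b | A
    results = set(parse_A(s, i))
    if i < len(s) and s[i] == "a":
        for j in parse_B(s, i + 1):
            if j < len(s) and s[j] == "b":
                results.add(j + 1)
    return results

def parse_A(s, i):
    # A -> a A | a
    results = set()
    if i < len(s) and s[i] == "a":
        results.add(i + 1)
        results |= parse_A(s, i + 1)
    return results

def recognize(s):
    """Accept s iff some derivation of S consumes all of it."""
    return len(s) in parse_S(s, 0)
```

The correspondence between grammar and code is direct: each alternative of a rule becomes one branch in the matching function, which is why this style is so popular for hand-written parsers.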

Packrat parsers work on parsing expression grammars (PEGs), a CFG-like formalism in which alternatives carry priorities that say what order to try them in. Code using these parsers tends to closely mirror the shape of the grammar. Parser combinators are another modern technique where the parsing logic looks a lot like the CFG.
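Here is a minimal sketch of the combinator idea in Python, where a parser is just a function from (string, position) to the set of positions it can reach, so the grammar S -> aSb | ε reads almost directly as code (all names illustrative):

```python
# A parser is a function (s, i) -> set of positions reachable from i.
def char(c):
    return lambda s, i: {i + 1} if i < len(s) and s[i] == c else set()

def seq(p, q):
    # run p, then run q from every position p could reach
    return lambda s, i: {k for j in p(s, i) for k in q(s, j)}

def alt(p, q):
    # union of both alternatives
    return lambda s, i: p(s, i) | q(s, i)

def empty():
    return lambda s, i: {i}

# S -> a S b | epsilon, written with the combinators above
def S(s, i):
    return alt(seq(char("a"), seq(S, char("b"))), empty())(s, i)

def matches(s):
    return len(s) in S(s, 0)
```

Production combinator libraries add error reporting, memoization, and result values on top of this same basic shape.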

I would recommend taking a compilers course or picking up a copy of "Parsing Techniques: A Practical Guide" by Grune and Jacobs if you're interested in learning more about this.

templatetypedef