I want to write a compiler as a personal project, and I am currently reading up on parsers (LL(k), LR(k), SLR, etc.).
All of these parsers are driven by a grammar supplied by the user, and this grammar is generally written in a text file (e.g. ANTLR reads it from a .g4 file, which is plain text as far as I can tell). If I want my parser to build its parsing tables from such a grammar file, what is the best way to parse it and represent the productions in code?
EDIT:
For example, let's say I have this grammar:
S -> 'a'|'b'|'('S')'|T
T -> '*'S
I was thinking of parsing this grammar and storing it as an ArrayList<ArrayList<String>>. This way, every item in the outer ArrayList is the collection of productions for one non-terminal, with the non-terminal itself as the first element:
// With this type of representation, I can assign an id to each production.
// For example, production S -> 'a' has id 01, T -> '*'S has id 10, and so on.
{
{"S", "'a'", "'b'", "'('S')'", "T"},
{"T", "'*'S"}
}
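To make the idea concrete, here is a rough sketch of how that nested list could be built from the grammar text in Java. Everything in it is an assumption for illustration: the class name GrammarSketch and its methods are made up, the input format is taken to be one rule per line in the form "X -> alt1|alt2", and the id is simply the row index followed by the position within the row, which is only one guess at the id scheme mentioned in the comment above. It does not handle escaped quotes inside terminals, and the concatenated id becomes ambiguous once an index exceeds 9.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library: reads one rule per line in the
// form "X -> alt1|alt2|..." and builds the nested-list representation.
final class GrammarSketch {

    // Each inner list holds the non-terminal at index 0, followed by its alternatives.
    static ArrayList<ArrayList<String>> parse(String text) {
        ArrayList<ArrayList<String>> rules = new ArrayList<>();
        for (String line : text.split("\n")) {
            line = line.trim();
            if (line.isEmpty()) continue; // skip blank lines
            int arrow = line.indexOf("->");
            if (arrow < 0) {
                throw new IllegalArgumentException("missing '->' in: " + line);
            }
            ArrayList<String> rule = new ArrayList<>();
            rule.add(line.substring(0, arrow).trim());
            rule.addAll(splitAlternatives(line.substring(arrow + 2)));
            rules.add(rule);
        }
        return rules;
    }

    // Splits on '|' only outside single quotes, so a quoted '|' terminal survives.
    static List<String> splitAlternatives(String rhs) {
        List<String> alts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuote = false;
        for (char c : rhs.toCharArray()) {
            if (c == '\'') inQuote = !inQuote;
            if (c == '|' && !inQuote) {
                alts.add(current.toString().trim());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        alts.add(current.toString().trim());
        return alts;
    }

    // Assumed id scheme: row index followed by position in the row
    // (position 0 is the non-terminal, so alternatives start at 1).
    static String idOf(int row, int position) {
        return "" + row + position;
    }

    public static void main(String[] args) {
        ArrayList<ArrayList<String>> g = parse("S -> 'a'|'b'|'('S')'|T\nT -> '*'S");
        for (int r = 0; r < g.size(); r++) {
            for (int p = 1; p < g.get(r).size(); p++) {
                System.out.println(idOf(r, p) + ": " + g.get(r).get(0) + " -> " + g.get(r).get(p));
            }
        }
    }
}
```

Note that each alternative is still a single unsplit string such as "'('S')'", so any table construction would have to tokenize the right-hand side again every time it looks at a production.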
I am not sure about representing the grammar as an AST, because then I don't know how to assign ids to each production. But the representation above seems like a pretty naive design to me, and I suspect there is some standard way of doing this that is easier to work with.