How to represent a context free grammar in code?

Question

I want to write a compiler as a personal project and am in the process of reading about and understanding parsers (LL(k), LR(k), SLR, etc.).

All of these parsers are based on a grammar that comes from the user, and this grammar is usually written in a text file (e.g. in ANTLR it comes in a .g4 file, which is plain text). If I want my parser to build its parsing tables from such a grammar file, what is the best way to parse the file and represent the productions in code?
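Table construction works on individual productions, so that is the form the grammar eventually needs to take. The following is a minimal sketch of computing FIRST sets, one of the inputs to LL(1) table construction. It rests on three assumptions. First, each production is stored as a left-hand-side name plus a list of right-hand-side symbol names. Second, a symbol is a terminal exactly when it never appears as a left-hand side. Third, there are no empty (epsilon) productions, which would require extra nullable handling. The names `FirstSets`, `Production` and `first` are hypothetical.

```java
import java.util.*;

public class FirstSets {
    // Hypothetical representation: one production = LHS name + list of RHS symbol names.
    record Production(String lhs, List<String> rhs) {}

    // Computes FIRST sets for a grammar WITHOUT empty productions:
    // FIRST(A) is the union of FIRST(first RHS symbol) over all A-productions.
    static Map<String, Set<String>> first(List<Production> grammar) {
        Set<String> nonTerminals = new HashSet<>();
        for (Production p : grammar) nonTerminals.add(p.lhs());

        Map<String, Set<String>> first = new HashMap<>();
        for (String nt : nonTerminals) first.put(nt, new TreeSet<>());

        boolean changed = true;
        while (changed) {                       // iterate to a fixed point
            changed = false;
            for (Production p : grammar) {
                String head = p.rhs().get(0);   // assumes a non-empty RHS
                Set<String> target = first.get(p.lhs());
                if (nonTerminals.contains(head)) {
                    changed |= target.addAll(first.get(head));
                } else {
                    changed |= target.add(head); // a terminal is its own FIRST set
                }
            }
        }
        return first;
    }

    public static void main(String[] args) {
        // The example grammar from the question, one production per alternative.
        List<Production> g = List.of(
            new Production("S", List.of("a")),
            new Production("S", List.of("b")),
            new Production("S", List.of("(", "S", ")")),
            new Production("S", List.of("T")),
            new Production("T", List.of("*", "S")));
        System.out.println(first(g));
    }
}
```

Because every alternative is its own `Production`, the loop never needs to know how the grammar file grouped them.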

EDIT:

For example, let's say I have this grammar:

S -> 'a'|'b'|'('S')'|T
T -> '*'S

I was thinking of parsing this grammar and storing it as an ArrayList<ArrayList<String>>, where each inner list holds all the productions of one non-terminal (the non-terminal first, followed by its alternatives):

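import java.util.List;

// With this representation, every production gets an id built from its row and its position in the row
// (position 0 holds the non-terminal itself): S -> 'a' has id 01, T -> '*'S has id 11, and so on.
List<List<String>> grammar = List.of(
    List.of("S", "'a'", "'b'", "'('S')'", "T"),
    List.of("T", "'*'S")
);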

I am not sure about representing the grammar as an AST, because then I don't know how to assign ids to the productions. But the representation above seems like a rather naive design, and I suspect there is a standard way of doing this that would be easier to work with.
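One less ad hoc alternative to nested string lists is to give each production its own object. The object holds an id, a left-hand side, and a right-hand side split into individual symbols that record whether they are terminals. This is a minimal sketch assuming Java 16+ records. The names `GrammarDemo`, `Symbol`, `Production` and `add` are illustrative, not from any library. Ids are simply positions in one flat list, so every alternative gets a unique number without encoding a row and a column.

```java
import java.util.*;

public class GrammarDemo {
    // A grammar symbol; terminal == true for quoted tokens such as 'a'.
    record Symbol(String name, boolean terminal) {
        static Symbol t(String n)  { return new Symbol(n, true); }
        static Symbol nt(String n) { return new Symbol(n, false); }
        @Override public String toString() { return terminal ? "'" + name + "'" : name; }
    }

    // One alternative of one rule; the id is its index in the production list.
    record Production(int id, String lhs, List<Symbol> rhs) {
        @Override public String toString() { return id + ": " + lhs + " -> " + rhs; }
    }

    // Appends a production whose id is the next free index.
    static void add(List<Production> ps, String lhs, Symbol... rhs) {
        ps.add(new Production(ps.size(), lhs, List.of(rhs)));
    }

    // The question's example grammar as a flat, numbered list of productions.
    static List<Production> exampleGrammar() {
        List<Production> ps = new ArrayList<>();
        add(ps, "S", Symbol.t("a"));
        add(ps, "S", Symbol.t("b"));
        add(ps, "S", Symbol.t("("), Symbol.nt("S"), Symbol.t(")"));
        add(ps, "S", Symbol.nt("T"));
        add(ps, "T", Symbol.t("*"), Symbol.nt("S"));
        return ps;
    }

    public static void main(String[] args) {
        exampleGrammar().forEach(System.out::println);
    }
}
```

The nested-list version groups productions by non-terminal automatically. If a table builder needs that grouping here, it can be recovered with a map from `lhs` to production ids.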

I'm not sure what you want to achieve: parsers are "generated" more often than written. ANTLR generates source code by parsing the .g4 file (which is more than a grammar: it defines the tokens, the grammar rules and the behaviour of the parser). What you are asking is how to represent the parser's data structures. You could look at ANTLR's output for this (or at a simpler generator's) — ilmirons, Oct 17 '18 at 15:47
Write a parser for the grammar file, parse it and build an AST representing the grammar. — Ira Baxter, Oct 17 '18 at 17:18
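Ira Baxter's suggestion can be sketched as a small hand-written recursive-descent parser for the grammar notation itself. This is a minimal sketch built on several assumptions. Each rule has the form `Name -> alt | alt ...`. Terminals are written in single quotes and cannot contain a quote. Non-terminals are alphanumeric identifiers. A new rule starts wherever an identifier is followed by `->`. The class and record names (`GrammarParser`, `Symbol`, `Rule`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class GrammarParser {
    // AST for the grammar notation: a rule has alternatives, each a sequence of symbols.
    record Symbol(String name, boolean terminal) {}
    record Rule(String name, List<List<Symbol>> alternatives) {}

    private final List<String> toks;
    private int pos = 0;

    private GrammarParser(List<String> toks) { this.toks = toks; }

    // Entry point: tokenize the grammar text, then parse it into a list of Rule nodes.
    static List<Rule> parse(String src) {
        GrammarParser p = new GrammarParser(tokenize(src));
        List<Rule> rules = new ArrayList<>();
        while (p.pos < p.toks.size()) rules.add(p.parseRule());
        return rules;
    }

    // Splits text into identifiers, quoted terminals (quotes kept), "->" and "|".
    static List<String> tokenize(String src) {
        List<String> toks = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '|') {
                toks.add("|"); i++;
            } else if (src.startsWith("->", i)) {
                toks.add("->"); i += 2;
            } else if (c == '\'') {
                int end = src.indexOf('\'', i + 1);
                if (end < 0) throw new IllegalArgumentException("unterminated quote at " + i);
                toks.add(src.substring(i, end + 1)); i = end + 1;
            } else if (Character.isLetter(c)) {
                int j = i;
                while (j < src.length() && Character.isLetterOrDigit(src.charAt(j))) j++;
                toks.add(src.substring(i, j)); i = j;
            } else {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
        }
        return toks;
    }

    // rule := IDENT "->" alternative ("|" alternative)*
    private Rule parseRule() {
        String name = peek(0);
        if (name.isEmpty() || !Character.isLetter(name.charAt(0)))
            throw new IllegalArgumentException("expected rule name at token " + pos);
        pos++;
        if (!peek(0).equals("->")) throw new IllegalArgumentException("expected -> at token " + pos);
        pos++;
        List<List<Symbol>> alts = new ArrayList<>();
        alts.add(parseAlternative());
        while (peek(0).equals("|")) { pos++; alts.add(parseAlternative()); }
        return new Rule(name, alts);
    }

    // alternative := symbol+ ; stops at "|", end of input, or the next "IDENT ->"
    private List<Symbol> parseAlternative() {
        List<Symbol> seq = new ArrayList<>();
        while (pos < toks.size() && !peek(0).equals("|") && !peek(1).equals("->")) {
            String t = toks.get(pos++);
            if (t.equals("->")) throw new IllegalArgumentException("unexpected -> at token " + (pos - 1));
            seq.add(t.startsWith("'") ? new Symbol(t.substring(1, t.length() - 1), true)
                                      : new Symbol(t, false));
        }
        if (seq.isEmpty()) throw new IllegalArgumentException("empty alternative at token " + pos);
        return seq;
    }

    private String peek(int k) { return pos + k < toks.size() ? toks.get(pos + k) : ""; }

    public static void main(String[] args) {
        for (Rule r : parse("S -> 'a'|'b'|'('S')'|T\nT -> '*'S\n")) System.out.println(r);
    }
}
```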
@IraBaxter, if I create an AST of the grammar, how do I assign Ids to each production? If I represent it as an ArrayList of ArrayList, wouldn't it be better? — mettleap, Oct 17 '18 at 18:39
@mettleap: What you do with the AST depends on how you want to generate a parser. [See this discussion of MetaII for one way to generate a parser from a parse result that skips building a tree! https://stackoverflow.com/a/17632284/120163] If you want to walk the tree and generate a list of rules, you can do that. Welcome to compiling, where parsing is just the first step and code generation is another. — Ira Baxter, Oct 17 '18 at 20:30
ANTLR is open source, so it might be useful to look at how it does these things: https://github.com/antlr/antlr4/tree/master/tool. Assigning ids to AST-structured rules shouldn't be hard: just walk the AST and use a counter. — Jiri Tousek, Oct 18 '18 at 07:13
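The counter approach from the last two comments might look like the sketch below. It assumes the AST stores each rule with its alternatives, simplified here to lists of symbol names. The names `NumberProductions`, `Rule`, `Production`, `number` and `byLhs` are hypothetical. A single counter over an in-order walk gives every alternative a unique id. A separate index then recovers the per-non-terminal grouping that the nested-list design had built in.

```java
import java.util.*;

public class NumberProductions {
    // Hypothetical AST node: a rule with its alternatives, each a sequence of symbol names.
    record Rule(String name, List<List<String>> alternatives) {}

    // A numbered production produced by the walk.
    record Production(int id, String lhs, List<String> rhs) {}

    // Walks the rules in order and hands out ids with a single counter.
    static List<Production> number(List<Rule> ast) {
        List<Production> out = new ArrayList<>();
        int nextId = 0;
        for (Rule rule : ast)
            for (List<String> alt : rule.alternatives())
                out.add(new Production(nextId++, rule.name(), alt));
        return out;
    }

    // Index from each non-terminal to the ids of its productions, in grammar order.
    static Map<String, List<Integer>> byLhs(List<Production> ps) {
        Map<String, List<Integer>> m = new LinkedHashMap<>();
        for (Production p : ps) m.computeIfAbsent(p.lhs(), k -> new ArrayList<>()).add(p.id());
        return m;
    }

    public static void main(String[] args) {
        List<Rule> ast = List.of(
            new Rule("S", List.of(List.of("'a'"), List.of("'b'"), List.of("'('", "S", "')'"), List.of("T"))),
            new Rule("T", List.of(List.of("'*'", "S"))));
        List<Production> ps = number(ast);
        ps.forEach(System.out::println);
        System.out.println(byLhs(ps)); // {S=[0, 1, 2, 3], T=[4]}
    }
}
```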
Do you want to write a compiler or a parser generator? Or both? I ask because you start by saying that you want to write a compiler, but everything else you say makes it sound like you're trying to write a parser generator (i.e. a tool like ANTLR). Note that if you do want to write a compiler without using an existing parser generator, that does not mean you have to write your own parser generator (unless you want to, of course): it's perfectly possible to write a parser without using a generator. — sepp2k, Oct 18 '18 at 16:49
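To illustrate sepp2k's point, here is a minimal hand-written recursive-descent recognizer for the example grammar from the question, with no generator involved. It assumes every terminal is a single character, so no separate lexer is needed. It also relies on the FIRST sets of the four `S` alternatives ('a', 'b', '(' and, through T, '*') being disjoint, which lets one character of lookahead choose the alternative. The class name `HandWrittenParser` is illustrative only. The code needs Java 14+ for the arrow-style switch.

```java
public class HandWrittenParser {
    private final String input;
    private int pos = 0;

    private HandWrittenParser(String input) { this.input = input; }

    // Returns true if the whole input is derivable from S.
    static boolean accepts(String input) {
        HandWrittenParser p = new HandWrittenParser(input);
        try {
            p.parseS();
            return p.pos == input.length();   // reject trailing characters
        } catch (IllegalStateException e) {
            return false;
        }
    }

    private char peek() { return pos < input.length() ? input.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalStateException("expected '" + c + "' at " + pos);
        pos++;
    }

    // S -> 'a' | 'b' | '(' S ')' | T   (alternative chosen by one character of lookahead)
    private void parseS() {
        switch (peek()) {
            case 'a', 'b' -> pos++;
            case '(' -> { pos++; parseS(); expect(')'); }
            case '*' -> parseT();
            default -> throw new IllegalStateException("unexpected input at " + pos);
        }
    }

    // T -> '*' S
    private void parseT() { expect('*'); parseS(); }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "(*b)", "**(a)", "(a", "ab"})
            System.out.println(s + " -> " + accepts(s));
    }
}
```

Each non-terminal becomes one method, which is why this style scales by hand only as long as the grammar stays LL(1)-friendly.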
@sepp2k, the main aim is to write a compiler from scratch. To that end, I am trying to code every stage of the process myself, without relying on tools, since I want to learn. — mettleap, Oct 20 '18 at 16:44