
The goal of this Python (3.5) code is to parse standard C code according to the following grammar rules:

Program --> main () { declarations statement-list }
declarations --> data-type identifier-list; declarations | epsilon
data-type --> int | char
identifier-list --> id | id, identifier-list
statement-list --> statement statement-list | epsilon
statement --> looping-stat
looping-stat --> while (expn) { assignment_stat } |
        for (assignment_stat ; expn ; increment_stat) { assignment_stat }
expn --> factor eprime
eprime --> relop factor | epsilon
assignment_stat --> id = factor
increment_stat --> id inc_dec
inc_dec --> ++ | --
factor --> id | num
relop --> == | != | <= | >= | > | <

I understand that the approach is to use successive procedure calls (e.g. main() calls declarations() and then statement_list()). My idea was to read the lines of a text file into a list and try to parse them. I am confused about how to handle rules such as declarations. For example:

int id;

and

while(a<100)

Any help will be appreciated.

My attempt so far:

# Read the input file into a list of stripped lines
def file_input():
    global buffer, lookahead

    buffer = []
    lookahead = 0  # index of the current entry in buffer
    with open("parser.txt", "r") as file:
        for line in file:
            buffer.append(line.strip())

    return buffer

#for the while loop
def while_loop():
    global lookahead,buffer

    if "while_loop" in buffer[lookahead]:
        print("Parsing",buffer[lookahead])
        match('while_loop')
        expression()
        assignment()

# Match the current entry against the expected text t
def match(t):
    global lookahead, buffer

    print('matching', t)
    if buffer[lookahead] == t:  # compare with the parameter t, not the literal "t"
        lookahead = lookahead + 1
    else:
        print('error')
Prune
krishnair1123

1 Answer


Where are you confused? You're doing fine so far.

You aren't coding individual statement types: you're using a general process to code grammar rules. Find each token in the order given on the RHS of the grammar rule. If the token is a terminal, use your match routine. If it's a non-terminal, call that non-terminal's function.
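As a rough illustration of that process, here is a minimal sketch for the rules `assignment_stat --> id = factor` and `factor --> id|num`. It assumes the input has already been split into a list of token strings; the `Parser` class and its method names are hypothetical and are not taken from the question's code.

```python
class Parser:
    """Hypothetical sketch: one method per grammar rule."""

    def __init__(self, tokens):
        self.tokens = tokens  # assumed: a list of token strings
        self.pos = 0          # index of the lookahead token

    def peek(self):
        # Return the lookahead token, or None at end of input.
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        # Terminal: the lookahead must equal the expected text.
        if self.peek() != expected:
            raise SyntaxError("expected {!r}, found {!r}".format(expected, self.peek()))
        self.pos += 1

    def identifier(self):
        # Terminal class "id". Note: str.isidentifier() also accepts
        # keywords such as "while"; a real parser would exclude them.
        tok = self.peek()
        if tok is None or not tok.isidentifier():
            raise SyntaxError("expected identifier, found {!r}".format(tok))
        self.pos += 1

    def factor(self):
        # factor --> id | num
        tok = self.peek()
        if tok is not None and tok.isdigit():
            self.pos += 1
        else:
            self.identifier()

    def assignment_stat(self):
        # assignment_stat --> id = factor
        # Walk the RHS left to right: non-terminal call, terminal match, call.
        self.identifier()
        self.match("=")
        self.factor()


parser = Parser(["a", "=", "10"])
parser.assignment_stat()
```

Each method body mirrors the right-hand side of its rule, so adding a new rule means adding one more method in the same style.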

When you have a choice of RHS expansion, such as with the loop statement, you need to use the lookahead to decide which routine to call. Your function looping_stat has to look at the next token and call either while_loop or for_loop.
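The dispatch on the lookahead could look something like the sketch below. It assumes a pre-tokenized list and passes the position explicitly instead of using globals; `expect` is a hypothetical helper, and `while_loop` and `for_loop` are stubs that only consume their keyword, with the rest of each rule left out.

```python
def expect(tokens, pos, expected):
    # Hypothetical helper: check one terminal and return the next position.
    if pos >= len(tokens) or tokens[pos] != expected:
        found = tokens[pos] if pos < len(tokens) else "end of input"
        raise SyntaxError("expected {!r}, found {!r}".format(expected, found))
    return pos + 1


def while_loop(tokens, pos):
    # Stub: a full version would continue with ( expn ) { assignment_stat }.
    return expect(tokens, pos, "while")


def for_loop(tokens, pos):
    # Stub: a full version would continue with the rest of the for rule.
    return expect(tokens, pos, "for")


def looping_stat(tokens, pos):
    # looping-stat --> while ... | for ...
    # Inspect the lookahead token without consuming it, then pick a branch.
    lookahead = tokens[pos] if pos < len(tokens) else None
    if lookahead == "while":
        return while_loop(tokens, pos)
    if lookahead == "for":
        return for_loop(tokens, pos)
    raise SyntaxError("expected 'while' or 'for', found {!r}".format(lookahead))
```

The key point is that `looping_stat` only looks at the token; the chosen routine is the one that consumes it.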

Got it?

Prune
  • Ehh... the match function I used here tries to match while(expression) against the passed parameter (which is weird). Advice? – krishnair1123 Feb 28 '17 at 20:52
  • Well, suppose that for a statement such as a++; or a=a+10; I need to check whether a is alphabetic (isalpha can be used, I suppose). How do you get each of the pieces of the string of a list element? – krishnair1123 Feb 28 '17 at 21:26
  • For an identifier, you call your **identifier** routine, which you'll have to write, just like all the others. I'm not sure what you mean by "pieces of the string of a list element". A list element is just that, one item in a list. It doesn't automatically come in string form, and I'm not sure what "pieces" you want. – Prune Feb 28 '17 at 21:41
  • Also, it appears that you're wandering to another question. I fear that you're missing several organizational pieces of your parser, such as the lexical tokenizer. If you have a separate question, post that separately. – Prune Feb 28 '17 at 21:42
  • Indeed... I have some issues with token recognition. – krishnair1123 Mar 01 '17 at 05:24
  • General hint: write a function that finds the next token and returns the token text (literal string) and type (keyword, ID, operator, etc.). Your parser will call that every time it needs the next grammar element. Also, this tokenizer will stay one token ahead, so that the lookahead token is always available. – Prune Mar 01 '17 at 21:54
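
One possible shape for the tokenizer described in the last comment is sketched here. It uses Python's standard `re` module. The token type names, the `KEYWORDS` set (which treats `main` as a keyword because the grammar uses it as a terminal), and the `TokenStream` class are all assumptions made for illustration.

```python
import re

# Assumed token classes for this grammar; earlier patterns take priority.
TOKEN_SPEC = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("OP", r"\+\+|--|==|!=|<=|>=|[<>=]"),
    ("PUNCT", r"[(){};,]"),
    ("SKIP", r"\s+"),
    ("MISMATCH", r"."),
]
KEYWORDS = {"main", "int", "char", "while", "for"}
TOKEN_RE = re.compile(
    "|".join("(?P<{}>{})".format(name, pattern) for name, pattern in TOKEN_SPEC)
)


def tokenize(text):
    # Yield (type, text) pairs, skipping whitespace.
    for m in TOKEN_RE.finditer(text):
        kind, value = m.lastgroup, m.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise SyntaxError("unexpected character {!r}".format(value))
        if kind == "ID" and value in KEYWORDS:
            kind = "KEYWORD"
        yield kind, value


class TokenStream:
    """Keeps one token of lookahead available at all times."""

    def __init__(self, text):
        self._tokens = tokenize(text)
        self.lookahead = next(self._tokens, ("EOF", ""))

    def advance(self):
        # Return the current token and read the next one into lookahead.
        current = self.lookahead
        self.lookahead = next(self._tokens, ("EOF", ""))
        return current


stream = TokenStream("while(a<100)")
print(stream.lookahead)  # the parser can inspect this before consuming it
```

With a stream like this, a parser's match routine would compare against `stream.lookahead` and call `stream.advance()` on success, so it no longer has to compare whole input lines.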