Reading input in order, removing left-recursion

Question

I am in the middle of a Compilers class. Conditions so far: I have all my tokens stored in vectors, one as String literals, the other as Int to store what type (keyword, identifier, operator, etc.) of token they are.

I would like your advice on how to structure my function calls - and especially like your thoughts on how I've tried to isolate the 'expression' part of the statement, and how I can avoid left-recursion.

...I wrote the relevant parts of my code out before coming back up here to summarize: I know that I want to avoid left-recursion by reading the recursive part after the rest of the statement, but how can I do that? (i.e. reading S=E+T in order, but parsing it as "S=" then either "+T;" and "E" or "T;" and "E+")

Worse, the example given by the professor has the "output" of the program in the same order as the input, so I can't really jump around the vector to avoid processing it out of order. The example:

x=a + b;

is output as

Token: Identifier     Lexeme:   x                   
    <Statement List> -> <Statement> | <Statement> <Statement List>
    <Statement> -> <Compound> | <Assign> | <If> | <Return> | <Print> | <Scan> | <While> 
    <Assign> -> <Identifier> = <Expression>;

Token: Operator       Lexeme:   =                   

Token: Identifier     Lexeme:   a                   
    <Expression> -> <Term> <ExpressionPrime>
    <Term> -> <Factor> <TermPrime>
    <Factor> -> - <Primary> | <Primary>
    <Primary> -> <Identifier> | <Integer> | <Identifier> ( <IDs> ) | ( <Expression> ) | <Real> | true | false

Token: Operator       Lexeme:   +                   
    <Empty> -> Epsilon
    <TermPrime> -> * <Factor> <TermPrime> | / <Factor> <TermPrime> | <Empty>
    <Empty> -> Epsilon
    <ExpressionPrime> -> + <Term> <ExpressionPrime> | - <Term> <ExpressionPrime> | <Empty>   

Token: Identifier     Lexeme:   b                   
    <Term> -> <Factor> <TermPrime>
    <Factor> -> - <Primary> | <Primary>
    <Primary> -> <Identifier> | <Integer> | <Identifier> ( <IDs> ) | ( <Expression> ) | <Real> | true | false

Token: Operator       Lexeme:   ;                   
    <Empty> -> Epsilon
    <TermPrime> -> * <Factor> <TermPrime> | / <Factor> <TermPrime> | <Empty>
    <Empty> -> Epsilon
    <ExpressionPrime> -> + <Term> <ExpressionPrime> | - <Term> <ExpressionPrime> | <Empty>   
    <Empty> -> Epsilon

Currently, my code functions like this:

IN MAIN -

There are 2 vectors:
1. Words has the string literals of the token
2. Token has the token type (stored as int)
I have 2 ints:
1. currLine keeps track of the offset from the start of the file to current line
2. offset tracks the int value returned by recursive functions 
    (each of which returns an int of how many tokens were processed 
      - i.e. "z = a+b;" would return 2 (z =) + 1 (a) + 1 (+) + 1 (b) +1 (;) for a total of 6)

while (currLine < Words.size()) 
// while the offset from the start of the file is smaller than the length of the vector
{
offset = statementType( words, token, (words.begin() + currLine), (token.begin() + currLine) ); 
// offset stores the number of tokens processed in the statement
if(offset > 0) 
// error check, the code will return -1 if some function encounters an error
      {
         currLine += offset; // move the start of the next line to after the next ';'
         cout << "current line moved up to " << currLine << endl;
      }
      //length <= 0 ? invalid
      else
      {
         /* error! */
         cout << "error found!" << endl;
         break;
      }
}

IN statementType (vector String words, vector Int token, iterator wordIt, iterator tokenIt)

where the iterators are used to point to where in the vector to start scanning

//currently, it detects the first token type and checks if it follows the formula
// if token[0] is a keyword, it's assumed to be a *declaration* - it checks token[1] to see if it's "="
//    - if so, it checks to see if token[2] is an ID - if so, it checks to see if token[3] is ";"
// if all that is true, it returns 3 - if any besides the first are false, it returns -1 (for an error)
//the trick is the assignment statement
//if token[0] is an ID, it checks token[1] to see if it's "=" 

counter = evalPrimeExpr(words, token, wordit +2, tokenit +2); 
// it passes the vectors to an Expression parser, with a slight offset on the iterators for the "beginning" location
if (counter >= 0) //if counter isn't an error, it's not -1
         {
            return 2 + counter;
         }
         else
         {
            return -1;
         }

IN evalPrimeExpr-

//it counts up the tokens it reads until it reads a ";", then returns the count of tokens +1 (because the ";" is an escape/return trigger that returns before it loops again)
// while not end of vector
   while ( (wordit + counter) != words.end() )
   {
      //add 1 to length of expression
      cout << *(wordit + counter) << "\t expr token" << endl; 
      counter++;
      //check if ';' found
      if (*(wordit + counter) == ";")
      {
         // end of statement, ';' found
         cout << *(wordit + counter) << "\t expr epsilon" << endl; 
         //return 2 ( ID = ) + counter (rest of statement including ;)
         return 1+ counter;
      }
      //else continue
   }

Reading input in order, removing left-recursion

0 Answers0