This is a continuation of my attempt to write a recursive descent parser, LL(1), that takes infix expressions and outputs RPN. Here is a link to my first question, which @rici did an amazing job of answering; I hope I do his answer justice with this revised implementation. My new grammar is as follows (without support for unary operators):
expr -> term (+|-) term | term
term -> exponent (*|/) exponent | exponent
exponent -> factor ^ factor | factor
factor -> number | ( expr )
In his answer, @rici points out with respect to Norwell's grammar:
We normally put the unary negation operator between multiplication and exponentiation
and I have tried to incorporate it here:
expr -> term (+|-) term | term
term -> exponent1 (*|/) exponent1 | exponent1
exponent1 -> (+|-) exponent | exponent
exponent -> factor ^ factor | factor
factor -> number | ( expr )
Coding the first grammar meant that unary (+/-) numbers could not be accepted; only the binary + and - operators were. That solution works well for the inputs I have tried (it could still be wrong, and I hope to learn more). However, on closer inspection the second grammar fails, and I am forced to fall back on the same "hack" I used in my first attempt. As @rici points out:
By the way, your output is not Reverse Polish Notation (and nor is it unambiguous without parentheses) because you output unary operators before their operands.
To be fair, he does suggest adding an extra 0 operand, which is fine, and I think it is going to work. However, say I have 13/-5: as I understand it, the equivalent infix would be 13/0-5 and its RPN 13 0 / 5 -. Or perhaps I am misunderstanding his point.
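One possible reading of the extra-0 suggestion is that the 0 pairs only with the negated operand, so that 13/-5 means 13/(0-5) rather than 13/0-5. The sketch below illustrates that reading under some assumptions: it parses from a string instead of stdin, handles only unsigned integers, `*`, `/` and unary `-`, and omits error handling. The names `to_rpn`, `emit`, `primary` and `unary` are made up for the example.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *p;   /* cursor into the input string */
static char out[256];   /* RPN output, tokens separated by spaces */

/* Append one token followed by a space; abort if the buffer would overflow. */
static void emit(const char *s) {
    assert(strlen(out) + strlen(s) + 2 <= sizeof out);
    strcat(out, s);
    strcat(out, " ");
}

/* primary -> digit+ */
static void primary(void) {
    char num[32];
    int n = 0;
    while (isdigit((unsigned char)*p) && n < 31) num[n++] = *p++;
    num[n] = '\0';
    emit(num);
}

/* unary -> '-' unary | primary */
static void unary(void) {
    if (*p == '-') {
        p++;
        emit("0");   /* left operand of the rewritten subtraction */
        unary();     /* the operand being negated */
        emit("-");   /* the operator follows both of its operands */
    } else {
        primary();
    }
}

/* term -> unary (('*'|'/') unary)* */
static void term(void) {
    unary();
    while (*p == '*' || *p == '/') {
        char op[2] = { *p++, '\0' };
        unary();
        emit(op);
    }
}

/* Hypothetical entry point: returns the RPN for src in a static buffer. */
const char *to_rpn(const char *src) {
    p = src;
    out[0] = '\0';
    term();
    return out;
}
```

With this sketch, `to_rpn("13/-5")` produces the tokens `13 0 5 - /`, and `to_rpn("-3/2")` produces `0 3 - 2 /`. The 0 is emitted before the operand and the `-` after it, so every operator still follows its operands.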
And finally, to put the nail in the coffin, @rici also points out:
left-recursion elimination would have deleted the distinction between left-associative and right-associative operators
and hence that would mean that it is pretty much impossible to determine the associativity of any of the operators, since they would all be treated the same. Moreover, that would imply that supporting many right- and left-associative operators is going to be very difficult, if not impossible, for simple LL(1) parsers.
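For comparison, a hand-written recursive descent parser can still choose associativity per operator even though the grammar itself no longer encodes it: a loop groups to the left, and a recursive call on the right operand groups to the right. The following is a minimal sketch of that idea, assuming single-digit operands, only `-` and `^`, and no error handling; `to_rpn`, `emit_char`, `power` and `diff` are names invented for the illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *p;   /* cursor into the input string */
static char out[256];   /* RPN output, tokens separated by spaces */

/* Append one character token followed by a space. */
static void emit_char(char c) {
    size_t n = strlen(out);
    assert(n + 3 <= sizeof out);
    out[n] = c;
    out[n + 1] = ' ';
    out[n + 2] = '\0';
}

/* digit -> '0'..'9' (single-digit operands keep the sketch short) */
static void digit(void) {
    if (isdigit((unsigned char)*p)) emit_char(*p++);
}

/* power -> digit ('^' power)?
   The recursive call on the right makes '^' right-associative. */
static void power(void) {
    digit();
    if (*p == '^') {
        p++;
        power();
        emit_char('^');
    }
}

/* diff -> power ('-' power)*
   The loop emits each '-' as soon as its right operand is done,
   which makes '-' left-associative. */
static void diff(void) {
    power();
    while (*p == '-') {
        p++;
        power();
        emit_char('-');
    }
}

/* Hypothetical entry point: returns the RPN for src in a static buffer. */
const char *to_rpn(const char *src) {
    p = src;
    out[0] = '\0';
    diff();
    return out;
}
```

Here `to_rpn("8-3-2")` gives `8 3 - 2 -`, i.e. (8-3)-2, while `to_rpn("2^3^2")` gives `2 3 2 ^ ^`, i.e. 2^(3^2). Both functions look only one character ahead.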
Here is my C code implementation of the grammar:
#include <ctype.h>   /* isdigit */
#include <stdio.h>
#include <stdlib.h>
void error();
void factor();
void expr();
void term();
void exponent1();
void exponent();
void parseNumber();
void match(int t);
int lookahead; /* int rather than char: getchar() returns int so that EOF is representable */
int position=0;
int main() {
lookahead = getchar();
expr();
return 0;
}
void error() {
printf("\nSyntax error at lookahead %c pos: %d\n",lookahead,position);
exit(1);
}
void factor() {
if (isdigit(lookahead)) {
parseNumber();
// printf("lookahead at %c",lookahead);
} else if(lookahead =='('){
match('(');
expr();
match(')');
}else {
error();
}
}
void expr(){
term();
while(1){
if(!lookahead||lookahead =='\n') break;
if(lookahead=='+'|| lookahead=='-'){
char token = lookahead;
match(lookahead);
term();
printf(" %c ", token);
}else {
break;
}
}
}
void term(){
exponent1();
while(1){
if(!lookahead||lookahead =='\n') break;
if(lookahead=='/'|| lookahead=='*'){
char token = lookahead;
match(lookahead);
exponent1();
printf(" %c ", token);
}else {
break;
}
}
}
void exponent1(){
if(lookahead=='-'||lookahead=='+'){
char token = lookahead;
match(lookahead);
//having the printf here:
printf("%c", token);
//passes this:
// 2+6*2--5/3 := 2.00 6.00 2.00 * + 5.00 3.00 / -
// -1+((-2-1)+3)*-2 := -1.00 -2.00 1.00 - 3.00 + -2.00 * + (not actual RPN @rici mentions)
//but fails at:
// -(3/2) := -3.00 2.00 /
// -3/2 := -3.00 2.00 /
exponent();
// but having the printf here
//printf("%c ", token);
// fails this -1+((-2-1)+3)*-2 := 1.00 - 2.00 - 1.00 - 3.00 + 2.00 - * +
// since it is supposed to be
// 1.00 - -2.00 1.00 - 3.00 + -2.00 * +
// but satisfies this:
// -(3/2) := 3.00 2.00 / -
// (-3/2) := 3.00 - 2.00 /
}else {
exponent();
//error();
}
}
void exponent(){
factor();
while(1){
if(!lookahead||lookahead =='\n') break;
if(lookahead=='^'){
char token = lookahead;
match('^');
factor();
printf(" ^ ");
}else {
break;
}
}
}
void parseNumber() {
double number = 0;
if (lookahead == '\0'|| lookahead=='\n') return;
while (lookahead >= '0' && lookahead <= '9') {
number = number * 10 + lookahead - '0';
match(lookahead);
}
if (lookahead == '.') {
match(lookahead);
double weight = 1;
while (lookahead >= '0' && lookahead <= '9') {
weight /= 10;
number = number + (lookahead - '0') * weight;
match(lookahead);
}
}
printf("%.2f ", number);
//printf("\ncurrent look ahead at after exiting parseNumber %c\n",lookahead);
}
void match(int t) {
if (lookahead == t){
lookahead = getchar();
position++;
}
else error();
}
So does that mean I should give up on LL(1) parsers and perhaps look at LR parsers instead? Or can increasing the amount of lookahead help, so that when there are many possible paths the extra tokens narrow them down? For instance:
-(5
;; looks weird
-(
5 ;; could be - ( exp )
or
--5
;; could be many things
--
5 ;; ought to be the -- operator and output say #
EDITs:
I think a larger lookahead is going to be difficult to coordinate. So perhaps use something like the shunting-yard algorithm, where I peek at the next operator and, based on its precedence, the algorithm determines which function to call. Something like using the actual call stack of the running program, so that a pop would be a return and a push would be a function call. I am not sure how I could coordinate that with recursive descent.
Perhaps the precedence of the peeked operator should determine the lookahead length?
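The idea of peeking at the next operator and letting its precedence decide whether to call or return is close to the technique usually called precedence climbing. Below is a rough sketch of it, not a drop-in replacement for the code above: it assumes single-digit operands, parses from a string, skips error handling, and uses invented names (`to_rpn`, `prec`, `right_assoc`, `parse_expr`, `parse_primary`). Only one character of lookahead is used.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

static const char *p;   /* cursor into the input string */
static char out[256];   /* RPN output, tokens separated by spaces */

/* Append one character token followed by a space. */
static void emit_char(char c) {
    size_t n = strlen(out);
    assert(n + 3 <= sizeof out);
    out[n] = c;
    out[n + 1] = ' ';
    out[n + 2] = '\0';
}

/* Precedence of a binary operator; 0 means "not a binary operator". */
static int prec(char c) {
    switch (c) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    case '^':           return 3;
    default:            return 0;
    }
}

static int right_assoc(char c) { return c == '^'; }

static void parse_expr(int min_prec);

/* primary -> digit | '(' expr ')' */
static void parse_primary(void) {
    if (*p == '(') {
        p++;
        parse_expr(1);
        if (*p == ')') p++;
    } else if (isdigit((unsigned char)*p)) {
        emit_char(*p++);
    }
}

/* Parse operators whose precedence is at least min_prec.
   A recursive call plays the role of a push; returning when the peeked
   operator binds too loosely plays the role of a pop. */
static void parse_expr(int min_prec) {
    parse_primary();
    while (prec(*p) >= min_prec) {
        char op = *p++;
        /* Left-associative: the right operand may only contain tighter operators.
           Right-associative: it may also contain the same operator again. */
        int next = right_assoc(op) ? prec(op) : prec(op) + 1;
        parse_expr(next);
        emit_char(op);
    }
}

/* Hypothetical entry point: returns the RPN for src in a static buffer. */
const char *to_rpn(const char *src) {
    p = src;
    out[0] = '\0';
    parse_expr(1);
    return out;
}
```

For example, `to_rpn("1+2*3")` yields `1 2 3 * +`, `to_rpn("8-3-2")` yields `8 3 - 2 -`, and `to_rpn("2^3^2")` yields `2 3 2 ^ ^`. Adding an operator means adding a row to `prec` (and possibly `right_assoc`) rather than a new grammar level.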