I am trying to parse lambda calculus terms into an AST using JavaScript and PEG.js. The grammar is fairly simple:
/*****************************************************************
t ::=
x variable
λx.t abstraction
t t application
*****************************************************************/
From this I have written the following PEG:
TERM "term"
= ABSTRACTION
/ APPLICATION
/ VARIABLE
APPLICATION "application"
/*****************************************************************
application ::= t t
*****************************************************************/
= APPLICATION_W_PARENS
/ APPLICATION_WO_PARENS
ABSTRACTION "abstraction"
/*****************************************************************
abstraction ::= λx.t
*****************************************************************/
= ABSTRACTION_W_PARENS
/ ABSTRACTION_WO_PARENS
VARIABLE "variable"
/*****************************************************************
variable ::= x
*****************************************************************/
= x:CHARACTER
{
return Variable(location(), x)
}
//////////////////////////////////////////////////////////////////////
// Application
//////////////////////////////////////////////////////////////////////
ABSTRACTION_OR_VARIABLE
//////////////////////////////////////////////////////////////////
// Workaround for left recursion: a rule of the form "term term"
// would loop forever, so the left side of an application is
// assumed never to be an Application itself
//////////////////////////////////////////////////////////////////
= ABSTRACTION / VARIABLE
APPLICATION_W_PARENS
/*****************************************************************
'(' -> Abstraction | Variable -> Term -> ')'
*****************************************************************/
= L_PARENS lhs:ABSTRACTION_OR_VARIABLE rhs:TERM R_PARENS
{
return Application(location(), lhs, rhs, true)
}
APPLICATION_WO_PARENS
/*****************************************************************
Abstraction | Variable -> Term
*****************************************************************/
= lhs:ABSTRACTION_OR_VARIABLE rhs:TERM
{
return Application(location(), lhs, rhs, false)
}
//////////////////////////////////////////////////////////////////////
// Abstraction
//////////////////////////////////////////////////////////////////////
ABSTRACTION_W_PARENS "abstraction"
/*****************************************************************
'(' -> 'λ' -> Variable -> '.' -> TERM -> ')'
*****************************************************************/
= L_PARENS LAMBDA x:CHARACTER DOT term:TERM R_PARENS
{
return Abstraction(location(), x, term, true)
}
ABSTRACTION_WO_PARENS
/*****************************************************************
'λ' -> Variable -> '.' -> Term
*****************************************************************/
= LAMBDA x:CHARACTER DOT term:TERM
{
return Abstraction(location(), x, term, false)
}
//////////////////////////////////////////////////////////////////////
// Atoms
//////////////////////////////////////////////////////////////////////
LAMBDA "lambda"
= 'λ'
L_PARENS "lParens"
= '('
R_PARENS "rParens"
= ')'
DOT "dot"
= [\.]
CHARACTER "character"
= [A-Za-z]
{
return text().trim() ;
}
This compiles and runs fine on simple input. As I work through more examples to test the implementation, I see some issues. Given the term
λl.λm.λn.lmn
It parses into
{
"expr": "λl.λm.λn.lmn",
"ast": " Abstraction( l, Abstraction( m, Abstraction( n, Application( Variable( l ), Application( Variable( m ), Variable( n ) ) ) ) ) )"
}
The problem is that application is supposed to be left-associative: l should be applied to m first, and that result should then be applied to n, i.e. (l m) n. The AST printout shows the opposite: m is applied to n, and l is applied to that result, i.e. l (m n), which is not correct.
If I change the rule that guards against left recursion (the one that assumes the left side of an application is only a variable or an abstraction) so that the left side can also be an application, the left recursion problem comes back.
I introduced parentheses into the grammar but stopped integrating them partway. I would really rather not have them in the grammar at all.
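To make the two shapes concrete, here is a minimal sketch in plain JavaScript, independent of PEG.js. The `Variable` and `Application` functions below are simplified stand-ins (plain objects, no `location()` argument), not the constructors used in the grammar above, and `foldLeft` and `show` are hypothetical helper names used only for illustration.

```javascript
// Simplified stand-ins for the AST constructors (assumption: plain objects).
function Variable(name) {
  return { type: "Variable", name };
}
function Application(lhs, rhs) {
  return { type: "Application", lhs, rhs };
}

// Left-associative application: [l, m, n] becomes ((l m) n).
function foldLeft(terms) {
  return terms.reduce((acc, term) => Application(acc, term));
}

// Render an AST as a fully parenthesised string for inspection.
function show(node) {
  return node.type === "Variable"
    ? node.name
    : "(" + show(node.lhs) + " " + show(node.rhs) + ")";
}

const [l, m, n] = ["l", "m", "n"].map((name) => Variable(name));
const expected = foldLeft([l, m, n]);             // the shape I want
const actual = Application(l, Application(m, n)); // the shape I currently get

console.log(show(expected)); // ((l m) n)
console.log(show(actual));   // (l (m n))
```

The only difference between the two trees is the direction of nesting, which is exactly what the right-recursive `rhs:TERM` in the application rule decides.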
- Can this be fixed within PEG.js?
- Or should I rewrite the construction of the Application object after parsing (a hack)?
- Or is there a better way to parse this, e.g. by rolling a custom parser?
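For the last option, this is roughly what I imagine a hand-rolled parser would look like. It is only a sketch under several assumptions: single-letter variables, no whitespace, plain-object AST nodes instead of my `Variable`/`Abstraction`/`Application` constructors, and parentheses kept purely for grouping. The function name `parse` and the node field names are hypothetical.

```javascript
// Hypothetical hand-written recursive-descent parser (not generated by PEG.js).
// Assumptions: single-letter variables, no whitespace, plain-object AST nodes.
function parse(src) {
  let pos = 0;
  const peek = () => src[pos];
  const expect = (ch) => {
    if (src[pos] !== ch) throw new SyntaxError(`expected '${ch}' at ${pos}`);
    pos++;
  };

  // term ::= atom+   (folded to the left, so "lmn" becomes ((l m) n))
  function term() {
    let node = atom();
    while (pos < src.length && peek() !== ")") {
      node = { type: "Application", lhs: node, rhs: atom() };
    }
    return node;
  }

  // atom ::= variable | 'λ' variable '.' term | '(' term ')'
  function atom() {
    const ch = peek();
    if (ch === "λ") {
      pos++;
      const param = variable().name;
      expect(".");
      // The body extends as far to the right as possible.
      return { type: "Abstraction", param, body: term() };
    }
    if (ch === "(") {
      pos++;
      const inner = term();
      expect(")");
      return inner;
    }
    return variable();
  }

  // variable ::= [A-Za-z]
  function variable() {
    const ch = peek();
    if (ch === undefined || !/[A-Za-z]/.test(ch)) {
      throw new SyntaxError(`expected a variable at ${pos}`);
    }
    pos++;
    return { type: "Variable", name: ch };
  }

  const ast = term();
  if (pos !== src.length) throw new SyntaxError(`unexpected '${peek()}' at ${pos}`);
  return ast;
}

const ast = parse("λl.λm.λn.lmn");
console.log(JSON.stringify(ast.body.body.body, null, 2));
```

The loop in `term` replaces the recursion on the left side: it collects atoms one at a time and nests each new application around the previous result, so there is no left-recursive call.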