
Let's say I have logical expressions in this form:

((a OR b) AND c) OR (d AND e = (f OR g))

I'm interested in getting a, b, c, d and ignoring e, f, g (everything that has to do with assignments).

My attempt was something like this:

infixOp : OR | AND ;

assignment : Identifier EQUALS expression ;

expression  :
    Identifier |
    assignment |
    expression infixOp expression | 
    LEFTBRACKET expression RIGHTBRACKET ;

Then, in my listener's enterExpression(), I just printed expressionContext.getTokens(identifierType). Of course, this is too much: I got all of the identifiers, including the ones inside assignments.

How could I "skip" over an assignment expression? Could this be done in the grammar, or can it only be done programmatically?

Bart Kiers
jack malkovick
  • I think the assignment assigns `(f OR g) AND h` to `e`. So then you'd also want to ignore `h`, right? – Bart Kiers Feb 10 '23 at 10:09
  • Yes @BartKiers you're right, my mistake. I'll edit and remove h to make it simpler – jack malkovick Feb 10 '23 at 11:00
  • As an alternative to all this laborious implementation, it takes just a minute to prep and then get results using [Trash](https://github.com/kaby76/Domemtech.Trash) at the command line with Bart's "T" grammar below: `echo '((a OR b) AND c) OR (d AND e = (f OR g))' | trparse | trdeltree ' //assignment' | trxgrep ' //Identifier/text()'` => a b c d. And, you have a "spec" of what you are trying to extract. – kaby76 Feb 10 '23 at 12:16

2 Answers


What you can do is create a small listener that tracks whether the walker is currently inside an assignment: set a flag when you enter an assignment and clear it when you exit. Then, inside the enterExpression method, collect the identifier only if you're NOT inside an assignment AND the expression's Identifier token is present.

A quick demo for the grammar:

grammar T;

parse
 : expression EOF
 ;

expression
 : '(' expression ')'
 | expression 'AND' expression
 | expression 'OR' expression
 | assignment
 | Identifier
 ;


assignment
 : Identifier '=' expression
 ;

Identifier : [a-z]+;
Space      : [ \t\r\n] -> skip;

and Java class:

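import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

import java.util.ArrayList;
import java.util.List;

public class Main {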

    public static void main(String[] args) {

        TLexer lexer = new TLexer(CharStreams.fromString("((a OR b) AND c) OR (d AND e = (f OR g))"));
        TParser parser = new TParser(new CommonTokenStream(lexer));

        MyListener listener = new MyListener();

        ParseTreeWalker.DEFAULT.walk(listener, parser.parse());

        System.out.println(listener.ids);
    }
}

class MyListener extends TBaseListener {

    public List<String> ids = new ArrayList<String>();
    private boolean inAssignment  = false;

    @Override
    public void enterExpression(TParser.ExpressionContext ctx) {
        if (!this.inAssignment && ctx.Identifier() != null) {
            this.ids.add(ctx.Identifier().getText());
        }
    }

    @Override
    public void enterAssignment(TParser.AssignmentContext ctx) {
        this.inAssignment = true;
    }

    @Override
    public void exitAssignment(TParser.AssignmentContext ctx) {
        this.inAssignment = false;
    }
}

will print:

[a, b, c, d]
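
One caveat with a boolean flag: if assignments can nest (say `x = (y = z) OR w`), leaving the inner assignment clears the flag while the walker is still inside the outer one, so `w` would be collected. A counter avoids that. The following is only a minimal sketch of the idea: to keep it runnable without the generated ANTLR classes, it uses a hypothetical `Node` tree and a hand-written walker as stand-ins for the parse tree and `ParseTreeWalker`. In the real listener, the same change would be incrementing a counter in `enterAssignment`, decrementing it in `exitAssignment`, and testing for zero in `enterExpression`.

```java
import java.util.ArrayList;
import java.util.List;

public class NestedAssignmentDemo {

    // Hypothetical stand-in for a parse-tree node; not an ANTLR class.
    static final class Node {
        final String kind;                 // "expr", "assign" or "id"
        final String text;                 // identifier text, null otherwise
        final List<Node> children = new ArrayList<>();

        Node(String kind, String text, Node... kids) {
            this.kind = kind;
            this.text = text;
            for (Node k : kids) {
                children.add(k);
            }
        }
    }

    static Node id(String name) { return new Node("id", name); }
    static Node expr(Node... kids) { return new Node("expr", null, kids); }
    static Node assign(String target, Node value) {
        return new Node("assign", null, id(target), value);
    }

    private final List<String> ids = new ArrayList<>();
    private int assignmentDepth = 0;       // replaces the boolean flag

    // Hand-written depth-first walk, mimicking enter/exit callbacks.
    private void walk(Node n) {
        boolean isAssignment = n.kind.equals("assign");
        if (isAssignment) assignmentDepth++;          // "enterAssignment"
        if (n.kind.equals("id") && assignmentDepth == 0) {
            ids.add(n.text);                          // only outside any assignment
        }
        for (Node child : n.children) walk(child);
        if (isAssignment) assignmentDepth--;          // "exitAssignment"
    }

    public static List<String> demo() {
        // Models: a OR (x = ((y = z) OR w))
        Node tree = expr(
            id("a"),
            expr(assign("x", expr(expr(assign("y", id("z"))), id("w")))));
        NestedAssignmentDemo d = new NestedAssignmentDemo();
        d.walk(tree);
        return d.ids;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a]
    }
}
```

With a plain boolean, the same walk would also collect `w`, because the flag is cleared when the inner `y = z` is exited.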
Bart Kiers

Maybe something like:

    if (!(expressionContext.getParent() instanceof AssignmentContext)) {
        expressionContext.getTokens(identifierType);
    }

You can refine that by walking the parse tree structure and checking the different members in the expression context.
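
Note that checking only the direct parent is not enough when the identifier sits deeper inside the assigned expression: its immediate parent is another expression node, and the assignment is further up the tree. A minimal sketch of checking all ancestors follows. The `Node` class is a hypothetical stand-in so the example runs without generated parser classes; with ANTLR contexts, the loop would presumably follow `getParent()` and test `instanceof AssignmentContext` instead.

```java
import java.util.ArrayList;
import java.util.List;

public class AncestorCheckDemo {

    // Hypothetical stand-in for a parse-tree context with a parent link.
    static final class Node {
        final String kind;                 // "expr", "assign" or "id"
        final String text;                 // identifier text, null otherwise
        Node parent;                       // null for the root
        final List<Node> children = new ArrayList<>();

        Node(String kind, String text, Node... kids) {
            this.kind = kind;
            this.text = text;
            for (Node k : kids) {
                k.parent = this;
                children.add(k);
            }
        }
    }

    static Node id(String name) { return new Node("id", name); }
    static Node expr(Node... kids) { return new Node("expr", null, kids); }
    static Node assign(Node target, Node value) {
        return new Node("assign", null, target, value);
    }

    // True if any ancestor (not just the direct parent) is an assignment.
    static boolean insideAssignment(Node n) {
        for (Node p = n.parent; p != null; p = p.parent) {
            if (p.kind.equals("assign")) {
                return true;
            }
        }
        return false;
    }

    // Collects identifiers that have no assignment among their ancestors.
    static void collect(Node n, List<String> out) {
        if (n.kind.equals("id") && !insideAssignment(n)) {
            out.add(n.text);
        }
        for (Node child : n.children) collect(child, out);
    }

    public static List<String> demo() {
        // Models: ((a OR b) AND c) OR (d AND e = (f OR g))
        Node tree = expr(
            expr(expr(id("a"), id("b")), id("c")),
            expr(id("d"), assign(id("e"), expr(id("f"), id("g")))));
        List<String> out = new ArrayList<>();
        collect(tree, out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [a, b, c, d]
    }
}
```

Here `f` and `g` have an expression node as their direct parent, so a parent-only check would let them through; the ancestor loop catches the enclosing assignment.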

Mike Lischke
  • I've tried it and it does not seem to solve the problem. All parent classes seem to be `LogicExpressionsParser$ExpressionContext` – jack malkovick Feb 10 '23 at 11:01