
I am trying to parse arbitrary Java files using a library, preferably in Python. My goal is to find out, for different test classes, which code they depend on (i.e. which code is being tested). The result will be used to train a machine learning model. Below is an example of the kind of code to be parsed:

import foo.bar.SimpleCalculator;
import foo.Calculator;
import foo.bar.AdvancedCalculator;

public class Test {
    Calculator calculator = new SimpleCalculator();

    public void test_simple() {
        calculator.calc(1, 2);
    }

    public void test_advanced() {
        AdvancedCalculator calculator = new AdvancedCalculator();
        calculator.calc(1, 2);
    }
}

The parser should be able to provide the fully qualified name of the class on which a method is called. In the example, I want to know the fully qualified name of the class on which calc() is called: foo.Calculator in test_simple() and foo.bar.AdvancedCalculator in test_advanced(). I'm aware that the instance type can change at run time (e.g. a SuperSimpleCalculator might be assigned), but the declared type of the variable is enough for me.

I have already tried py-tree-sitter, the Python binding of the tree-sitter library, to parse the Java code, but it does not fulfil the requirement above: in the example, tree-sitter only provides the variable name of the object (calculator) and does not tell me whether calc() is called on a Calculator or an AdvancedCalculator object.

Nick1503
  • What you're asking is very difficult because there is no way, just by parsing this file, to be sure that each time `test_simple()` is called (BTW, please beware of Java naming conventions), the field `calculator` holds the same value. Had the field been `private`, it would be possible, but still not trivial. In `test_advanced()`, `calculator` is a local variable, so it's easier; you still need a non-trivial static analysis mechanism, and it only works in simple cases. – Maurice Perry Mar 15 '23 at 12:46
  • @MauricePerry `private` won't protect you from reflective changes. @OP my biggest question is "why do you need this?", as I'm wary of the [XY problem](http://xyproblem.info/) – Rogue Mar 15 '23 at 12:53
  • @Rogue you're right, but you could ignore it. – Maurice Perry Mar 15 '23 at 12:54
  • @Rogue same thing goes for `final` BTW – Maurice Perry Mar 15 '23 at 12:56
  • Thanks for your comments. I clarified the original question. – Nick1503 Mar 15 '23 at 15:19

1 Answer


Java uses fully qualified class names in bytecode. The easiest way for you is to take a .java file and compile it with javac (the classes it imports must be on the classpath for this to succeed). You will get a .class file that refers to every class by its fully qualified name.

Here is an example:

untitled1\src>javac Test.java

This produces a Test.class file in the same folder. You can then inspect it with:

untitled1\src>javap -v Test.class

A .class file always contains fully qualified class names. I don't know anything about Python, but in Java we have a lot of tools for opening and reading .class files; one of the most popular is the ASM project. So I would suggest finding a .class file reader for Python and reading the .class file rather than the .java file.
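
If no ready-made .class reader turns up, one possible stopgap is to parse the text that `javap -c -p Test.class` prints. The following is only a minimal sketch: the function name `calls_per_method`, the two regular expressions, and the `SAMPLE` text are illustrative assumptions, not part of any library. `SAMPLE` is a hand-written, abridged imitation of javap output that assumes `Calculator` is an interface and `calc` takes two ints and returns an int; real output will differ in constant-pool indices and spacing.

```python
import re

# An invoke instruction as printed by `javap -c`, for example:
#   11: invokevirtual #8   // Method foo/bar/AdvancedCalculator.calc:(II)I
# Calls to methods of the class being disassembled are printed without an
# owner prefix, so this pattern (which requires one) skips them.
INVOKE_RE = re.compile(
    r'invoke\w+\s+#\d+.*//\s*(?:Interface)?Method\s+([\w/$]+)\.([^:\s]+):(\S+)'
)
# A method header, for example "  public void test_simple();"
METHOD_RE = re.compile(r'^\s+[^/]*?([\w$]+)\([^)]*\)[^/]*;\s*$')

# Hand-written, abridged sample (an assumption for illustration only).
SAMPLE = '''
public class Test {
  foo.Calculator calculator;

  public Test();
    Code:
       0: aload_0
       1: invokespecial #1                  // Method java/lang/Object."<init>":()V

  public void test_simple();
    Code:
       0: aload_0
       1: getfield      #4                  // Field calculator:Lfoo/Calculator;
       6: invokeinterface #5,  3            // InterfaceMethod foo/Calculator.calc:(II)I
      11: pop

  public void test_advanced();
    Code:
       0: new           #6                  // class foo/bar/AdvancedCalculator
       4: invokespecial #7                  // Method foo/bar/AdvancedCalculator."<init>":()V
      11: invokevirtual #8                  // Method foo/bar/AdvancedCalculator.calc:(II)I
      14: pop
}
'''


def calls_per_method(javap_output):
    """Map each method name to a list of (owner class, called method) pairs."""
    calls = {}
    current = None
    for line in javap_output.splitlines():
        invoke = INVOKE_RE.search(line)
        if invoke:
            if current is not None:
                # Bytecode uses '/' as the package separator; convert to '.'.
                owner = invoke.group(1).replace('/', '.')
                # Constructors are printed as "<init>" including the quotes.
                name = invoke.group(2).strip('"')
                calls.setdefault(current, []).append((owner, name))
            continue
        header = METHOD_RE.match(line)
        if header:
            current = header.group(1)
            calls.setdefault(current, [])
    return calls


if __name__ == "__main__":
    for method, targets in calls_per_method(SAMPLE).items():
        print(method, targets)
```

On a real project, the input text could be obtained with `subprocess.run(["javap", "-c", "-p", "Test.class"], capture_output=True, text=True).stdout`, assuming a JDK is on the PATH. Overloaded methods share a name, so a more careful version would key on the full signature instead of the bare method name.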

Farrukh Nabiyev