I am looking for a library (preferably in Python) that can parse arbitrary Java files. My goal is to determine, for each test class, which code it depends on (that is, which code is being tested). The results will be used to train a machine learning model. Below is an example of the code to be parsed:
    import foo.bar.SimpleCalculator;
    import foo.Calculator;
    import foo.bar.AdvancedCalculator;

    public class Test {
        Calculator calculator = new SimpleCalculator();

        public void test_simple() {
            calculator.calc(1, 2);
        }

        public void test_advanced() {
            AdvancedCalculator calculator = new AdvancedCalculator();
            calculator.calc(1, 2);
        }
    }
The parser should be able to provide the fully-qualified name of the class on which a method is called. In the example, I want to know the fully-qualified name of the class on which calc() is called in test_simple() (which would be foo.Calculator) and in test_advanced() (which would be foo.bar.AdvancedCalculator). I'm aware that the actual instance type can differ at runtime (e.g. a SuperSimpleCalculator might have been assigned), but the declared type of the variable is sufficient for my purposes.
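To make the requirement concrete, here is a rough sketch of the output I am after for the example above. It is not a real Java parser: it uses only Python's standard library and two regular expressions, and the names `import_map` and `declared_types` are hypothetical helpers I made up for illustration. It assumes explicit single-class imports (no wildcards or static imports), simple `Type name = ...;` declarations, and it ignores scoping entirely, which is exactly the part I need a proper library for.

```python
import re

# The example test class from the question, as a string.
SOURCE = """
import foo.bar.SimpleCalculator;
import foo.Calculator;
import foo.bar.AdvancedCalculator;

public class Test {
    Calculator calculator = new SimpleCalculator();

    public void test_simple() {
        calculator.calc(1, 2);
    }

    public void test_advanced() {
        AdvancedCalculator calculator = new AdvancedCalculator();
        calculator.calc(1, 2);
    }
}
"""

# Assumption: only explicit imports such as "import a.b.C;" are handled.
IMPORT_RE = re.compile(r"^\s*import\s+([\w.]+)\s*;", re.MULTILINE)

# Assumption: declarations look like "Type name =" or "Type name;",
# with a capitalised type name and a lower-case variable name.
DECL_RE = re.compile(r"\b([A-Z]\w*)\s+([a-z_]\w*)\s*(?:=|;)")


def import_map(source):
    """Map each imported simple class name to its fully-qualified name."""
    return {fqn.rsplit(".", 1)[-1]: fqn for fqn in IMPORT_RE.findall(source)}


def declared_types(source):
    """Return (variable, fully-qualified declared type) pairs in source order.

    Types that are not imported (same package, java.lang) keep their simple name.
    """
    imports = import_map(source)
    return [
        (var, imports.get(type_name, type_name))
        for type_name, var in DECL_RE.findall(source)
    ]


if __name__ == "__main__":
    for var, fqn in declared_types(SOURCE):
        print(var, "->", fqn)
```

This only recovers the declared type of each variable. It does not tell me which of the two `calculator` declarations a given `calculator.calc(...)` call refers to, so scope-aware resolution is still the missing piece.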
I have already tried py-tree-sitter, the Python bindings for the tree-sitter library, to parse the Java code, but it does not fulfil the requirement above. In the example, tree-sitter only gives me the name of the variable the method is called on (calculator); it does not tell me whether calc() is called on a Calculator or an AdvancedCalculator object.