
Is there an AST tool that makes it easy to extract metadata from a Java method?

For instance, using the following code snippet

/*
 Checks if a target integer is present in the list of integers.
*/
public Boolean contains(Integer target, List<Integer> numbers) {
    for(Integer number: numbers){
        if(number.equals(target)){
            return true;
        }
    }
    return false;
}

the metadata would be:

metadata = {
    "comment": "Checks if a target integer is present in the list of integers.",
    "identifier": "contains",
    "parameters": "Integer target, List<Integer> numbers",
    "return_statement": "Boolean false"
}
Celso França
  • That's funny because that is exactly what I recently wrote in **Java Parser** I'll post an answer shortly. – Y2020-09 Oct 31 '20 at 21:52

2 Answers


This class was written a long time ago. It was originally about four different classes, spread out in a package called JavaParserBridge, and it tremendously simplifies what you are trying to do. I have stripped out all the unnecessary parts and boiled it down to about 100 lines. It took about an hour.

I hope this all makes sense. I usually add a lot of comments to code, but since this is essentially one big constructor built on another library, I will instead point you to the documentation page for Java Parser.

To use this class, pass the source code of a Java class as a single java.lang.String to the method getMethods(String), which returns a Vector<Method>. Each element of the returned Vector is a Method instance holding the meta-information that you requested in your question.
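If the input is a standalone method rather than a whole class, one way to get a String of the shape described above is to wrap the snippet in a class shell first. The following is only a minimal, JDK-only sketch of that idea; the names SnippetWrapper and wrapInClass are hypothetical and are not part of JavaParser or of the class below.

```java
public class SnippetWrapper {

    /**
     * Wraps the source of a single member (for example a method) in a
     * class declaration, so the result has the shape of a complete
     * Java source file. The class name is chosen by the caller.
     */
    public static String wrapInClass(String className, String memberSource) {
        return "class " + className + " {\n" + memberSource + "\n}\n";
    }

    public static void main(String[] args) {
        // The standalone method from the question (body shortened here).
        String snippet =
            "public Boolean contains(Integer target, List<Integer> numbers) {\n" +
            "    return numbers.contains(target);\n" +
            "}";

        // Prints the snippet surrounded by "class Wrapper { ... }".
        System.out.println(wrapInClass("Wrapper", snippet));
    }
}
```

The wrapped text can then be passed to getMethods(String) in place of a real source file. The wrapper name is arbitrary, since only the methods inside it are of interest.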

IMPORTANT: You can get the JAR file for this package from its GitHub page. You need the JAR named javaparser-core-3.16.2.jar.

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.CompilationUnit;
import com.github.javaparser.ast.body.TypeDeclaration;
import com.github.javaparser.ast.body.MethodDeclaration;
import com.github.javaparser.ast.body.Parameter;
import com.github.javaparser.ast.type.ReferenceType;
import com.github.javaparser.ast.type.TypeParameter;
import com.github.javaparser.ast.Node;
import com.github.javaparser.ast.NodeList;
import com.github.javaparser.ast.Modifier; // Modifiers are the key-words such as "public, private, static, etc..."
import com.github.javaparser.printer.lexicalpreservation.LexicalPreservingPrinter;
import com.github.javaparser.printer.lexicalpreservation.PhantomNodeLogic;

import java.io.IOException;
import java.util.Vector;


public class Method
{
    public final String name, signature, jdComment, body, returnType;
    public final String[] modifiers, parameterNames, parameterTypes, exceptions;

    private Method (MethodDeclaration md)
    {

        NodeList<Parameter>     paramList       = md.getParameters();
        NodeList<ReferenceType> exceptionList   = md.getThrownExceptions();
        NodeList<Modifier>      modifiersList   = md.getModifiers();

        this.name           = md.getNameAsString();
        this.signature      = md.getDeclarationAsString();
        this.jdComment      = (md.hasJavaDocComment() ? md.getJavadocComment().get().toString() : null);
        this.returnType     = md.getType().toString();
        this.modifiers      = new String[modifiersList.size()];
        this.parameterNames = new String[paramList.size()];
        this.parameterTypes = new String[paramList.size()];
        this.exceptions     = new String[exceptionList.size()];
        this.body           = (md.getBody().isPresent()
                                ?   LexicalPreservingPrinter.print
                                        (LexicalPreservingPrinter.setup(md.getBody().get()))
                                :   null);

        int i=0;
        for (Modifier modifier : modifiersList) modifiers[i++] = modifier.toString();

        i=0;
        for (Parameter p : paramList)
        {
            parameterNames[i]           = p.getName().toString();
            parameterTypes[i]           = p.getType().toString();
            i++;
        }

        i=0;
        for (ReferenceType r : exceptionList) this.exceptions[i++] = r.toString();
    }

    public static Vector<Method> getMethods(String sourceFileAsString) throws IOException
    {
        // This is the "Return Value" for this method (a Vector)
        final Vector<Method> methods = new Vector<>();

        // This asks Java Parser to parse the source code file.
        // The String parameter 'sourceFileAsString' should hold the complete
        // source code of a Java file.

        CompilationUnit cu = StaticJavaParser.parse(sourceFileAsString);

        // This will "walk" all of the methods that were parsed by
        // StaticJavaParser, and retrieve the method information.
        // The method information is stored in a class simply called "Method"

        cu.walk(MethodDeclaration.class, (MethodDeclaration md) -> methods.add(new Method(md)));

        // There is one important thing to do: clear the cache
        // Memory leaks shall occur if you do not.

        PhantomNodeLogic.cleanUpCache(); 

        // return the Vector<Method>
        return methods;
    }
}
Y2020-09
  • Impressive answer! Thank you very much. Is there a way around the need for the code to be compiled? This is necessary because I will extract this metadata in a dataset (http://leclair.tech/data/funcom/) formed by a set of standalone methods (which is possibly not compilable). – Celso França Nov 01 '20 at 15:55
  • Well, it doesn't actually need to be a `class`, the **Java Parser** type `CompilationUnit` can be any code snippet at all. The only requirement that the **JavaParser** package requires is that the code must be Syntactically Correct. – Y2020-09 Nov 01 '20 at 16:26
  • When I pass only the code snippet presented in the question I got: `Exception in thread "main" com.github.javaparser.ParseProblemException: (line 4,col 1) Parse error. Found "Boolean" , expected one of ";" "@" "class" "enum" "interface" "module" "open"`. – Celso França Nov 01 '20 at 19:08
  • I just finished the answer for you... I think it works for all of the `function.json` - ***Except the ones that are constructors*** I'm going to leave that as an exercise for you to finish. – Y2020-09 Nov 01 '20 at 19:24

You need to add this method to the class above. I rarely (if ever) post multiple answers to a single Stack Overflow question, but since this turned into a lot of code, I'm posting the main method as a separate answer rather than making the first one overly complicated.

Once this method is included in the class above, it will process the file functions.json, which I downloaded from your website. That file contains the list of methods along with their database IDs.

ALSO: Make sure to add the lines import java.util.regex.*; and import java.io.*; because this method uses the classes Pattern and Matcher as well as BufferedReader, FileReader and File.
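To see what the regular expression in the main method below captures, here is a small JDK-only sketch that applies the same pattern to the sample line quoted in the code comments. The class name FunctionsLineDemo and the helper split are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FunctionsLineDemo {

    // Same pattern as in the main method: leading whitespace, a quoted
    // numeric ID, a colon, then a quoted body that ends in an escaped
    // newline (backslash + n) followed by a closing quote and a comma.
    static final Pattern LINE =
        Pattern.compile("^\\s+\"(\\d+)\"\\:\\s+\"(.*?)\\\\n\",$");

    /** Returns {id, rawSource} for a matching line, or null otherwise. */
    static String[] split(String line) {
        Matcher m = LINE.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2) } : null;
    }

    public static void main(String[] args) {
        // One line as it appears in the file: the \t and \n are still
        // two-character escape sequences at this point.
        String line = "  \"321\": \"\\tpublic int getPushesLowerbound() {"
                    + "\\n\\t\\treturn pushesLowerbound;\\n\\t}\\n\",";

        String[] parts = split(line);
        System.out.println("ID:  " + parts[0]);
        System.out.println("Raw: " + parts[1]);
    }
}
```

Two details follow from the pattern itself: the final escaped newline of each entry is not part of the captured source, and a line without the trailing comma (such as the last entry of a JSON object) does not match, so it is skipped.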


    public static void main(String[] argv) throws IOException
    {
        // "321": "\tpublic int getPushesLowerbound() {\n\t\treturn pushesLowerbound;\n\t}\n",
        // If you have not used "Regular Expressions" before, you are just
        // going to have to read about them.  This "Regular Expression" parses your
        // JSON "functions.json" file.  It is a little complicated, but not too bad.

        Pattern         P1          = Pattern.compile("^\\s+\"(\\d+)\"\\:\\s+\"(.*?)\\\\n\",$");
        BufferedReader  br          = new BufferedReader(new FileReader(new File("functions.json")));
        String          s           = br.readLine();  // first line of the file; the loop below overwrites it

        // Any time you have a "Constructor" instead of a method, you should
        // use some other method in `StaticJavaParser` to deal with it.
        // for now, I am just going to keep a "Fail List" instead..

        int             failCount   = 0;
        Vector<String>  failIDs     = new Vector<>();
 
        while ((s = br.readLine()) != null && ! s.equals("}"))
        {
            // Parse the JSON using a Regular Expression.  It is just easier to do it this way
            // You have a VERY BASIC json file.

            Matcher m = P1.matcher(s);
            
            // I do not think any of the String's will fail the regular expression matcher.
            // Just in case, continue if the Regular Expression Match Failed.
            if (! m.find()) { System.out.print("."); continue; }
            
            // The ID is the first JSON element matched by the regular expression
            String id = m.group(1);
            
            // The source code is the second JSON element matched by the regular-expression
            // NOTE: Your source-code is not perfect... It has "escape sequences", so these sequences
            //       have to be "unescaped"
            // ALSO: this is not the most efficient way to "un-escape" an escape-sequence, but I would
            //       have to include an external library to do it the right way, so I'm going to leave
            //       this version here for you to think about.
            String src = m.group(2)
                .replace("\\\\", "" + ((char) 55555))
                .replace("\\n", "\n")
                .replace("\\t", "\t")
                .replace("\\\"", "\"")
                .replace("" + ((char) 55555), "\\");

            // Java Parser has a method EXPLICITLY FOR parsing method Declarations.
            // Your "functions.json" file has a list of method-declarations.
            MethodDeclaration   md          = null;

            // I found one that failed - it was a constructor..
            // Record its ID so that it shows up in the "Fail List" printed at the end.
            try
                { md = StaticJavaParser.parseMethodDeclaration(src); }
            catch (Exception e)
                { System.out.println(src); e.printStackTrace(); failCount++; failIDs.add(id); continue; }

            Method method = new Method(md);

            System.out.print(
                "ID:           " + id + '\n' +
                "Name:         " + method.name + '\n' +
                "Return Type:  " + method.returnType + '\n' +
                "Parameters:   "
            );

            for (int i=0; i < method.parameterNames.length; i++)
                System.out.print(method.parameterNames[i] + '(' + method.parameterTypes[i] + ")  ");

            System.out.println("\n");

            PhantomNodeLogic.cleanUpCache();
        }
        
        System.out.print(
            "Fail Count: " + failCount + "\n" +
            "Failed ID's: "
        );
        for (String failID : failIDs) System.out.print(failID + " ");
        System.out.println();
    }
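The main method un-escapes the source with a chain of replace calls and a placeholder character, and its comments say this is not the most efficient way. As an alternative that needs no external library, the escapes can be decoded in a single left-to-right pass. This is only a sketch under the assumption that the input uses standard JSON string escapes; JsonUnescape and unescape are hypothetical names.

```java
public class JsonUnescape {

    /**
     * Decodes JSON string escapes in one pass. Because each backslash is
     * consumed together with the character after it, an escaped backslash
     * followed by 'n' is never mistaken for a newline, so no placeholder
     * character is needed.
     */
    static String unescape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != '\\' || i + 1 >= s.length()) { out.append(c); continue; }

            char next = s.charAt(++i);   // the character being escaped
            switch (next) {
                case 'n': out.append('\n'); break;
                case 't': out.append('\t'); break;
                case 'r': out.append('\r'); break;
                case 'b': out.append('\b'); break;
                case 'f': out.append('\f'); break;
                case 'u':
                    if (i + 4 < s.length()) {   // four hex digits must follow
                        String hex = s.substring(i + 1, i + 5);
                        out.append((char) Integer.parseInt(hex, 16));
                        i += 4;
                    } else {
                        out.append("\\u");      // truncated escape: keep as-is
                    }
                    break;
                default:  out.append(next);     // covers \" \\ and \/
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("\\tpublic int getX() {\\n\\t\\treturn x;\\n\\t}"));
    }
}
```

In this sketch a malformed \u escape with non-hex digits would make Integer.parseInt throw a NumberFormatException. Whether that should be caught or left to propagate depends on how clean the data is.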

The main method above produces output like the sample below. Since you have, literally, one million methods, it will run for a long time, so I have posted only a (very) small subset of the output.

NOTE: Not every entry in that list is a valid method. If an entry is a constructor instead of a method, it needs to be parsed as a constructor instead. There is a "Fail List" for entries that JavaParser could not parse, and I'm going to leave it as an exercise for you to figure out how to deal with constructors (which are not parsed by the StaticJavaParser method named parseMethodDeclaration).


ID:           32808641
Name:         addUnboundTypePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808649
Name:         addNamePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808650
Name:         addInputParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808651
Name:         addQualifiedNamePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808652
Name:         addOutputParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808656
Name:         addReturnParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808658
Name:         addSignatureParameterPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808659
Name:         getLabelProvider
Return Type:  IItemLabelProvider
Parameters:   namedElement(NamedElement)

ID:           32808661
Name:         getLabel
Return Type:  String
Parameters:   namedElement(NamedElement)

ID:           32808677
Name:         addBodyPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808678
Name:         addLanguagePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808696
Name:         addKindPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808707
Name:         addStaticPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808708
Name:         addKindPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808709
Name:         addSemanticsPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808711
Name:         addConstrainedElementPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808713
Name:         addDefinedFeaturePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808727
Name:         addNestingNamespacePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808741
Name:         addKindPropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32808749
Name:         addSuperTypePropertyDescriptor
Return Type:  void
Parameters:   object(Object)

ID:           32814359
Name:         getResource
Return Type:  ResourceBundle
Parameters:   name(String)  language(String)

ID:           32814360
Name:         store
Return Type:  void
Parameters:   resource(ResourceBundle)  name(String)  language(String)

ID:           32814364
Name:         getString
Return Type:  String
Parameters:   key(String)  resourceName(String)  language(String)

ID:           32814400
Name:         getGlobalCompletionRate
Return Type:  double
Parameters:

ID:           32814409
Name:         setCurrentSubTask
Return Type:  void
Parameters:   subTask(TaskMonitor)  subTaskShare(double)

ID:           32814429
Name:         enforceCompletion
Return Type:  void
Parameters:

ID:           32814431
Name:         getCurrentActiveSubTask
Return Type:  TaskMonitor
Parameters:

ID:           32814469
Name:         checkTaskState
Return Type:  void
Parameters:

ID:           32814619
Name:         getReportAsText
Return Type:  String
Parameters:   report(ProcessReport)

ID:           32815305
Name:         showRecoveryResultWindow
Return Type:  void
Parameters:   context(ProcessContext)

ID:           32815353
Name:         validateStructure
Return Type:  void
Parameters:

ID:           32815413
Name:         buildArchive
Return Type:  void
Parameters:   context(ProcessContext)

ID:           32815445
Name:         checkArchiveCompatibility
Return Type:  boolean
Parameters:   archive(File)

ID:           32815446
Name:         checkStupidConfigurations
Return Type:  boolean
Parameters:

ID:           32815472
Name:         getDescription
Return Type:  String
Parameters:

ID:           32815501
Name:         getDataDirectory
Return Type:  File
Parameters:   archive(File)

IMPORTANT (again): Whenever one of the functions in your database is a constructor rather than a method, the StaticJavaParser method that I have used will throw an exception.

See here: the source printed after the following entry is a constructor:


ID:           32812832
Name:         run
Return Type:  void
Parameters:

        public PeriodicData (String secProp ) {
                this.interval = 300;
                try {
                        this.interval = Integer.parseInt( secProp );
                } catch (Exception e ) {} // use default 5m

        }

And the code I have posted prints this message when it encounters it:


com.github.javaparser.ParseProblemException: Encountered unexpected token: "(" "("
    at line 1, column 22.

Was expecting one of:

    "enum"
    "exports"
    "module"
    "open"
    "opens"
    "provides"
    "requires"
    "strictfp"
    "to"
    "transitive"
    "uses"
    "with"
    "yield"
    <IDENTIFIER>

Problem stacktrace :
  com.github.javaparser.GeneratedJavaParser.generateParseException(GeneratedJavaParser.java:10906)
  com.github.javaparser.GeneratedJavaParser.jj_consume_token(GeneratedJavaParser.java:10752)
  com.github.javaparser.GeneratedJavaParser.Identifier(GeneratedJavaParser.java:2193)
  com.github.javaparser.GeneratedJavaParser.SimpleName(GeneratedJavaParser.java:2127)
  com.github.javaparser.GeneratedJavaParser.MethodDeclaration(GeneratedJavaParser.java:1224)
  com.github.javaparser.GeneratedJavaParser.MethodDeclarationParseStart(GeneratedJavaParser.java:6020)
  com.github.javaparser.JavaParser.parse(JavaParser.java:123)
  com.github.javaparser.JavaParser.parseMethodDeclaration(JavaParser.java:540)
  com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
  Method.main(Method.java:110)

        at com.github.javaparser.StaticJavaParser.handleResult(StaticJavaParser.java:260)
        at com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
        at Method.main(Method.java:110)
        public PeriodicData (int seconds ) {
                this.interval = seconds;
        }
com.github.javaparser.ParseProblemException: Encountered unexpected token: "(" "("
    at line 1, column 22.

Was expecting one of:

    "enum"
    "exports"
    "module"
    "open"
    "opens"
    "provides"
    "requires"
    "strictfp"
    "to"
    "transitive"
    "uses"
    "with"
    "yield"
    <IDENTIFIER>

Problem stacktrace :
  com.github.javaparser.GeneratedJavaParser.generateParseException(GeneratedJavaParser.java:10906)
  com.github.javaparser.GeneratedJavaParser.jj_consume_token(GeneratedJavaParser.java:10752)
  com.github.javaparser.GeneratedJavaParser.Identifier(GeneratedJavaParser.java:2193)
  com.github.javaparser.GeneratedJavaParser.SimpleName(GeneratedJavaParser.java:2127)
  com.github.javaparser.GeneratedJavaParser.MethodDeclaration(GeneratedJavaParser.java:1224)
  com.github.javaparser.GeneratedJavaParser.MethodDeclarationParseStart(GeneratedJavaParser.java:6020)
  com.github.javaparser.JavaParser.parse(JavaParser.java:123)
  com.github.javaparser.JavaParser.parseMethodDeclaration(JavaParser.java:540)
  com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
  Method.main(Method.java:110)

        at com.github.javaparser.StaticJavaParser.handleResult(StaticJavaParser.java:260)
        at com.github.javaparser.StaticJavaParser.parseMethodDeclaration(StaticJavaParser.java:480)
        at Method.main(Method.java:110)
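Since constructor entries like the two above make parseMethodDeclaration throw, one option is to recognise them before parsing and route them elsewhere (or straight to the fail list). The following JDK-only sketch is a rough heuristic, not a parser: it assumes the snippet starts with an optional access modifier and treats "identifier directly followed by an opening parenthesis" as a constructor, because a method always has a return type before its name. ConstructorCheck and looksLikeConstructor are hypothetical names.

```java
import java.util.regex.Pattern;

public class ConstructorCheck {

    // Optional access modifier, then an identifier, then "(" with no
    // return type in between. Heuristic only: leading annotations,
    // leading comments and type parameters are not handled.
    private static final Pattern CTOR = Pattern.compile(
        "^\\s*(?:(?:public|protected|private)\\s+)?"
        + "[A-Za-z_$][A-Za-z0-9_$]*\\s*\\(");

    /** Returns true if the snippet starts like a constructor declaration. */
    static boolean looksLikeConstructor(String src) {
        return CTOR.matcher(src).find();
    }

    public static void main(String[] args) {
        String ctor   = "\tpublic PeriodicData (String secProp ) {\n\t}";
        String method = "\tpublic int getPushesLowerbound() {\n\t}";

        System.out.println(looksLikeConstructor(ctor));
        System.out.println(looksLikeConstructor(method));
    }
}
```

A check like this could run before the try block in the main method, so that constructor entries are counted separately instead of producing a stack trace each.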
Y2020-09