8

I am trying to figure out how to take a Java POJO and analyze its methods for all the other methods and functions they could call. For example, here is a hardcoded example of the output. How can I make this general? I need to analyze Java objects programmatically to determine which methods they could call if executed. Example:

package com.example.analyze;

public class Main
{

    private static class Foo {

        public void foo(int value, Bar bar) {
            if(value > 5)
                bar.gaz();
        }
    }

    private static class Bar {

        public void gaz() {
            System.out.println("gaz");
        }
    }

    private static class Analyzer {

        public void analyze(Object object){
            System.out.println("Object method foo could call Bar method gaz");
        }

    }

    public static void main(String[] args)
    {
        Foo foo = new Foo();
        Analyzer analyzer = new Analyzer();
        analyzer.analyze(foo);
    }
}
BalusC
David Williams
  • 2
    http://depfind.sourceforge.net/ – Jayan Oct 18 '14 at 05:26
  • Thanks, I'd be happy to accept if you provide an example of how to accomplish the example. – David Williams Oct 18 '14 at 23:19
  • @Jayan, have a look at this pastebin. How do I connect the $1 to the function doSomething? In the comment section is the result of printing 3 levels of the jdpends outbound links http://pastebin.com/b9E4zEdg – David Williams Oct 19 '14 at 16:21
  • Do you want to analyze the code before execution or at runtime? If you are interested in invocations during runtime, you could have a look at http://en.wikipedia.org/wiki/Aspect-oriented_programming. – user Oct 29 '14 at 08:14

7 Answers

9

What you need is to construct a call graph, and then ask whether two nodes (a caller and a callee) are connected in that graph. This isn't an easy task.

What you need to do:

  • Parse the source code making up your application. Java parsers are relatively easy to find. Java 1.8 parsers are harder to come by, but there is one inside the Java compiler that you can use, and another in the Eclipse JDT; my company also provides one with our DMS Toolkit.
  • Build abstract syntax trees for that code; you need the code structures. The Java compiler, JDT, and DMS can all do this.
  • Perform name and type resolution. You need to know what every symbol refers to. The Java compiler definitely does this for one compilation unit at a time. JDT may do it for many files; I don't have a lot of experience with this. DMS can do this for very large sets of Java source files at once.
  • Now you need to do an (object) points-to analysis: you want to know, for any (object-valued) field, what specific instance objects it might point to; that will eventually tell you what methods it might be used to trigger. You get the information for this task by inspecting the ASTs and the symbol table definitions that tell you what each symbol means. If you see X.f=new foo; you know that f in X can point to foo, as a basic fact. Generics and type erasure make this messy. If you see Y.g=Z.h, you know that g in Y can point to anything that h in Z can point to; of course Z might be a class that inherits from Z. If you see Y.g=a[...], then you know that g in Y can point to any object that might have been assigned to array a. If you see Y.g=bar(...), then you know that g in Y can point to anything that bar might return; unfortunately, you now need a call graph to answer the question narrowly. You can approximate this in various ways to get a conservative answer. Now that you know how values are related to one another, you have to take a transitive closure over this set to get some idea of what each g in each Y can point to. You can get a more precise answer if you take into account the control and data flow of the individual methods, but that's more machinery to construct. (Here are more details on points-to analysis.) The Java compiler computes some of this information when it is compiling, but not for an entire system of source files; remember that it processes source files one at a time. I don't think JDT attempts to do this at all. Our DMS doesn't (yet) do this for Java, but we have done it for systems of C code of 26 million lines; this is arguably a harder problem, because people do all kinds of abusive things with pointers, including casts that lie.
  • Finally you can construct a call graph. For each method, construct a call graph node. For each call site in a method, determine its set of callees and link the calling node to each called node. The previous step has collected the information needed to provide these links.
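
The points-to step in the list above can be illustrated with a deliberately tiny sketch. It is flow-insensitive and only understands two kinds of facts: allocations (X.f=new foo) and copies (Y.g=Z.h). Arrays, method returns, generics and inheritance are all ignored. The class `PointsTo` and its method names are invented for this illustration and do not come from any tool mentioned here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: a flow-insensitive, inclusion-based points-to sketch.
final class PointsTo {
    private final Map<String, Set<String>> pointsTo = new HashMap<>();
    private final List<String[]> copies = new ArrayList<>();

    // Records "field = new type": the field may point to an object of that type.
    void addAllocation(String field, String type) {
        pointsTo.computeIfAbsent(field, k -> new TreeSet<>()).add(type);
    }

    // Records "target = source": target may point to whatever source may point to.
    void addCopy(String target, String source) {
        copies.add(new String[] {target, source});
    }

    // Propagates sets along the copy constraints until nothing changes
    // (the transitive closure mentioned in the list above).
    Map<String, Set<String>> solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] copy : copies) {
                Set<String> from = pointsTo.getOrDefault(copy[1], Set.of());
                Set<String> to = pointsTo.computeIfAbsent(copy[0], k -> new TreeSet<>());
                if (to.addAll(from)) {
                    changed = true;
                }
            }
        }
        return pointsTo;
    }

    public static void main(String[] args) {
        PointsTo p = new PointsTo();
        p.addAllocation("X.f", "Foo"); // X.f = new Foo
        p.addAllocation("Z.h", "Bar"); // Z.h = new Bar
        p.addCopy("Y.g", "Z.h");       // Y.g = Z.h
        p.addCopy("Z.h", "X.f");       // Z.h = X.f
        System.out.println(p.solve().get("Y.g")); // prints [Bar, Foo]
    }
}
```

The result is conservative: Y.g is reported as possibly pointing to both types even if, at runtime, only one assignment ever executes. Tightening that is where control-flow and data-flow information would come in.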

[You might be able to avoid the parsing/name-type resolution part of the above using Wala, which is constructed essentially by doing most of the above].

With the call graph, if you want to know if A can call B, find the node for A in the call graph, and see if there is a path to B.
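
As a rough sketch of that last query, assuming the call graph has already been built and that methods are identified by plain strings, the path check is an ordinary graph search. `CallGraph`, `addCall` and `canCall` are names made up for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: a call graph as an adjacency map plus a reachability query.
final class CallGraph {
    private final Map<String, Set<String>> edges = new HashMap<>();

    // Adds a directed edge caller -> callee.
    void addCall(String caller, String callee) {
        edges.computeIfAbsent(caller, k -> new HashSet<>()).add(callee);
    }

    // True if a chain of one or more calls leads from 'from' to 'to'.
    boolean canCall(String from, String to) {
        Set<String> seen = new HashSet<>();
        Deque<String> work = new ArrayDeque<>();
        work.push(from);
        while (!work.isEmpty()) {
            String method = work.pop();
            for (String callee : edges.getOrDefault(method, Set.of())) {
                if (callee.equals(to)) {
                    return true;
                }
                if (seen.add(callee)) { // visit each node once, so cycles terminate
                    work.push(callee);
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        CallGraph g = new CallGraph();
        g.addCall("Main.main", "Foo.foo");
        g.addCall("Foo.foo", "Bar.gaz");
        g.addCall("Bar.gaz", "PrintStream.println");
        System.out.println(g.canCall("Main.main", "Bar.gaz")); // prints true
        System.out.println(g.canCall("Bar.gaz", "Foo.foo"));   // prints false
    }
}
```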

Another note here suggests this is a 6 month task for a compiler class. I think it is 6 months for an experienced compiler person, or more (and we haven't addressed nasty problems such as class loaders and reflective calls).

I think you are better off finding a solution for this that somebody else has already built. Likely somebody has; it is less likely that it is easily found or that she wants to part with it. You might find implementations done at universities; there are all kinds of papers written by academics (and supported by a prototype) to compute object graphs. The downside is that all those systems are prototypes, and being built by small, unpaid teams of graduates, they usually don't handle all the edge cases, let alone the latest version of Java (lambdas, anyone?).

Ira Baxter
  • So the solution I wrote is a lot like this. Basically, parse the bytecode, look for `invoke*` calls, and add a node and a directed edge to a graph structure. Then a method's dependencies are found by a depth-first search from its node along its outbound links. The answer by Steve below uses `javassist`; I think a complete answer is both together. Right now I am reworking the prototype to use ASM instead of Javap, if you have any idea about this question... http://stackoverflow.com/questions/26575111/java-asm-how-to-get-opcode-name-and-tagvalue-from-asm-insnnode – David Williams Oct 27 '14 at 03:17
  • @DavidWilliams: Your graph appears to be instance-method-M calls abstract-method-x. Imagine I have class X, which has a (possibly abstract) method x, and classes X1 and class X2 both inheriting from X with methods x' and x'' overriding x. The way you are constructing your graph, it appears that you know only that method m calls *some* x, but not specifically x, x' or x''. Is that what you really want for your call graph? If you want more information, you have to know *which* of X, X1, or X2 is used at the call site; that's why I said you need "points-to" analysis. – Ira Baxter Oct 27 '14 at 03:45
4

You can use the ASM API to find information about a class file. The sample code gives a fair idea of how to get the method details.

Analyzer class

package sample.code.analyze;

import java.io.IOException;

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class Analyzer {
    public void analyze(Object object) {
        // ASM5 is the first API level that defines the five-argument visitMethodInsn.
        ClassVisitor cv = new ClassVisitor(Opcodes.ASM5) {
            @Override
            public MethodVisitor visitMethod(int access, String name,
                    String desc, String signature, String[] exceptions) {

                System.out.println("Method: " + name + " -- " + desc);
                return new MethodVisitor(Opcodes.ASM5) {
                    // Called once for every invoke* instruction in the method body.
                    @Override
                    public void visitMethodInsn(int opcode, String owner,
                            String name, String desc, boolean itf) {
                        System.out.println("--  opcode  --  " + opcode
                                + "  --  owner  --  " + owner + "  --  name  --  "
                                + name + "  --  desc  --  " + desc);
                        super.visitMethodInsn(opcode, owner, name, desc, itf);
                    }
                };
            }
        };
        try {
            // getName() yields the binary name (Outer$Inner) that ClassReader can
            // load; getCanonicalName() (Outer.Inner) does not resolve for nested classes.
            ClassReader classReader = new ClassReader(object.getClass().getName());
            classReader.accept(cv, 0);
        } catch (IOException e) {
            System.err.println("Something went wrong !! " + e.getMessage());
        }
    }

    public static void main(String[] args) {
        Foo foo = new Foo();
        Analyzer analyzer = new Analyzer();
        analyzer.analyze(foo);
    }
}

Bar Class

package sample.code.analyze;

public class Bar {
    public void gaz() {
        System.out.println("gaz");
    }
}

Foo Class

package sample.code.analyze;

import sample.code.analyze.Bar;

public class Foo {
    public void foo(int value, Bar bar) {
        if (value > 5) {
            bar.gaz();
        }
    }
}
Hari
3

What you're trying to do is called static code analysis, specifically data-flow analysis, but with a twist: you didn't say whether you are looking at source code or at compiled code. If you want to do it at runtime, you'll have to deal with compiled code (bytecode) instead of source. So you're looking for a library capable of bytecode data-flow analysis. There are quite a few libraries out there to help (now that you know what to search for, you can find alternatives to my recommendation if you like).

OK, now getting to an example. I like javassist: I find it to be as clear as a bytecode library can be, with great examples and documentation online. javassist has a higher-level bytecode analysis API, so you might not even have to dig down too deep, depending on what you need to do.

To print output for your Foo/Bar example above, use the following code:

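import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import javassist.bytecode.analysis.Analyzer;
import javassist.bytecode.analysis.Frame;

public class FrameDump { // class name is arbitrary
    public static void main (String... args) throws Exception {
        // javassist's data-flow Analyzer, not the Analyzer class from the question
        Analyzer a = new Analyzer();

        ClassPool pool = ClassPool.getDefault();
        CtClass cc = pool.get("test.Foo");
        for (CtMethod cm : cc.getDeclaredMethods()) {
            // indexed by bytecode offset; null where no instruction starts
            Frame[] frames = a.analyze(cm);
            for (Frame f : frames) {
                System.out.println(f);
            }
        }
    }
}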

will print:

locals = [test.Foo, int, test.Bar] stack = []
locals = [test.Foo, int, test.Bar] stack = [int]
locals = [test.Foo, int, test.Bar] stack = [int, int]
null
null
locals = [test.Foo, int, test.Bar] stack = []
locals = [test.Foo, int, test.Bar] stack = [test.Bar]
null
null
locals = [test.Foo, int, test.Bar] stack = []

If you need more detail, you'll need to actually read the bytecode, and have the JVM specification handy:

public static void main (String... args) throws Exception {
    ClassPool pool = ClassPool.getDefault();
    CtClass cc = pool.get("test.Foo");
    for (CtMethod cm : cc.getDeclaredMethods()) {
        MethodInfo mi = cm.getMethodInfo();
        CodeAttribute ca = mi.getCodeAttribute();
        if (ca == null) {
            continue; // abstract and native methods have no bytecode
        }
        CodeIterator ci = ca.iterator();
        while (ci.hasNext()) {
            int index = ci.next();      // offset of the next instruction
            int op = ci.byteAt(index);  // its opcode
            switch (op) {
                case Opcode.INVOKEVIRTUAL:
                    System.out.println("virtual");
                    //lookup in the JVM spec how to extract the actual method
                    //call info here
                    break;
            }
        }
    }
}
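
If decoding the constant pool by hand is more than you need, a cruder alternative (a different technique from the javassist code above) is to let the JDK's own `javap` disassembler do the decoding and keep only the `invoke*` lines. The sketch below assumes it runs on a full JDK, where `javap` can be looked up through `java.util.spi.ToolProvider`, and that the target class is visible to `javap`; `InvokeLister` and `invokeLines` are names invented for this example.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.List;
import java.util.spi.ToolProvider;
import java.util.stream.Collectors;

// Hypothetical helper: lists the invoke* instructions javap reports for a class.
final class InvokeLister {
    static List<String> invokeLines(String className) {
        ToolProvider javap = ToolProvider.findFirst("javap")
                .orElseThrow(() -> new IllegalStateException("javap not found; a JDK is required"));
        StringWriter out = new StringWriter();
        StringWriter err = new StringWriter();
        PrintWriter outWriter = new PrintWriter(out);
        PrintWriter errWriter = new PrintWriter(err);
        // -c disassembles method bodies, -p includes private members.
        int exit = javap.run(outWriter, errWriter, "-c", "-p", className);
        outWriter.flush();
        errWriter.flush();
        if (exit != 0) {
            throw new IllegalArgumentException("javap failed: " + err);
        }
        // Instruction lines look like "4: invokevirtual #7  // Method ...".
        return out.toString().lines()
                .map(String::trim)
                .filter(line -> line.contains(": invoke"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String target = args.length > 0 ? args[0] : "java.util.ArrayList";
        invokeLines(target).forEach(System.out::println);
    }
}
```

Each reported line names the declared owner, method and descriptor of the call site, which is enough for a first call-graph edge but says nothing about which override actually runs.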

I hope this helps get you started =)

Steve Siebert
  • Does this parse class files? How would I run it over a jar? – David Williams Oct 26 '14 at 04:16
  • Thanks for the answer, I am going to try using javassist. Currently I am trying ASM to parse the bytecode. – David Williams Oct 27 '14 at 03:18
  • hmm, sorry for the delay...I didn't get an email notification. The example I wrote assumes that the class in question is already loaded on the classpath to match your example -- but it looks like you've moved past that issue already =) – Steve Siebert Oct 29 '14 at 00:45
1

This is quite tough: you will need to use the Java Reflection API, do some heavy parsing, and do a lot of the work a compiler would do. Instead, you could just use one of the many Java dependency tools/plugins already available (like JDepend, from https://stackoverflow.com/a/2366872/986160).

Community
Michail Michailidis
  • I am familiar with the Reflection api. What do you think the parsing would entail? Is there no way to do it in memory from the pojo? – David Williams Oct 18 '14 at 04:14
  • 1
    It will involve parsing all the method bodies and finding method invocations (using regular expressions and syntax trees). You will need to keep track of the variables and their types so you can log the dependencies on those class types. You will probably need to do multiple passes over all the files. You will also need to build symbol trees and syntax trees and, after doing that, build the graph of dependencies. But as I said, this could be a six-month class project in a compilers course, let's say. – Michail Michailidis Oct 18 '14 at 04:20
  • I think bytecode is even lower level - if you mean the instructions that will be run on JVM. You don't need that. – Michail Michailidis Oct 18 '14 at 04:58
1

OP Answer for reference:

The goal is to get this to work:

    MethodInvocationGraph methodInvocationGraph =
        new MethodInvocationGraph(
            Disassembler.disassembleThisJar());

    methodInvocationGraph.printObjectMethodDependencyTree(methodInvocationGraph);

Which will print the object's own dependencies. To do this you need:

In-depth knowledge of the ASM Tree API:

http://asm.ow2.org/

Methods of opening and accessing Jar contents, including

MethodInvocationGraph.class.getProtectionDomain().getCodeSource()

A JNI Signature parser

http://journals.ecs.soton.ac.uk/java/tutorial/native1.1/implementing/method.html

And a graph framework such as

http://jgrapht.org/
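
Of the ingredients listed above, the signature parser is the easiest to sketch without extra libraries. Bytecode tools report a method's type as a JVM descriptor string such as (ILcom/example/Bar;)V, and the minimal parser below turns that into readable type names. It assumes well-formed input and Java 11 or later (for `String.repeat`); `Descriptors` and `parse` are illustrative names, not part of ASM or any other library.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: decodes a JVM method descriptor into readable type names.
final class Descriptors {
    // Returns the parameter types in order, followed by the return type as the last element.
    static List<String> parse(String desc) {
        if (desc.isEmpty() || desc.charAt(0) != '(') {
            throw new IllegalArgumentException("not a method descriptor: " + desc);
        }
        List<String> types = new ArrayList<>();
        int i = 1; // skip '('
        while (desc.charAt(i) != ')') {
            i = readType(desc, i, types);
        }
        readType(desc, i + 1, types); // the return type follows ')'
        return types;
    }

    // Reads one field type starting at index i and returns the index just past it.
    private static int readType(String desc, int i, List<String> out) {
        int dims = 0;
        while (desc.charAt(i) == '[') { // each '[' adds one array dimension
            dims++;
            i++;
        }
        String base;
        char c = desc.charAt(i);
        if (c == 'L') { // object type: L<internal/name>;
            int end = desc.indexOf(';', i);
            base = desc.substring(i + 1, end).replace('/', '.');
            i = end + 1;
        } else {
            base = primitive(c);
            i++;
        }
        out.add(base + "[]".repeat(dims));
        return i;
    }

    private static String primitive(char c) {
        switch (c) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'V': return "void";
            default: throw new IllegalArgumentException("bad descriptor char: " + c);
        }
    }

    public static void main(String[] args) {
        // Descriptor of Foo.foo(int, Bar) from the question.
        System.out.println(parse("(ILcom/example/analyze/Main$Bar;)V"));
        // prints [int, com.example.analyze.Main$Bar, void]
    }
}
```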

David Williams
-1

There is a problem with the method because it is declared static. A static method is executed first, when the class starts, so everything executes in that first stage and cannot be called a second time because of the method's static nature. So the main method will not be able to call the method above.

-1

I think you can get all the information from a stack trace if you call any method. When we get an exception, we can see the stack trace using the printStackTrace() method. This is not an answer, but it may help you find a solution to your problem.

Vishwajit R. Shinde
  • OP wants to find out if one method *might* call another. A stacktrace at best offers accidental evidence that it *did*, if you get a stacktrace to occur at the right instant. Poster is right: this is *not* an answer. – Ira Baxter Oct 28 '14 at 10:36