
How can I inspect the bytecode of a class (using something such as ASM) to learn which initial values were passed through to a method?

For example, given some methods that pass values to each other:

void m1(Object o) {
  Object v = o;
  m2(v);
  m2("box");
}

void m2(Object o) {
  Object v = o;
  m3(v);
}

void m3(Object o) {
}

And some method calls, all defined in the same class:

{
  Object foo = "foo";
  m1(foo);
  m2("bar");
  m3("baz");
}

How can I inspect the class's bytecode to learn that m3 will be called four times, with the values "foo", "box", "bar" and "baz"?

Josh Stone
  • In general you can't do this without running the program (or simulating it). This is reducible to the halting problem. – user253751 May 29 '14 at 01:07
  • @immibis - I'm not so sure - the inputs are clearly encoded in the bytecode. They can be followed to the final `m3` invocation. In the same way that one can do this by looking at the post above, one can do it by inspecting the bytecode. It's just not clear to me the best way to go about this, programmatically. – Josh Stone May 29 '14 at 01:30
  • "In general" meaning that for any program which claims to do this, there are some input programs that either make it produce incorrect output or make it loop forever. There are still algorithms that work for some subset of all possible programs. – user253751 May 29 '14 at 07:50

1 Answer

Using ASM, you can in theory trace, for each method, whether another method of the same class is invoked from within it. The visitor API callback that reports method invocations is MethodVisitor.visitMethodInsn (since ASM 5 it also takes a trailing isInterface flag). Assuming that your class is called bar.Foo, whose internal name is bar/Foo, you would need to trace:

visitMethodInsn(<any>, "bar/Foo", <any>, <any>)

You would then need to build a transitive relation of methods calling each other, where the method name and descriptor arguments allow you to build such a call hierarchy. Additionally, you would need to trace the arguments of these method invocations, which is trickier but not impossible.

The reason it is more complex is the number of possible ways an argument can be loaded onto the operand stack. For your example, you only need to pay attention to the visitVarInsn callback (which reports ALOAD and ASTORE) and the visitLdcInsn callback (which reports LDC).

When calling a method with a constant pool value (LDC), resolving the argument is rather trivial. For a local variable, however, you would need to trace the entire instruction chain that precedes the call to learn about the local variable assignments, so that you know the argument is actually the method's own parameter. Thus, you could find out that

ALOAD_0 / ASTORE_1 / ALOAD_1 => ALOAD_0

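is the effective result of a sequence of reads and writes on the method's local variable array. (These slot numbers assume a static method; in an instance method, slot 0 holds this and the first parameter is in slot 1.)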
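The slot tracing described above can be sketched without any bytecode library. The following is only a minimal illustration under stated assumptions: SlotTracer, Insn and originOfTop are hypothetical names, not part of ASM, and the model covers just ALOAD, ASTORE and LDC in straight-line code with no branches.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model (not an ASM API): three opcodes only,
// straight-line code, no branches, no exception handlers.
final class SlotTracer {
    enum Op { ALOAD, ASTORE, LDC }

    /** One instruction: an opcode plus either a local slot or a constant. */
    static final class Insn {
        final Op op;
        final int slot;        // local variable index; unused for LDC
        final String constant; // constant value; only set for LDC

        private Insn(Op op, int slot, String constant) {
            this.op = op;
            this.slot = slot;
            this.constant = constant;
        }

        static Insn aload(int slot)  { return new Insn(Op.ALOAD, slot, null); }
        static Insn astore(int slot) { return new Insn(Op.ASTORE, slot, null); }
        static Insn ldc(String c)    { return new Insn(Op.LDC, -1, c); }
    }

    /**
     * Symbolically executes the instructions and returns a description of
     * where the value on top of the operand stack originally came from.
     */
    static String originOfTop(List<Insn> code) {
        Map<Integer, String> locals = new HashMap<>();
        Deque<String> stack = new ArrayDeque<>();
        for (Insn insn : code) {
            switch (insn.op) {
                case ALOAD:
                    // A slot that was never written still holds its value
                    // from method entry, so it is its own origin.
                    String known = locals.get(insn.slot);
                    stack.push(known != null ? known : "ALOAD " + insn.slot);
                    break;
                case ASTORE:
                    // The stored slot inherits the origin of the popped value.
                    locals.put(insn.slot, stack.pop());
                    break;
                case LDC:
                    stack.push("LDC \"" + insn.constant + "\"");
                    break;
            }
        }
        return stack.peek();
    }

    public static void main(String[] args) {
        List<Insn> code = Arrays.asList(Insn.aload(0), Insn.astore(1), Insn.aload(1));
        System.out.println(originOfTop(code)); // prints: ALOAD 0
    }
}
```

With real bytecode, the same bookkeeping would have to be fed from the visitor callbacks and would also need to handle branches, wide values and every other instruction that touches the stack.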

With all this, from parsing the byte code, you would learn about the following call-transitions:

m1(Ljava/lang/Object;)V -> m2(Ljava/lang/Object;)V [ALOAD 0]
                        -> m2(Ljava/lang/Object;)V [LDC "box"]
m2(Ljava/lang/Object;)V -> m3(Ljava/lang/Object;)V [ALOAD 0]
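As a rough sketch of how such call transitions could be combined with the initial calls from the question, the following self-contained example replays them by hand. CallTransitions and all of its methods are hypothetical names chosen for illustration; the transitions are entered manually rather than extracted from bytecode, and the replay assumes the call graph has no recursion (otherwise it would not terminate).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: replays recorded call transitions to find out which
// values reach each method. Assumes an acyclic call graph.
final class CallTransitions {
    /** One call site; a null constant means "forwards the caller's argument". */
    private static final class Edge {
        final String callee;
        final String constant;

        Edge(String callee, String constant) {
            this.callee = callee;
            this.constant = constant;
        }
    }

    private final Map<String, List<Edge>> edges = new LinkedHashMap<>();
    private final Map<String, List<String>> received = new LinkedHashMap<>();

    /** Records "caller passes its own parameter on to callee" ([ALOAD ...]). */
    void forwards(String caller, String callee) {
        edges.computeIfAbsent(caller, k -> new ArrayList<>()).add(new Edge(callee, null));
    }

    /** Records "caller passes a constant to callee" ([LDC ...]). */
    void passesConstant(String caller, String callee, String constant) {
        edges.computeIfAbsent(caller, k -> new ArrayList<>()).add(new Edge(callee, constant));
    }

    /** Simulates one call and, recursively, every call it triggers. */
    void invoke(String method, String value) {
        received.computeIfAbsent(method, k -> new ArrayList<>()).add(value);
        List<Edge> outgoing = edges.getOrDefault(method, Collections.<Edge>emptyList());
        for (Edge e : outgoing) {
            invoke(e.callee, e.constant != null ? e.constant : value);
        }
    }

    List<String> valuesReceivedBy(String method) {
        return received.getOrDefault(method, Collections.<String>emptyList());
    }

    /** The transitions listed above plus the three initial calls from the question. */
    static CallTransitions exampleFromQuestion() {
        CallTransitions t = new CallTransitions();
        t.forwards("m1", "m2");
        t.passesConstant("m1", "m2", "box");
        t.forwards("m2", "m3");
        t.invoke("m1", "foo");
        t.invoke("m2", "bar");
        t.invoke("m3", "baz");
        return t;
    }

    public static void main(String[] args) {
        System.out.println(exampleFromQuestion().valuesReceivedBy("m3")); // prints: [foo, box, bar, baz]
    }
}
```

Replaying the calls rather than computing a closure keeps the order of the values, which is why the result matches the four calls the question expects.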

You could then use these results when parsing the block that contains the initial method calls, in order to work out their implications. You would, however, have created a rather fragile solution, in which indirections such as:

{
  Foo foo = this;
  foo.m1("bar");
}

would not be discovered. As pointed out in the comments, you basically need to emulate the Java virtual machine in order to "run" your code.

And even if you implement a complex solution to trace all this, you still could not be sure of your result. What happens when an interface method is invoked from within an implementation, or a method that a subclass overrides? Because of dynamic method dispatch, you can never be sure of the target that is actually called.
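The dispatch problem can be shown with a small, self-contained example. The types below (DispatchDemo, Handler, Base, Sub) are hypothetical and not taken from the question; the point is only that a single call site compiles to one invocation instruction with one owner and one descriptor, while the method that actually runs depends on the runtime type of the receiver.

```java
// Hypothetical types for illustration only.
final class DispatchDemo {
    interface Handler {
        String m3(Object o);
    }

    static class Base implements Handler {
        @Override
        public String m3(Object o) {
            return "Base.m3(" + o + ")";
        }
    }

    static class Sub extends Base {
        @Override
        public String m3(Object o) {
            return "Sub.m3(" + o + ")";
        }
    }

    /**
     * One call site in the bytecode: an interface invocation of Handler.m3.
     * Nothing at this call site says which implementation will run.
     */
    static String callSite(Handler h) {
        return h.m3("baz");
    }

    public static void main(String[] args) {
        System.out.println(callSite(new Base())); // prints: Base.m3(baz)
        System.out.println(callSite(new Sub()));  // prints: Sub.m3(baz)
    }
}
```

A static tracer that only looks at the owner and name reported for the call site would see Handler.m3 in both cases.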

Rafael Winterhalter
  • Great response. This should be enough info to get close to a decent solution since as you said, the perfect solution basically involves emulating the JVM. – Josh Stone May 31 '14 at 19:01
  • Hi @JoshStone, did you get the calling method's arguments? If yes, could you please share information about your solution? I am facing the same issue: I get the types of the arguments of the calling method, but I need the values that are passed to the called method. – Swati Thakare Apr 11 '16 at 13:16
  • @Josh Stone Example: in the call obj.methodName(account.getId), I need the type of the account object, which is Account, but I get the type long (the value returned by account.getId). Could you please help me out with this? – Swati Thakare Apr 11 '16 at 13:29