I'd like to create a reverse debugger (a debugger where it is possible to go backwards in program execution) for Java and for this I need to store variable data alongside program execution. I will use a global cache for this and a static method which updates the cache.

I'd like to instrument loaded classes so that my static method is called after each field or variable modification, e.g.:

public static void updateCache(String fullVarName, Object value){...}
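
For illustration, here is a minimal sketch of what such a global cache could look like. The class name `VariableCache` and the helpers `current` and `stepBack` are hypothetical, not part of the original design; only the `updateCache` signature comes from the question. It assumes that keeping a full per-variable history is acceptable and that storing object references (rather than deep copies) is enough for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical global cache: keeps every recorded value per variable name. */
final class VariableCache {
    // One history list per fully qualified variable name; newest value is last.
    // ArrayList is used because recorded values may be null.
    private static final Map<String, List<Object>> HISTORY = new HashMap<>();

    private VariableCache() {}

    /** Meant to be called by instrumented code after each field or local store. */
    public static synchronized void updateCache(String fullVarName, Object value) {
        HISTORY.computeIfAbsent(fullVarName, k -> new ArrayList<>()).add(value);
    }

    /** Returns the most recently recorded value, or null if nothing was recorded. */
    public static synchronized Object current(String fullVarName) {
        List<Object> h = HISTORY.get(fullVarName);
        return (h == null || h.isEmpty()) ? null : h.get(h.size() - 1);
    }

    /** Drops the newest value and returns the one before it (null if none is left). */
    public static synchronized Object stepBack(String fullVarName) {
        List<Object> h = HISTORY.get(fullVarName);
        if (h == null || h.isEmpty()) {
            return null;
        }
        h.remove(h.size() - 1);
        return h.isEmpty() ? null : h.get(h.size() - 1);
    }

    public static void main(String[] args) {
        updateCache("com.example.Point.x", 1);
        updateCache("com.example.Point.x", 2);
        System.out.println(current("com.example.Point.x"));  // prints 2
        System.out.println(stepBack("com.example.Point.x")); // prints 1
    }
}
```

Note that this only records references: if a recorded object is mutated later, the history entry changes with it, which is exactly the collection problem described further down.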

I observed that updating a field executes a putfield instruction, and updating a local variable executes an (I)STORE instruction. My idea is to instrument the classes so that, whenever such an opcode is found, I insert a getfield or ILOAD right after it to read the updated value, followed by an invokestatic that calls my static method with all the necessary information.
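
To make the idea concrete, the following sketch shows at source level what a method would roughly be equivalent to after such instrumentation. It is hand-written, not produced by a bytecode tool; the class `InstrumentedCounter` and its simple string log are assumptions made for the example. Two details are worth keeping in mind: because `updateCache` takes an `Object`, primitive values have to be boxed before the call, and local variable names are only available in the class file when it was compiled with debug information.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical class showing the source-level effect of the inserted calls. */
final class InstrumentedCounter {
    // Stand-in for the global cache: an ordered log of "name=value" entries.
    static final List<String> LOG = new ArrayList<>();

    static void updateCache(String fullVarName, Object value) {
        LOG.add(fullVarName + "=" + value);
    }

    private int count;

    int addTwice(int delta) {
        int sum = delta + delta;                                // original code: ISTORE into "sum"
        updateCache("InstrumentedCounter.addTwice.sum", sum);   // inserted: ILOAD, boxing, INVOKESTATIC

        this.count = this.count + sum;                          // original code: PUTFIELD count
        updateCache("InstrumentedCounter.count", this.count);   // inserted: ALOAD 0, GETFIELD, boxing, INVOKESTATIC

        return this.count;
    }

    public static void main(String[] args) {
        new InstrumentedCounter().addTwice(3);
        System.out.println(LOG);
    }
}
```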

The problem is that there are other cases where the variables are collections or arrays that are updated through specific methods, such as updating a HashMap with map.put(key, value). I would need to intercept these calls as well, but there is a large number of such methods and I would have to find and hardcode them all.
Is there a workaround? Or am I missing something, and there is a simpler solution?

Edit: I've also looked into JVMTI before and ran some benchmarks. It seemed too slow for my use case, e.g. it adds a 7-100x slowdown to my program.

Nfff3
  • You forgot about array store operations. Besides that, a `HashMap` is just an ordinary Java class. All the `put` method does, boils down to field and array store operations. There is no need to treat it differently. However, recording every change of the heap state is likely to create an exploding amount of data. – Holger May 15 '20 at 07:14
  • @Holger yes, you are right. I didn't necessarily want to instrument library classes such as `HashMap` for several reasons such as increased instrumentation time and even if I instrument them I kind of have to know how they are represented under the hood (what kind of array etc.) so that I can store them in my cache and retrieve them in a readable manner. Also what do you think, what would be a more elegant way to do what I want? – Nfff3 May 15 '20 at 11:01
  • I don’t think that this is possible in a reasonable way, except for very small code paths. You’d rather need a special JVM (or at least, bytecode interpreter) with a versioning heap instead of instrumented code, to do this efficiently. – Holger May 15 '20 at 11:59
  • @Holger, yeah, I understand.....What do you think about a Serviceability-Agent which gets information about the JVM? Or about using the dynamic attach API to attach a thread to the running JVM which has access to all data structures in the JVM itself and then from time to time extract the information from those data structures? – Nfff3 May 15 '20 at 12:37
  • That boils down to the expenses of a heap dump, unless you have special JVM support to be notified of which parts of the heap truly changed. Trying to get this via Instrumentation would again lead to the necessity of instrumenting all classes, which you wanted to avoid. – Holger May 15 '20 at 15:47
  • @Holger, understood, thank you very much! – Nfff3 May 15 '20 at 15:57
  • You could use Javassist to instrument every method of every class to send back feeds to your central engine – rakwaht May 19 '20 at 08:03
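
Holger's point that a map's `put` boils down to field and array stores can be illustrated with a small sketch. `TinyIntMap` below is a hypothetical, deliberately simplified map (fixed capacity, linear probing, no resizing), not how `java.util.HashMap` is actually implemented. The `record` calls are written by hand where an instrumenter would insert them after each store instruction.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fixed-size map; every state change is a field or array store. */
final class TinyIntMap {
    // Stand-in for the global cache: an ordered log of recorded stores.
    static final List<String> STORES = new ArrayList<>();

    static void record(String location, Object value) {
        STORES.add(location + "=" + value);
    }

    private final int[] keys = new int[8];
    private final String[] values = new String[8];
    private final boolean[] used = new boolean[8];
    private int size;

    void put(int key, String value) {
        int i = Math.floorMod(key, keys.length);
        int probes = 0;
        // Linear probing: walk forward until the key or a free slot is found.
        while (used[i] && keys[i] != key) {
            if (++probes == keys.length) {
                throw new IllegalStateException("table full (sketch has no resizing)");
            }
            i = (i + 1) % keys.length;
        }
        if (!used[i]) {
            used[i] = true;
            record("used[" + i + "]", true);       // after BASTORE
            keys[i] = key;
            record("keys[" + i + "]", key);        // after IASTORE
            size = size + 1;
            record("size", size);                  // after PUTFIELD
        }
        values[i] = value;
        record("values[" + i + "]", value);        // after AASTORE
    }

    public static void main(String[] args) {
        TinyIntMap map = new TinyIntMap();
        map.put(3, "a");
        map.put(3, "b");
        System.out.println(STORES);
    }
}
```

Even this tiny `put` produces up to four recorded stores per call, which gives a feel for the data volume concern raised in the comments.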

1 Answer

If your goal is only to be able to do reverse debugging, you can try Jive (https://cse.buffalo.edu/jive/). It can be used together with Eclipse.

But if your goal is to create a reverse debugging tool yourself, this article may help you: https://www.researchgate.net/publication/220093333_Back_to_the_Future_Omniscient_Debugging

UweJ