Any way to regenerate stackmap from byte code?

Question

I have an old library (circa 2005) that performs byte code manipulation but does not touch the stackmap. Consequently, my JVM (Java 8) complains that the resulting classes are invalid. The only way to circumvent the errors is to run the JVM with -noverify, but that is not a long-term solution for me.

Is there some way I can regenerate the stack map after the classes have already been generated? I saw that the ClassWriter class has an option to regenerate the stack map, but I'm not sure how to read in a class file and write out a new one. Is that feasible?

score 6 · Accepted Answer · answered Oct 09 '17 at 10:51

When you instrument old classes not having stackmaps and keep their old version number, there will be no problem, as they will be processed by the JVM the same way as before, not requiring stackmaps. Of course, this implies that you can’t inject newer bytecode features.
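
To find out which of the two cases a given class falls into, one could inspect the class file header. The following is only a minimal sketch: the `ClassFileVersion` helper and its method names are made up for illustration, and it relies solely on the header layout (magic number `0xCAFEBABE`, followed by the minor and major version as two-byte big-endian values).

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not part of ASM): looks only at the class file header. */
final class ClassFileVersion {
    private ClassFileVersion() {}

    /** Returns the major version stored at byte offset 6 of a class file. */
    static int majorVersion(byte[] classFile) {
        ByteBuffer buf = ByteBuffer.wrap(classFile); // big-endian by default
        if (classFile.length < 8 || buf.getInt(0) != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        return buf.getShort(6) & 0xFFFF;
    }

    /** Version 51 (Java 7) and later are always verified using stackmaps. */
    static boolean requiresStackMaps(byte[] classFile) {
        return majorVersion(classFile) >= 51;
    }
}
```

Classes for which `requiresStackMaps` returns `false` can keep their version number and need no regeneration step.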

When you are instrumenting newer class files which had valid stackmaps before the transformation, you will not be running into those problems described by Antimony. So you can use ASM to regenerate stackmaps:

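import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassWriter;

byte[] bytecode = … // result of your instrumentation
ClassReader cr = new ClassReader(bytecode);
// COMPUTE_FRAMES: the writer recalculates all stackmap frames itself
ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES);
cr.accept(cw, ClassReader.SKIP_FRAMES); // don't parse the old frames
bytecode = cw.toByteArray(); // with recalculated stack maps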

The visitor API has been designed to allow easy chaining of a reader with a writer and only add code to intercept those artifacts you want to change.

Note that since we know that we are going to regenerate the stackmap frames from scratch using ClassWriter.COMPUTE_FRAMES, we can pass ClassReader.SKIP_FRAMES to the reader to tell it not to process the source frames we’re going to ignore anyway.

There is another optimization possible when we know that the class structure doesn’t change. We can pass the ClassReader to the ClassWriter’s constructor to draw a benefit from the unchanged structure, e.g. the target constant pool will get initialized with a copy of the source constant pool. This option, however, must be handled with care. If we don’t intercept methods at all, it will get optimized too, i.e. the code gets copied entirely without even recalculating the stack frames. So we need a custom method visitor to pretend that the code could potentially change:

import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

byte[] bytecode = … // result of your instrumentation
ClassReader cr = new ClassReader(bytecode);
// passing cr to ClassWriter to enable optimizations
ClassWriter cw = new ClassWriter(cr, ClassWriter.COMPUTE_FRAMES);
cr.accept(new ClassVisitor(Opcodes.ASM5, cw) {
    @Override
    public MethodVisitor visitMethod(int access, String name, String desc,
                                     String signature, String[] exceptions) {
        MethodVisitor writer = super.visitMethod(access, name, desc, signature, exceptions);
        return new MethodVisitor(Opcodes.ASM5, writer) {
            // not changing anything, just preventing code specific optimizations
        };
    }
}, ClassReader.SKIP_FRAMES);
bytecode = cw.toByteArray(); // with recalculated stack maps

This way, unchanged artifacts like the constant pool can be copied directly to the target byte code while the stackmap frames still get recalculated.

There are some caveats, though. Generating stackmaps from scratch implies not utilizing any knowledge about the original code structure or the nature of the transformation. E.g. a compiler would know the formal types of local variable declarations whereas the ClassWriter may see different actual types for which it has to find the common base type. This search may be very expensive and may cause the loading of classes whose loading would otherwise be deferred, or which would not be used at all during normal execution. The resulting type may even differ from the common type declared in the original code. It will be a correct type, but may again change the use of classes in the resulting code.

If you are performing the instrumentation in a different environment, ASM’s attempts to load the classes for determining the common type may fail. Then, you will have to override ClassWriter.getCommonSuperClass(…) with an implementation which can perform the operation in that environment. This is also the place to add optimizations, if you have more knowledge about the code and can provide answers without expensive searches through the type hierarchy.
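
As a rough illustration of such a replacement, the sketch below answers the "common superclass" question from a precomputed table instead of loading classes. Everything in it is an assumption made for the example: the `HierarchyLookup` class, the map from internal class name to internal superclass name, and the decision to model only the superclass chain while ignoring interfaces.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical lookup: resolves common superclasses without loading any class. */
final class HierarchyLookup {
    private static final String OBJECT = "java/lang/Object";

    // Assumed input: internal class name -> internal name of its direct superclass.
    private final Map<String, String> superOf;

    HierarchyLookup(Map<String, String> superOf) {
        this.superOf = superOf;
    }

    /** Returns the nearest class that both types extend (or are), else Object. */
    String commonSuperClass(String type1, String type2) {
        // Collect type1 and all of its known ancestors.
        Set<String> ancestors = new HashSet<>();
        for (String t = type1; t != null && ancestors.add(t); t = superOf.get(t)) { }

        // Walk up from type2 until we hit one of them; 'seen' guards against cycles.
        Set<String> seen = new HashSet<>();
        for (String t = type2; t != null && seen.add(t); t = superOf.get(t)) {
            if (ancestors.contains(t)) {
                return t;
            }
        }
        return OBJECT; // unknown types: fall back to the universal supertype
    }
}
```

An override of ClassWriter.getCommonSuperClass(String, String), which also works with internal names, could delegate to a lookup like this one when the hierarchy is known in advance.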

Generally, it’s recommended to refactor that old library to use ASM in the first place instead of needing a subsequent adaptation step. As explained above, when performing the code transformation using a chain of ClassReader and ClassWriter with optimizations enabled, ASM would be able to copy all unchanged methods, including their stackmaps, and only recalculate the stackmaps of actually changed methods. In the code above, doing the recalculation in a subsequent step, we had to disable the optimization as we no longer know which methods were actually changed.

The next logical step would be to incorporate stackmap handling into the instrumentation, as more often than not, knowledge about the actual transformation allows keeping 99% of the existing frames and easily adapting the others, instead of needing an expensive recalculation from scratch.

This is fantastic! Due to an unfixed bug I detected in Javassist - it generates incorrect stack map frames in one situation - I can use this trick as a temporary workaround to repair my Javassist-generated byte code using ASM in order to be able to run it without `-noverify`. One thing though, when diffing the output of `javap -c -p -v`, I find that ASM has also reorganised the whole constant table, leading to lots of diffs in the method byte code too because constants have been renumbered. Can I tell ASM not to touch the constant pool? — kriegaex, Jul 12 '20 at 01:53
I have the same solution implemented in ASM (also using `COMPUTE_FRAMES`) and Javassist for comparison. Neither pure Javassist nor pure ASM reorder anything in the constant pool, other than appending new constants I need for my instrumentation. But when chaining Javassist and ASM, suddenly ASM feels inclined to reorder. I am flabbergasted. — kriegaex, Jul 12 '20 at 02:11
@kriegaex the key point has been mentioned in the answer. By passing the `ClassReader` to the `ClassWriter`’s constructor, you enable optimizations like copying the constant pool and only appending to it. I don’t know of anything that would take back the optimization after you constructed the `ClassWriter` this way. Take care that it is truly the same `ClassReader` instance you are using for `accept`. — Holger, Jul 13 '20 at 08:43
I want to apologise for my superfluous question. When I first tried both methods, it seemed that the result was the same (re-ordering of constant pool) with both methods, despite your description in the answer saying otherwise. But now I tried one more time, making sure to do full re-builds also of an uber JAR involved. Last time there must have been a stale JAR from an incomplete build. — kriegaex, Jul 13 '20 at 13:44
Now the result is as expected: When using the second method with the extra `visitMethod`, there is no re-ordering, only appending. Only the methods within the class are being partly re-ordered, but that is easy to diff in comparison. Thank you and sorry for the noise. — kriegaex, Jul 13 '20 at 13:45

Antimony · Answer 2 · 2017-10-07T16:25:04.757

As far as how to read in the class, you should be able to just use a ClassReader.

As for the more general question about the feasibility of automatically adding stack maps to old classes, in most cases, it is possible. However, there are a few obscure cases where this would not be possible, mostly due to the fact that the inference verifier is laxer than the stackmap verifier. Note that these only apply to the case of adding a stack map to old code that never had one. If you are modifying existing Java 8 code, you can ignore all this.

First off are the jsr and ret instructions, which are not allowed in class files of version 51 (Java 7) or later. If you want to port code using them, you would have to rewrite the code to duplicate and inline all the subroutine bodies.

Apart from that, there are more minor issues. For example, the inference verifier allows you to freely mix boolean and byte arrays (they are considered the same type by the verifier), but the stackmap verifier treats them as distinct types.

Another potential issue is that with inference verification, dead code is never checked at all, while the stackmap verifier still requires you to specify stack maps for everything. In this case, the fix is easy - delete all the dead code.

Lastly, there is the issue that stackmaps require you to specify the common superclasses of types upfront when they merge in the control flow, whereas with inference verification, you don't need to explicitly specify supertypes. Most of the time, this won't matter, since you have a known inheritance hierarchy, but it is theoretically possible to inherit from classes that are only defined at runtime via a ClassLoader.

And of course, the stackmaps require corresponding entries in the constant pool, which means that you have less space in the constant pool for everything else. If you have a class that is close to hitting the maximum constant pool size, then adding a stack map may not be possible. This is very rare, but may happen with autogenerated code.
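
To get a feeling for how close a class is to that limit, one can read the constant pool count from the class file header. This is only a sketch with a made-up helper name (`ConstantPoolHeadroom`); it assumes the standard layout in which the two-byte `constant_pool_count` follows directly after the magic number and version fields, i.e. at byte offset 8.

```java
/** Hypothetical helper: reports how much room is left in the constant pool. */
final class ConstantPoolHeadroom {
    /** The count is a two-byte unsigned value, so it cannot exceed 65535. */
    static final int MAX_COUNT = 0xFFFF;

    private ConstantPoolHeadroom() {}

    /** Reads constant_pool_count (big-endian u2 at byte offset 8). */
    static int constantPoolCount(byte[] classFile) {
        if (classFile.length < 10) {
            throw new IllegalArgumentException("truncated class file");
        }
        return ((classFile[8] & 0xFF) << 8) | (classFile[9] & 0xFF);
    }

    /** Upper bound on additional entries (long/double constants take two slots). */
    static int freeSlots(byte[] classFile) {
        return MAX_COUNT - constantPoolCount(classFile);
    }
}
```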

P.S. There is also the possibility of going in the other direction. If your code doesn't use any version 51.0 or 52.0 specific features (which is basically just invokedynamic, aka lambdas), then you can set the classfile version to 50.0, removing the need for a stack map. Of course, this is kind of a backwards solution, and will become increasingly hard as future classfile versions add more attractive features (such as lambdas).
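
A minimal sketch of that version change, assuming the class bytes are already in memory. The `ClassVersionPatcher` name is invented for this example, and the code does not check whether the class actually uses newer features such as invokedynamic; that remains the caller's responsibility.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: rewrites only the major version in the class file header. */
final class ClassVersionPatcher {
    private ClassVersionPatcher() {}

    /** Returns a copy of the class file with the given major version (e.g. 50). */
    static byte[] withMajorVersion(byte[] classFile, int major) {
        if (classFile.length < 8 || ByteBuffer.wrap(classFile).getInt(0) != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        if (major < 0 || major > 0xFFFF) {
            throw new IllegalArgumentException("major version out of range");
        }
        byte[] copy = classFile.clone();   // leave the input untouched
        copy[6] = (byte) (major >>> 8);    // high byte of the big-endian u2
        copy[7] = (byte) major;            // low byte
        return copy;
    }
}
```

Calling `ClassVersionPatcher.withMajorVersion(bytecode, 50)` would then mark the class as a Java 6 class file.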

Thanks for the tutorial; I had no idea that it was so involved - I was hoping I could just read in a class and rewrite it with a stackmap. That being said, the base code is compiled with J8 compiler (although it is only J6 code at best). There is then a persistence library that is doing the byte enhancement - I believe it is adding fields and getters/setters and rewriting the class file. Given what you said, it would appear that I would have a bunch of work to ensure the new methods and fields meet the criteria. Is there an easy way to validate how much work would be involved? — Eric B., Oct 07 '17 at 16:39
If I generate/compile the code as J6 code, would that alleviate my issues? Is there any way to compile J7+ code as J6 target? Would I be limited from using any JEE7 or JEE8 concepts by limiting myself to J6 code? I can't imagine I would, unless the JEE apis are written for J7+? — Eric B., Oct 07 '17 at 17:01
@Eric B. You can almost certainly just use ClassReader and ClassWriter and it should just work. All the issues I mentioned are theoretical issues that you are unlikely to see in real world code. — Antimony, Oct 07 '17 at 22:44
I did some quick reading about the ClassReader and it seems that I have to manually read every field/method/etc. Is there an automated way to say "read in all methods/fields/etc" from a class? Do you know of any examples that you can point me to that would read in an entire class to be able to rewrite it all? — Eric B., Oct 08 '17 at 03:30
