I have been reading Java bytecode from a variety of files to help with my understanding of the .class files for a project where I need to integrate with a 3rd party library which has no source code and poor documentation available.
For my own amusement I ran the Apache BCEL library over my local Maven repository to see where the rarer class and method attributes, such as type annotations, are used and why.
I stumbled across a problem with a specific jar in which one of the constant pool entries, a CONSTANT_Utf8_info, would not decode. The library is icu4j-2.6.1.jar (com.ibm.icu:icu4j), specifically the LocaleElements_zh__PINYIN.class file. Apache BCEL fails, and my own attempts at a quick bytecode reader complying with JVMS versions 8 and 9 stumble into the same problem: they misread this constant and then read the next byte, which evaluates to an invalid constant tag (0x3C/60).
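To make the failure mode concrete, here is a minimal sketch of how a reader typically steps over a CONSTANT_Utf8_info entry: one tag byte, a two-byte big-endian length, then that many bytes of data. The class name `Utf8EntrySketch`, the method `skipUtf8`, and the demo buffer are all made up for illustration; this is not BCEL's code or my actual reader. The buffer deliberately declares a length shorter than its real data, so the "next tag" lands inside the string. Note that 0x3C is ASCII `<`, which is at least consistent with a parser ending up in the middle of text.

```java
import java.nio.ByteBuffer;

final class Utf8EntrySketch {
    static final int CONSTANT_UTF8 = 1; // tag value for CONSTANT_Utf8_info

    /**
     * Returns the offset of the first byte after the Utf8 entry that
     * starts at {@code offset}, trusting the declared u2 length.
     */
    static int skipUtf8(byte[] classBytes, int offset) {
        ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
        int tag = buf.get(offset) & 0xFF;             // u1 tag
        if (tag != CONSTANT_UTF8) {
            throw new IllegalArgumentException("not a Utf8 tag: " + tag);
        }
        int length = buf.getShort(offset + 1) & 0xFFFF; // u2 length, unsigned
        return offset + 3 + length;                     // tag + length + data
    }

    public static void main(String[] args) {
        // Hypothetical entry: declares 2 bytes of data but really holds "ab<c".
        byte[] demo = {1, 0, 2, 'a', 'b', '<', 'c'};
        int next = skipUtf8(demo, 0);
        System.out.printf("next entry expected at offset %d, byte there is 0x%02X%n",
                next, demo[next] & 0xFF);
    }
}
```

A reader built this way has no means of noticing that the declared length is too short; it only fails later, when the byte it takes for the next tag is not a valid constant tag.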
A quick check to see whether I can use the class in an IDE fails (cannot resolve symbol). Inspecting the actual bytes in a hex editor shows that the constant at that offset (0x1AC) is a Utf8 constant (tag=0x01) with a length of 0x480E. Moving forward that amount in the file does indeed land on a byte 0x3C. Looking at the file visually, I can see that the constant in question ends at location 0x149BD, which makes the actual length of the string 0x1480E (which is essentially the first three bytes at location 0x1AC read as a single value). This is of course not possible under the JVM class file specification, which limits a Utf8 constant to a length of 0xFFFF, or 65535. The class file is quite old: version 46, or Java 1.2.
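The arithmetic behind those offsets can be checked with a short sketch. The constants are the values quoted above; the class and method names are hypothetical, and I am assuming 0x149BD is the offset just past the last byte of the string. Under that assumption, the observed length masked to 16 bits equals the declared length, which is what one would expect if a longer length had been truncated when written as a u2. That is only one possible reading, not a confirmed cause.

```java
final class Utf8LengthCheck {
    // Values taken from the hex dump described in the question.
    static final int ENTRY_OFFSET = 0x1AC;     // position of the tag byte (0x01)
    static final int DECLARED_LENGTH = 0x480E; // u2 length stored after the tag
    static final int OBSERVED_END = 0x149BD;   // assumed: first offset past the string data

    /** First byte of string data: 1 tag byte + 2 length bytes past the entry start. */
    static int dataStart() {
        return ENTRY_OFFSET + 3;
    }

    /** Length implied by where the string visibly ends in the file. */
    static int observedLength() {
        return OBSERVED_END - dataStart();
    }

    /** What remains of a length when only its low 16 bits are kept. */
    static int truncatedToU2(int length) {
        return length & 0xFFFF;
    }

    public static void main(String[] args) {
        System.out.printf("data starts at        0x%X%n", dataStart());                       // 0x1AF
        System.out.printf("observed length       0x%X%n", observedLength());                  // 0x1480E
        System.out.printf("low 16 bits of that   0x%X%n", truncatedToU2(observedLength()));   // 0x480E
        System.out.printf("parser looks for tag  0x%X%n", dataStart() + DECLARED_LENGTH);     // 0x49BD
    }
}
```

So a spec-compliant reader resumes at 0x49BD, well inside the string data, which matches the bogus 0x3C tag.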
I've pored over the specification and tried different possible implementations (both less and more strict) to parse this constant, but each one either fails to parse it or breaks the reading of other valid Utf8 constants.
My question, then: have I missed something, or is this a compiler mistake? If it is a compiler mistake, how could it have happened in the first place, given that compilers tend to be fairly thoroughly tested? Lastly, how does the Java compiler normally handle string literals that are longer than 65535 bytes?