14

So I have a classloader (MyClassLoader) that maintains a set of "special" classes in memory. These special classes are dynamically compiled and stored in a byte array inside MyClassLoader. When the MyClassLoader is asked for a class, it first checks if its specialClasses dictionary contains it, before delegating to the System classloader. It looks something like this:

import java.util.Map;

class MyClassLoader extends ClassLoader {
    // class name -> compiled bytecode of each "special" class
    Map<String, byte[]> specialClasses;

    public MyClassLoader(Map<String, byte[]> sb) {
        this.specialClasses = sb;
    }

    @Override
    public Class<?> loadClass(String name) throws ClassNotFoundException {
        if (specialClasses.containsKey(name)) return findClass(name);
        else return super.loadClass(name);
    }

    @Override
    public Class<?> findClass(String name) {
        byte[] b = specialClasses.get(name);
        return defineClass(name, b, 0, b.length);
    }
}

If I want to perform transformations (e.g. instrumentation) on the specialClasses, I can do so simply by modifying the byte[] before I call defineClass() on it.

I would also like to transform the classes which are provided by the System classloader, but the System classloader doesn't seem to provide any way of accessing the raw byte[] of the classes it provides, and gives me the Class objects directly.

I could use a -javaagent to instrument all classes loaded into the JVM, but that would add overhead to the classes which I do not want to instrument; I only really want the classes loaded by MyClassLoader to be instrumented.

  • Is there any way of retrieving the raw byte[] of the classes provided by the parent classloader, so I can instrument them before defining my own copy?
  • Alternately, is there any way of emulating the functionality of the System classloader, in terms of where it grabs its byte[]s from, so that MyClassLoader can instrument and define its own copy of all the System classes (Object, String, etc.)?

EDIT:

So I tried another approach:

  • Using a -javaagent, capture the byte[] of every class that is loaded and store it in a hashtable, keyed by the name of the class.
  • MyClassLoader, instead of delegating the system classes to its parent classloader, would load their bytecode from this hashtable using the class name and define it itself.

In theory this would let MyClassLoader define its own version of the system classes with instrumentation. However, it fails with a

java.lang.SecurityException: Prohibited package name: java.lang

Clearly the JVM does not like me defining the java.lang classes myself, even though it should (in theory) be from the same byte[] source that the bootstrap-loaded classes should come from. The search for a solution continues.
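The check behind this exception can be reproduced in isolation. The sketch below is only an illustration: ProbeLoader and the class name java.lang.Fake are made up for the demo. It relies on defineClass() checking the package name before it looks at the bytes, so an empty array is enough to trigger the exception.

```java
// Hypothetical loader used only to probe the package-name check in defineClass().
class ProbeLoader extends ClassLoader {
    // Returns the SecurityException message, or null if no such exception was thrown.
    String tryDefine(String name) {
        byte[] empty = new byte[0];
        try {
            defineClass(name, empty, 0, empty.length);
            return null;
        } catch (SecurityException e) {
            // Thrown for names in java.* packages
            return e.getMessage();
        } catch (ClassFormatError e) {
            // Other names pass the name check and fail on the empty bytes instead
            return null;
        }
    }
}
```

Calling new ProbeLoader().tryDefine("java.lang.Fake") hits the same check as above, regardless of where the bytes came from.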

EDIT2:

I found a (really sketchy) solution for this problem, but if someone who knows more than me about the intricacies of the Java classloading/instrumentation can come up with something less sketchy, that would be awesome.

Li Haoyi
  • Byte code rewriting is one way to do aspect oriented programming (e.g. AspectJ). I'd wonder if AOP and AspectJ could accomplish what you want to do with a more proven, less complex technology. Better to not write it yourself. – duffymo Oct 23 '12 at 14:38
  • I'm trying to explore the ideas in http://hulaas.com/jraf2/publications/HOSC08.pdf, and want to instrument every heap allocation site and every "block" of bytecode to account for both memory allocation and the # of bytecodes executed. AFAIK AspectJ and most AOP frameworks are too coarse-grained to do this. – Li Haoyi Oct 23 '12 at 14:57
  • I haven't read the citation yet, but I'd wonder if this would have a Heisenberg issue: introducing code would distort the numbers you report. Having this knowledge, how do you act on it? Rewrite the garbage collector, too? I'd be curious to hear the justification for the complexity. – duffymo Oct 23 '12 at 16:53
  • Assuming the numbers are distorted in a reasonably consistent way, it would already give a much better estimate of the amount of "work" (in bytecodes) required to do something, and I believe it would be a better and more consistent estimate than anything I could get by fudging a measurement using CPU time. – Li Haoyi Oct 23 '12 at 17:13
  • The fact is that profilers of all kinds have been doing this sort of bytecode munging forever, and nobody questions whether Yourkit or JProfiler give results which are "actionable". The only difference is that these programs attach as `-javaagent`s and perform whole-program transformations, and I would like to try to isolate the transformation within a single classloader to cut down on unnecessary overhead, by not instrumenting the sections of the program I am *not* interested in. However, the sections which I *am* interested in do use system classes, so I would like to instrument *their* copy. – Li Haoyi Oct 23 '12 at 17:17
  • Fair enough Li, I asked so I could read your argument. Thanks for posting it. – duffymo Oct 23 '12 at 17:28

4 Answers

11

So I have found a solution to this. It's not a very elegant solution, and it would cause a lot of angry emails at code-review time, but it seems to work. The basic points are:

JavaAgent

Use the java.lang.instrument package and a -javaagent to store the Instrumentation object for later use

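import java.lang.instrument.Instrumentation;

public class JavaAgent {
    private JavaAgent() {}

    // Called by the JVM before main() when it is started with -javaagent
    public static void premain(String agentArgs, Instrumentation inst) {
        System.out.println("Agent Premain Start");
        Transformer.instrumentation = inst; // kept for later retransformClasses() calls
        inst.addTransformer(new Transformer(), inst.isRetransformClassesSupported());
    }
}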
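For the agent to be picked up, and for retransformation to be available, the agent jar's manifest needs entries along these lines. This is a sketch: it assumes JavaAgent lives in the default package, and the jar name agent.jar and main class MyApp mentioned below are placeholders.

```
Premain-Class: JavaAgent
Can-Retransform-Classes: true
```

The JVM would then be started with something like java -javaagent:agent.jar MyApp. Without Can-Retransform-Classes: true, inst.isRetransformClassesSupported() returns false and retransformClasses() throws UnsupportedOperationException.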

ClassFileTransformer

Add a Transformer to the Instrumentation that only works on the marked classes. Something like

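import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.HashSet;
import java.util.Set;

public class Transformer implements ClassFileTransformer {
    public static Set<Class<?>> transformMe = new HashSet<>();
    public static Instrumentation instrumentation = null; // set during premain()

    @Override
    public byte[] transform(ClassLoader loader,
                            String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] origBytes) {
        if (transformMe.contains(classBeingRedefined)) {
            return instrument(origBytes);
        } else {
            return null; // null leaves the class unchanged
        }
    }

    public byte[] instrument(byte[] origBytes) {
        // magic happens here; this placeholder returns the bytes as-is
        return origBytes;
    }
}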

ClassLoader

In the classloader, explicitly mark each loaded class (even the classes whose loading is delegated to the parent) by placing it in transformMe before asking the Instrumentation to transform it

public class MyClassLoader extends ClassLoader {
    public Class<?> instrument(Class<?> in) {
        try {
            Transformer.transformMe.add(in);      // mark, so the Transformer accepts it
            Transformer.instrumentation.retransformClasses(in);
            return in;
        } catch (Exception e) {
            return null;                          // failure is swallowed in this sketch
        } finally {
            Transformer.transformMe.remove(in);   // always unmark, even on failure
        }
    }

    @Override
    public Class<?> loadClass(String name) throws ClassNotFoundException {
        return instrument(super.loadClass(name));
    }
}

... and voila! Every class which is loaded by MyClassLoader gets transformed by the instrument() method, including all the system classes like java.lang.Object and friends, while all the classes which are loaded by the default ClassLoader are left untouched.

I have tried this using a memory-profiling instrument() method, which inserts callback hooks to track memory allocations in the instrumented bytecode, and can confirm that the MyClassLoader classes fire the callbacks when their methods run (even system classes) while the "normal" classes do not.

Victory!

This is, of course, terrible code. Shared mutable state everywhere, non-local side-effects, globals, everything that you can possibly imagine. Probably isn't threadsafe either. But it shows that such a thing is possible: you can indeed selectively instrument the bytecode of classes, even system classes, as part of a custom ClassLoader's operation, while leaving the "rest" of the program untouched.

Open Problems

If someone else has any ideas how to make this code less terrible, I would be glad to hear it. I could not figure out a way of:

  • Making the Instrumentation only instrument classes on demand via retransformClasses(), and not instrument classes loaded otherwise.
  • Storing some metadata in each Class<?> object which would allow the Transformer to tell whether it should be transformed or not, without the global-mutable-hashtable lookup.
  • Transforming a system class without using the Instrumentation.retransformClasses() method. As mentioned, any attempt to dynamically defineClass a byte[] into a java.lang.* class fails due to hardcoded checks in ClassLoader.java.

If anyone can find a way around any of these problems, it would make this much less sketchy. Regardless, I'm guessing being able to instrument (e.g. for profiling) some sub-system (i.e. the one you're interested in) while leaving the rest of the JVM untouched (with no instrumentation overhead) will be useful to someone else besides me, so here it is.
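On the first open problem, one possible direction: the classBeingRedefined argument of transform() is null on an ordinary class load and non-null on a redefine or retransform, so a transformer can tell the two cases apart. The sketch below assumes a hypothetical OnDemandTransformer with a placeholder instrument() step. The transformer is still called for every load, but it returns immediately unless the call comes from retransformClasses() on a marked class.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical variant of Transformer that ignores ordinary class loads.
class OnDemandTransformer implements ClassFileTransformer {
    final Set<Class<?>> marked = ConcurrentHashMap.newKeySet();

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] origBytes) {
        // classBeingRedefined is null on an initial load, non-null on redefine/retransform
        if (classBeingRedefined == null || !marked.contains(classBeingRedefined)) {
            return null; // null tells the JVM to keep the bytes unchanged
        }
        return instrument(origBytes);
    }

    // Placeholder for the real rewriting step; here it only copies the input.
    byte[] instrument(byte[] origBytes) {
        return origBytes.clone();
    }
}
```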

Li Haoyi
6

First an explanation without ClassFileTransformer:

The license of the Oracle JRE/JDK includes that you cannot change java.* packages, and from what you've shown with your test of trying to change something in java.lang, they've included a test and throw a security exception if you try.

With that said, you can change the behavior of system classes by compiling an alternative and referencing it using the JRE -Xbootclasspath/p CLI option.

After looking at what you can achieve through that method, I expect you will have to work more and compile a custom version of the OpenJDK. I expect this because the Bootstrap classloader is (from what I've read) a native implementation.

See http://onjava.com/pub/a/onjava/2005/01/26/classloading.html for my favorite overview of classloaders.

Now with ClassFileTransformer:

As you've shown, you can update the method bodies (and some other specific aspects) of classes that are already loaded. In response to the questions you've asked:

Instrumenting on demand: what's important here is that every loaded class has a unique Class instance associated with it. If you want to target a specific loaded class, you have to know which instance that is, and you can obtain it in various ways, including the 'class' literal available on every class name, such as Object.class.

Is it threadsafe: no, two threads could be changing the set concurrently, and you can solve this problem in many ways; I suggest using a concurrent version of Set.
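A minimal sketch of that suggestion, assuming a hypothetical holder class named MarkedClasses in place of the static field on Transformer. It makes each individual set operation safe; the add / retransform / remove sequence in the classloader is still not atomic if two threads load the same class at once.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical holder for the set of classes marked for transformation.
class MarkedClasses {
    // newKeySet() returns a Set backed by a ConcurrentHashMap
    static final Set<Class<?>> TRANSFORM_ME = ConcurrentHashMap.newKeySet();

    static void mark(Class<?> c)   { TRANSFORM_ME.add(c); }
    static void unmark(Class<?> c) { TRANSFORM_ME.remove(c); }

    static boolean isMarked(Class<?> c) {
        // ConcurrentHashMap rejects null keys, so guard the lookup
        return c != null && TRANSFORM_ME.contains(c);
    }
}
```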

Globals etc: I think globals specifically are necessary (I think your implementation could be done a little better), but chances are there won't be problems, and you'll learn how to code better for Java later (I've coded for about 12 years now, and you wouldn't believe some subtle things about using the language).

Metadata in Class instances: in all the time I've used Java, attaching metadata has not been natural, and probably for good reason; keeping a map for the specific purpose is fine, and remember it's only a map between the pointer to the instance and the metadata, so it's not really a memory hog.
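As an alternative to a hand-rolled map, the JDK has shipped java.lang.ClassValue since Java 7, which associates a lazily computed value with each Class. The sketch below is one way it might be used for a per-class flag; TransformFlag and its methods are hypothetical names, not something from the code above.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical per-class flag built on java.lang.ClassValue.
class TransformFlag {
    private static final ClassValue<AtomicBoolean> FLAG = new ClassValue<AtomicBoolean>() {
        @Override
        protected AtomicBoolean computeValue(Class<?> type) {
            return new AtomicBoolean(false); // every class starts unmarked
        }
    };

    static void set(Class<?> c, boolean on) { FLAG.get(c).set(on); }
    static boolean isSet(Class<?> c)        { return FLAG.get(c).get(); }
}
```

The flag is still global state keyed by Class, so this changes where the bookkeeping lives rather than removing it.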

Tom
2

A class loader does not offer a public API to access the byte code of already loaded classes. The byte code is most likely cached somewhere by the VM (in the Oracle VM, this is done in native code), but you can't really get it out as a byte array anymore.

What you can do however, is to reread the class file as a resource. Unless I forget something obvious, ClassLoader#getResource() or Class#getResource() should use the same search path to load class files, as it uses to load resources:

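import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public byte[] getClassFile(Class<?> clazz) throws IOException {
    String path = "/" + clazz.getName().replace('.', '/') + ".class";
    // try-with-resources closes the stream even if reading fails
    try (InputStream is = clazz.getResourceAsStream(path)) {
        if (is == null) {
            throw new IOException("Class file not found: " + path);
        }
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int r;
        while ((r = is.read(buffer)) >= 0) {
            baos.write(buffer, 0, r);
        }
        return baos.toByteArray();
    }
}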
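One way to sanity-check what comes back is to look for the class-file magic number 0xCAFEBABE at the start of the bytes. The sketch below wraps the same resource-reading approach in a hypothetical ClassFileReader class; the looksLikeClassFile() helper is an addition for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper around the resource-reading approach, plus a header check.
class ClassFileReader {
    static byte[] getClassFile(Class<?> clazz) throws IOException {
        String path = "/" + clazz.getName().replace('.', '/') + ".class";
        try (InputStream is = clazz.getResourceAsStream(path)) {
            if (is == null) throw new IOException("Class file not found: " + path);
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int r;
            while ((r = is.read(buffer)) >= 0) baos.write(buffer, 0, r);
            return baos.toByteArray();
        }
    }

    // True if the bytes start with the class-file magic number 0xCAFEBABE.
    static boolean looksLikeClassFile(byte[] b) {
        return b.length >= 4
            && (b[0] & 0xFF) == 0xCA && (b[1] & 0xFF) == 0xFE
            && (b[2] & 0xFF) == 0xBA && (b[3] & 0xFF) == 0xBE;
    }
}
```

For example, ClassFileReader.looksLikeClassFile(ClassFileReader.getClassFile(String.class)) checks the bytes read for java.lang.String.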
jarnbjo
  • I have not tried to see whether class.getResourceAsStream gives me the `byte[]` I want, but regardless, defining of System classes (e.g. anything `java.lang.*`) seems to be blocked by the ClassLoader =(. So even if I could get the bytes and transform them, I would not be able to `defineClass` the bytes into the class that I need. – Li Haoyi Oct 25 '12 at 01:40
0

You want to redefine Object.class, but this class is already loaded before any program runs, including your classloader. Even if you create your own Object.class, it would conflict with the already loaded system class and this would make a mess.

The only way I see is to instrument the system classes offline, that is, take rt.jar, instrument all classes in it and write back.

Alexei Kaigorodov
  • Can't multiple copies of the same class exist at the same time, one per classloader? I know I have no problem instrumenting even Object using `-javaagent`s to rewrite it after it's already been loaded. http://code.google.com/p/java-allocation-instrumenter/ does this, for example, and I have tested and verified that it is indeed instrumenting the methods of Object (calling `toString()` on `new Object()`s causes the recording method to be called). Ideally, however, I would only instrument the Object class used by MyClassLoader, in order to cut down unnecessary overhead. – Li Haoyi Oct 23 '12 at 15:29