Similar to "Can JIT be prevented from optimising away method calls?", I'm attempting to track memory usage of long-lived data store objects. However, I'm finding that if I initialize a store, log the system memory, then initialize another store, sometimes the compiler (presumably the JIT) is smart enough to notice that the first object is no longer needed.

public class MemTest {
    public static void main(String[] args) {
        logMemory("Initial State");
        MemoryHog mh = new MemoryHog();
        logMemory("Built MemoryHog");
        MemoryHog mh2 = new MemoryHog();
        logMemory("Built Second MemoryHog"); // by here, mh may be GCed
    }
}

Now the suggestion in the linked thread is to keep a pointer to these objects, but the GC appears to be smart enough to tell that the objects aren't used by main() anymore. I could add a call to these objects after the last logMemory() call, but that's a rather manual solution - every time I test an object, I have to do some sort of side-effect triggering call after the final logMemory() call, or I may get inconsistent results.

I'm looking for general case solutions; I understand that adding a call like System.out.println(mh.hashCode()+mh2.hashCode()) at the end of the main() method would be sufficient, but I dislike this for several reasons. First, it introduces an external dependency on the testing above - if the SOUT call is removed, the behavior of the JVM during the memory logging calls may change. Second, it's prone to user-error; if the objects being tested above change, or new ones are added, the user must remember to manually update this SOUT call as well, or they'll introduce difficult to detect inconsistencies in their test. Finally, I dislike that this solution prints at all - it seems like an unnecessary hack that I can avoid with a better understanding of the JIT's optimizations. To the last point, Patricia Shanahan's answer offers a reasonable solution (explicitly print that the output is for memory sanity purposes) but I'd still like to avoid it if possible.

So my initial solution is to store these objects in a static list, and then iterate over them in the main class's finalize method*, like so:

import java.util.ArrayList;

public class MemTest {
    // Static list intended to keep every logged object reachable
    private static ArrayList<Object> objectHolder = new ArrayList<>();

    public static void main(String[] args) {
        logMemory("Initial State", null);
        MemoryHog mh = new MemoryHog();
        logMemory("Built MemoryHog", mh); // adds mh to objectHolder
        MemoryHog mh2 = new MemoryHog();
        logMemory("Built Second MemoryHog", mh2); // adds mh2 to objectHolder
    }

    @Override
    protected void finalize() throws Throwable {
        for (Object o : objectHolder) {
            o.hashCode();
        }
    }
}

But now I've only offloaded the problem one step - what if the JIT optimizes away the loop in the finalize method, and decides these objects don't need to be saved? Admittedly, maybe simply holding the objects in the main class is enough for Java 7, but unless it's documented that the finalize method can't be optimized away, there's still nothing theoretically preventing the JIT/GC from getting rid of these objects early, since there are no side effects in the contents of my finalize method.

One possibility would be to change the finalize method to:

protected void finalize() throws Throwable {
    int codes = 0;
    for (Object o : objectHolder) {
        codes += o.hashCode();
    }
    System.out.println(codes);
}

As I understand it (and I could be wrong here), calling System.out.println() will prevent the JIT from getting rid of this code, since it's a method with external side effects, so even though it doesn't impact the program, it can't be removed. This is promising, but I don't really want some sort of gibberish being output if I can help it. The fact that the JIT can't (or shouldn't!) optimize away System.out.println() calls suggests to me that the JIT has a notion of side effects, and if I can tell it this finalize block has such side effects, it should never optimize it away.

So my questions:

  • Is holding a list of objects in the main class enough to prevent them from ever being GCed?
  • Is looping over those objects and calling something trivial like .hashCode() in the finalize method enough?
  • Is computing and printing some result in this method enough?
  • Are there other methods (like System.out.println) the JIT is aware of that cannot be optimized away, or even better, is there some way to tell the JIT not to optimize away a method call / code block?

*Some quick testing confirms, as I suspected, that the JVM doesn't generally run the main class's finalize method; it abruptly exits. The JIT/GC may still not be smart enough to GC my objects simply because the finalize method exists, even if it doesn't get run, but I'm not confident that's always the case. If it's not documented behavior, I can't reasonably trust it will remain true, even if it's true now.

dimo414
  • Anchor the objects in an object instance or in a static variable. Or just somehow reference your object references in `main` after your measurements. – Hot Licks Mar 04 '13 at 03:12
  • To your first point, that's my first question, is that enough. To your second, I said in my question that I want to avoid needing to mandate that. – dimo414 Mar 04 '13 at 03:28
  • The objects must be referenced SOMEWHERE. It doesn't matter where, so long as whatever contains those references is itself referenced (and that referenced, etc), down to a static or stack reference. (Note that the reference in `main` can be to simply create an array of Object and place the reference there -- no need to "use" the object.) – Hot Licks Mar 04 '13 at 11:42
  • Make mh and mh2 static variables. Can it get any simpler than that? – Hot Licks Mar 04 '13 at 11:44
  • I understand how GC works. The purpose of my question is to explore the best way to prevent GC'ing objects whose memory usage I am still interested in, in a general and user-error-proof way. Making `mh` and `mh2` static fails the former, and mandating that a print call be made in the main method fails the latter. – dimo414 Mar 04 '13 at 13:19
  • I can't tell what you want. Magic, apparently. If you want an object to persist it must be referenced -- GC 101. But you seem unwilling to consider any approach that implies a reference of any sort. – Hot Licks Mar 04 '13 at 15:59
  • I'm trying to explore the *cleanest* way to hold objects that aren't actually used by the program any more, to prevent GC. Obviously, making additional calls to the objects will cause them to persist, but that's not a clean / scalable solution. I suspect that simply adding them to an instance list would be enough, but I'm not certain that is true or will remain true into the future. – dimo414 Mar 04 '13 at 16:10
  • So how is storing their pointers in static variables not "clean"??? – Hot Licks Mar 04 '13 at 16:17
  • Yes, that likely is a fine solution, I'm exploring options. – dimo414 Mar 04 '13 at 17:06

2 Answers

Yes, it would be legal for mh to be garbage collected at that point. At that point, there is no code that could possibly use the variable. If the JVM can detect this, then the corresponding MemoryHog object will be treated as unreachable ... if the GC were to run at that point.

A later call like System.out.println(mh) would be sufficient to inhibit collection of the object. So would using it in a "computation"; e.g.

    if (mh == mh2) { System.out.println("the sky is falling!"); }

Is holding a list of objects in the main class enough to prevent them from ever being GCed?

It depends on where the list is declared. If the list were a local variable, and it became unreachable before mh, then putting the object into the list would make no difference.

Is looping over those objects and calling something trivial like .hashCode() in the finalize method enough?

By the time the finalize method is called, the GC has already decided that the object is unreachable. The only way that the finalize method could prevent the object being deleted would be to add it to some other (reachable) data structure or assign it to a (reachable) variable.

Are there other methods (like System.out.println) the JIT is aware of that cannot be optimized away,

Yes ... anything that makes the object reachable.

or even better, is there some way to tell the JIT not to optimize away a method call / code block?

No way to do that ... apart from making sure that the method call or code block does something that contributes to the computation being performed.
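For the narrower goal of keeping an object reachable (rather than preventing a call from being optimized away), Java 9 and later provide `java.lang.ref.Reference.reachabilityFence(Object)`, which is documented to keep its argument strongly reachable at least until the call. The following is only a minimal sketch, assuming Java 9+; the class name `FenceSketch`, the helper `measureWhileHeld`, and the `byte[]` standing in for `MemoryHog` are hypothetical and not from the question.

```java
import java.lang.ref.Reference;

class FenceSketch {
    // Approximate heap in use, from the standard Runtime counters.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Hypothetical helper: samples memory while 'held' is guaranteed reachable.
    static long measureWhileHeld(Object held) {
        long used = usedMemory();
        // Java 9+: 'held' must be treated as strongly reachable up to this call,
        // so it cannot be collected before the measurement above completes.
        Reference.reachabilityFence(held);
        return used;
    }

    public static void main(String[] args) {
        byte[] hog = new byte[8 * 1024 * 1024]; // stand-in for MemoryHog (assumption)
        System.out.println("Used while hog is held: " + measureWhileHeld(hog) + " bytes");
    }
}
```

The fence produces no output and has no visible side effect, which is the property the question was after; it does not exist on Java 7, where the answer above still applies.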


UPDATE

First, what is going on here is not really JIT optimization. Rather, the JIT is emitting some kind of "map" that the GC is using to determine when local variables (i.e. variables on the stack) are dead ... depending on the program counter (PC).

Your examples to inhibit collection all involve blocking the JIT via SOUT, I'd like to avoid that somewhat hacky solution.

Hey ... ANYTHING that depends on the exact timing of when things are garbage collected is a hack. You are not supposed to do that in a properly engineered application.

I updated my code to make it clear that the list that's holding my objects is a static variable of the main class, but it seems if the JIT's smart enough it could still theoretically GC these values once it knows the main method doesn't need them.

I disagree. In practice, the JIT cannot determine that a static will never be referenced. Consider these cases:

  • Before the JIT runs, it appears that nothing will use static s again. After the JIT has run, the application loads a new class that refers to s. If the JIT "optimized" the s variable, the GC would treat it as unreachable, and either null it or create a dangling reference. When the dynamically loaded class then looked at s it would then see the wrong value ... or worse.

  • If the application ... or any libraries used by the application ... uses reflection, then it can refer to the value of any static variable without this being detectable by the JIT.

So while it is theoretically possible to do this optimization in a small number of cases:

  • in the vast majority of cases, you can't, and
  • in the few cases that you can, the pay-off (in terms of performance improvement) is most likely negligible.
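A static holder along these lines could be sketched as follows. This is only an illustration under stated assumptions: the class name `StaticHolderSketch`, the helpers `hold`, `heldCount`, `usedMemory` and `logMemory`, and the `byte[]` arrays standing in for `MemoryHog` are all hypothetical, and `System.gc()` is merely a hint to the JVM.

```java
import java.util.ArrayList;
import java.util.List;

class StaticHolderSketch {
    // Static root: anything added here stays strongly reachable
    // for as long as this class is loaded.
    private static final List<Object> HELD = new ArrayList<>();

    // Registers the object in the static list and hands it back to the caller.
    static <T> T hold(T obj) {
        HELD.add(obj);
        return obj;
    }

    static int heldCount() {
        return HELD.size();
    }

    // Approximate heap in use, from the standard Runtime counters.
    static long usedMemory() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    static void logMemory(String label) {
        System.gc(); // a request only; the JVM may ignore it
        System.out.println(label + ": " + usedMemory() + " bytes used");
    }

    public static void main(String[] args) {
        logMemory("Initial State");
        hold(new byte[8 * 1024 * 1024]); // stand-in for the first MemoryHog
        logMemory("Built MemoryHog");
        hold(new byte[8 * 1024 * 1024]); // stand-in for the second MemoryHog
        logMemory("Built Second MemoryHog");
    }
}
```

Here the only per-object obligation is the `hold(...)` call at creation time; nothing needs to be added after the last measurement.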

I similarly updated my code to clarify that I'm talking about the finalize method of the main class.

The finalize method of the main class is irrelevant because:

  • you are not creating an instance of the main class, and
  • the finalize method CANNOT refer to the local variables of another method (e.g. the main method).

... its existence prevents the JIT from nuking my static list.

Not true. The static list can't be nuked anyway; see above.

As I understand it, there's something special about SOUT that the JIT is aware of that prevents it from optimizing such calls away.

There is nothing special about sout. It is just something that we KNOW influences the results of the computation and that we therefore KNOW the JIT cannot legally optimize away.

Stephen C
  • 1) Your examples to inhibit collection all involve blocking the JIT via SOUT, I'd like to avoid that somewhat hacky solution. 2) I updated my code to make it clear that the list that's holding my objects is a static variable of the main class, but it seems if the JIT's smart enough it could still theoretically GC these values once it knows the main method doesn't need them. – dimo414 Mar 04 '13 at 04:38
  • 3) I similarly updated my code to clarify that I'm talking about the finalize method of the main class. It shouldn't get GCed until the main method returns, at which point the JVM exits, but maybe (this is part of my question) even though it doesn't get called, its existence prevents the JIT from nuking my static list. 4) "anything that makes the object reachable" like what? As I understand it, there's something special about SOUT that the JIT is aware of that prevents it from optimizing such calls away. Can I replicate that behavior with any other methods? – dimo414 Mar 04 '13 at 04:40

Here's a plan that may be overkill, but should be safe and reasonably simple:

  • Keep a List of references to the objects.
  • At the end, iterate over the list summing the hashCode() results.
  • Print the sum of the hash codes.

Printing the sum ensures that the final loop cannot be optimized out. The only thing you need to do for each object you create is pass it to the List's add call.
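The plan above might look roughly like this. It is a minimal sketch, not a definitive implementation: the class name `LivenessSketch`, the helper `sumHashCodes`, and the `byte[]` arrays standing in for the objects under test are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LivenessSketch {
    // Sums the hash codes so that every held object is actually used.
    static int sumHashCodes(List<?> objects) {
        int sum = 0;
        for (Object o : objects) {
            sum += o.hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Object> held = new ArrayList<>();
        held.add(new byte[1024]); // stand-in for the first object under test
        held.add(new byte[2048]); // stand-in for the second object under test

        // ... memory measurements would go here ...

        // The printed value depends on every element, so the loop cannot be dropped.
        System.out.println("Liveness check (safe to ignore): " + sumHashCodes(held));
    }
}
```

Labelling the printed line makes it clear to anyone reading the output why the number is there.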

Patricia Shanahan
  • Yep, that's my current solution, but nitpicky as it is, I dislike the unnecessary println. Per the 4th part of my question, I'm wondering if there are other operations that don't seem quite so hacky. The JIT is somehow aware that `System.out.println()` calls can't be cleaned, how can I similarly indicate a different operation has side effects that can't be skipped? – dimo414 Mar 04 '13 at 04:19
  • @dimo414 It's a slight risk, but you could have a method in a different class that does nothing that you call with the hash code sum as argument. The risk is that the JIT will work out that it does nothing. Historically, I've been caught out a few times by smart compilers when I've been constructing benchmarks, so that the only thing I really trust to ensure liveness is an actual output. – Patricia Shanahan Mar 04 '13 at 04:49
  • Indeed, I imagine I could find some sort of workaround like diving down the stack which works for the current GC but breaks mysteriously in the future. That would be very dangerous, and I'd like to avoid it. If output *really* is the only way to prevent this, that is what I'll do, but it at least seems to me like if the JIT is aware of special methods it can't optimize away, you should be able to control that in some way. – dimo414 Mar 04 '13 at 04:53
  • @dimo414 I don't think the JIT is aware of special methods. Rather, it examines methods to see if it can optimize them. An operating system call with external effect, such as a write call, would be an optimization deal-killer. – Patricia Shanahan Mar 04 '13 at 04:57
  • That would make sense. In that case, is there a better operating system call that logically indicates "This call should not be optimized away" than `System.out.println()`, which both doesn't really indicate that and has visible side effects I'd like to avoid? One thought would be `if(System.currentTimeMillis() == o.hashCode())` - presumably that's similarly un-JIT-able, and would therefore preserve my objects. – dimo414 Mar 04 '13 at 05:04
  • If you are worried about the visible side effects, remember you can test a condition on the hash code sum and make an unobtrusive change in your output such as tab vs. space. I tend to be more ruthless - I output a line that explains that the number is being output to ensure liveness. – Patricia Shanahan Mar 04 '13 at 05:10
  • "I output a line that explains that the number is being output to ensure liveness." - That is, I suppose, a very reasonable way to deal with it. I'd still like to know if there's a "better" way, but your solution's very reasonable, and I may go with it. – dimo414 Mar 04 '13 at 05:18