6

I'm currently testing a proof-of-concept prototype that deals with XML schemas and is built around a very memory-hungry external library for tree automata (for which I have the sources). I'd like to plot the "real peak" (heap) memory consumption of different runs with increasing schema sizes (the metric used fits my purpose and does not affect the question), or at least a reasonable approximation of it.

To give an order of magnitude: for a run with a real peak of 100MB, the process occupies around 1.1GB. (I established the real peak by running exactly the same configuration of input/parameters several times while forcing the JVM heap with -Xmx and -Xms to decreasing values; below 100MB I get Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded, with stable and repeatable results.) That is why it is extremely important for me to get the real number: the two differ a lot!

I've spent the last 10 days reading questions on the web and on Stack Overflow. What I know so far is:

  1. System.gc() only "suggests" a GC run and does not force it in any way, so it is not possible to rely on it for detecting memory usage peaks

  2. What is usually suggested is to count object occupation (I saw the SizeOf project for this; I tried it and it works fine, even if it does not fit my needs). That is not feasible for me, because the heavy memory allocation comes from the creation of a lot of collection (set, list and map) iterators in different methods, each called a very high number of times (millions each for a 10-minute run, as far as I remember). It would be extremely difficult to detect all the involved objects and sum them up (I spent days debugging many runs with memory consumption graphs without being able to identify a single bottleneck)

  3. There is no way to easily obtain the memory occupation of a method (expressed as the peak of object memory allocation)

In my own experience System.gc() calls are not reliable (e.g. different runs of the same configuration give different memory readings after a System.gc(), depending on whether the GC was really run or not), but when I press the "GC button" in JVisualVM or JConsole it never fails to run the GC or refuses to do so.

So my question is: will calling their implementation of that button (I haven't tried it yet, but from what I've read so far it seems feasible using jconsole.jar with the Attach API) differ from calling System.gc() directly from my code, thus solving my problem? If not, how do you explain the "deterministic behaviour" of that button?

Up to now I have done some manual tests of the real memory peak for 10 increasing schema sizes (for this kind of measurement the schemas are automatically generated from a single "complexity parameter") and I plotted the expected curve. If I cannot find a better solution, I want to run my code as an external jar with -Xmx/-Xms set slightly below my prediction of the expected memory peak, catch the OutOfMemoryError in the external process's error stream, and relaunch with increased memory until a complete run is achieved. (If the naive memory prediction is not robust enough, I will apply appropriate machine learning techniques.) I know that this is not an elegant solution, but in my scenario (academia) I can afford to spend some extra time on these measurements. If you have other suggestions or improvements to this brute-force method, you are (extremely) welcome to share them.
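The brute-force relaunch loop described above could be sketched roughly as follows. This is only an illustration under assumptions: the class and method names (HeapSearch, command, completes, smallestPassingHeap) are hypothetical, the child is started with a plain `java -jar` command, and failure is detected by looking for java.lang.OutOfMemoryError in the child's merged output. It uses Java 8 syntax, which is newer than the 1.7.0_04 runtime listed below.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

class HeapSearch {

    // Builds the child JVM command line with a fixed heap of heapMb megabytes.
    static List<String> command(String jarPath, int heapMb) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-Xms" + heapMb + "m");
        cmd.add("-Xmx" + heapMb + "m");
        cmd.add("-jar");
        cmd.add(jarPath);
        return cmd;
    }

    // Runs the jar once; returns true if it exits with code 0 and no
    // OutOfMemoryError line shows up in its (merged) output.
    static boolean completes(String jarPath, int heapMb)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command(jarPath, heapMb));
        pb.redirectErrorStream(true); // read stderr together with stdout
        Process p = pb.start();
        boolean oom = false;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                if (line.contains("java.lang.OutOfMemoryError")) {
                    oom = true;
                }
            }
        }
        return p.waitFor() == 0 && !oom;
    }

    // Tries startMb, startMb + stepMb, ... up to maxMb and returns the first
    // heap size for which runOk reports success, or -1 if none does.
    static int smallestPassingHeap(int startMb, int stepMb, int maxMb,
                                   IntPredicate runOk) {
        for (int mb = startMb; mb <= maxMb; mb += stepMb) {
            if (runOk.test(mb)) {
                return mb;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Stand-in predicate that pretends the run needs at least 100 MB;
        // a real predicate would call completes(...) on the actual jar.
        int mb = smallestPassingHeap(80, 10, 200, heap -> heap >= 100);
        System.out.println("Smallest passing heap: " + mb + " MB");
    }
}
```

The search is kept separate from the process launch so that the step size and starting point (for example, the prediction from the manually measured curve) can be tuned without touching the launch code.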

System info (machine is a Fedora 17, 64 bit):

java version "1.7.0_04" Java(TM) SE Runtime Environment (build 1.7.0_04-b20) Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)

Thanks in advance, Alessandro

Gray
Alessandro S.
  • 1
    Did you try to test simply with two or three `System.gc()` calls in a row, with a possible `sleep` between them? Because I have yet to see that method fail. The first call may cause a minor collection, but the second one is already quite sure to cause a full GC. – Marko Topolnik Nov 09 '12 at 10:06
  • No @MarkoTopolnik, I only tried single System.gc() call, I will give it a try, if it will prove "quite" stable could be enough for plotting something. Thanks! – Alessandro S. Nov 09 '12 at 10:10
  • @MarkoTopolnik: in the last hours I implemented my "getMemory" method using your suggestion, after several attempts it seems to be quite stable with 2 calls and only 1sec of sleep. Its stability allowed me to discover a really good point for memory peak estimation, so my problem is solved (using 2 different profiles for memory/time). If you put this into an answer I will accept it! Thanks again very much. – Alessandro S. Nov 09 '12 at 13:32

4 Answers

4

As far as I know, JConsole and the other tools use System.gc() only. There is no other option. As everyone knows, Java tells you not to rely on System.gc(), but that doesn't mean it doesn't work at all.

Coming to your query: you seem to be wondering how pressing that button can trigger GC directly while Java says System.gc() only "suggests" a GC. I say that the button also calls System.gc() and therefore also only "suggests" that Java try a GC; it just happens that Java decides to perform the GC at that moment (it's not guaranteed, but in practice Java does it).

To demonstrate this, I created a simple program that just creates loads of objects. It contains a commented-out "System.gc()" line. Run the program first with System.gc() commented out and then with it uncommented. Make sure to provide the VM arguments -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails.
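For reference, the platform management interface exposes its own gc() operation on MemoryMXBean, and its Javadoc describes a call to it as effectively equivalent to System.gc(). Whether JConsole's button goes through exactly this operation is an assumption here, not something established in this thread; the class and method names below are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

class MxBeanGc {

    // Requests a collection through the management bean, then reads the
    // heap usage the same bean reports.
    static long usedHeapAfterMxBeanGc() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // documented as effectively equivalent to System.gc()
        MemoryUsage heap = memory.getHeapMemoryUsage();
        return heap.getUsed();
    }

    public static void main(String[] args) {
        System.out.println("Used heap after gc(): "
                + usedHeapAfterMxBeanGc() + " bytes");
    }
}
```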

package ravi.tutorial.java.gc;

/**
 * Just to test GC. Run with the VM arguments below.
 * 
 * -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails
 * 
 * 
 * @author ravi.k
 * 
 */
public class TestGC {

    public static A a;

    /**
     * @param args
     * @throws InterruptedException
     */
    public static void main(String[] args) throws InterruptedException {

        for (int i = 0; i < 100; i++) {
            populateObjects();
            System.out.println("population done for batch: " + i);
        }

    }

    public static void populateObjects() {
        for (int i = 0; i < 100000; i++) {
            a = new A("A");
        }
        //System.gc();
    }

}

class A {
    String s;

    public A(String s) {
        this.s = s;
    }
}

Here are partial outputs from my machine.

Commented System.gc(): here GC is called at the will of the JRE.

population done for batch: 0
population done for batch: 1
population done for batch: 2
population done for batch: 3
population done for batch: 4
population done for batch: 5
population done for batch: 6
population done for batch: 7
population done for batch: 8
population done for batch: 9
0.332: [GC 0.332: [ParNew: 17024K->410K(19136K), 0.0024479 secs] 17024K->410K(83008K), 0.0025219 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
population done for batch: 10
population done for batch: 11
population done for batch: 12
population done for batch: 13
population done for batch: 14
population done for batch: 15
population done for batch: 16
population done for batch: 17
population done for batch: 18
population done for batch: 19
0.344: [GC 0.344: [ParNew: 17434K->592K(19136K), 0.0011238 secs] 17434K->592K(83008K), 0.0011645 secs] [Times: user=0.00 sys=0.01, real=0.00 secs] 
population done for batch: 20
population done for batch: 21
population done for batch: 22
population done for batch: 23
population done for batch: 24
population done for batch: 25
population done for batch: 26
population done for batch: 27
population done for batch: 28
population done for batch: 29
population done for batch: 30
0.353: [GC 0.353: [ParNew: 17616K->543K(19136K), 0.0011398 secs] 17616K->543K(83008K), 0.0011770 secs] [Times: user=0.00 sys=0.00, real=0.00 secs] 
population done for batch: 31
population done for batch: 32
population done for batch: 33

Uncommented System.gc(): here GC is called for every batch. System.gc() is still only suggesting a GC, but Java chooses to run it right at that moment. This is exactly the same case as that magic GC button in other tools :)

0.337: [Full GC (System) 0.337: [CMS: 0K->400K(63872K), 0.0219250 secs] 3296K->400K(83008K), [CMS Perm : 4423K->4422K(21248K)], 0.0220152 secs] [Times: user=0.04 sys=0.00, real=0.02 secs] 
population done for batch: 0
0.364: [Full GC (System) 0.364: [CMS: 400K->394K(63872K), 0.0161792 secs] 2492K->394K(83008K), [CMS Perm : 4425K->4425K(21248K)], 0.0162336 secs] [Times: user=0.01 sys=0.00, real=0.02 secs] 
population done for batch: 1
0.382: [Full GC (System) 0.382: [CMS: 394K->394K(63872K), 0.0160193 secs] 2096K->394K(83008K), [CMS Perm : 4425K->4425K(21248K)], 0.0160834 secs] [Times: user=0.01 sys=0.00, real=0.01 secs] 
population done for batch: 2
0.399: [Full GC (System) 0.399: [CMS: 394K->394K(63872K), 0.0160866 secs] 2096K->394K(83008K), [CMS Perm : 4425K->4425K(21248K)], 0.0161489 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
population done for batch: 3
0.417: [Full GC (System) 0.417: [CMS: 394K->394K(63872K), 0.0156326 secs] 2096K->394K(83008K), [CMS Perm : 4425K->4425K(21248K)], 0.0156924 secs] [Times: user=0.02 sys=0.00, real=0.02 secs] 
population done for batch: 4
0.434: [Full GC (System) 0.434: [CMS: 394K->394K(63872K), 0.0157274 secs] 2096K->394K(83008K), [CMS Perm : 4425K->4425K(21248K)], 0.0157897 secs] [Times: user=0.02 sys=0.00, real=0.01 secs] 
population done for batch: 5

To add more, it's just like threads. There is no guarantee about when a thread runs, but whenever we write a sample thread program, the thread runs right away. So we should not blame Java for running a thread as soon as it is started :). Java is only saying not to rely on these things, but they do work. Also, the fact that they work in some cases doesn't mean they will work every time. Even tools like JConsole may fail to execute GC; we just have never seen it.
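The same check can be made from inside the program instead of reading the -verbose:gc log. The sketch below compares the collection counters exposed by GarbageCollectorMXBean before and after a System.gc() call. The class and method names are hypothetical, and the difference may also include collections that the JVM started on its own in the meantime.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcCounter {

    // Sums the collection counts of all collectors; a collector may report
    // -1 when the count is undefined, so only positive values are added.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc
                : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Number of collections recorded across one System.gc() call.
    static long collectionsDuringSystemGc() {
        long before = totalCollections();
        System.gc();
        return totalCollections() - before;
    }

    public static void main(String[] args) {
        System.out.println("Collections recorded during System.gc(): "
                + collectionsDuringSystemGc());
    }
}
```

A result of 0 would indicate that the request was ignored, for example when explicit GC has been disabled on the command line.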

Ravi K
  • Thank you. OK, now I know that it is always System.gc() that is called, so it is not worth spending time calling jconsole from my program. You saved me a lot of time! – Alessandro S. Nov 09 '12 at 13:26
3

I have quite a bit of positive experience with this trivial approach:

System.gc();
Thread.sleep(500);
System.gc();

One GC run is often not enough due to object finalization issues, where an object may be resurrected in finalization. Therefore additional memory is released in the second GC run.

Do note that this, as well as any other seemingly "smarter" approach, is a heuristic and quite dependent on the exact version of the JVM, including its GC configuration. But in many cases you will not be so much interested in generality: if it works right now and allows you to do your measurements, it is the way to go.
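Wrapped into a measurement helper, the approach could look like the sketch below. The names HeapProbe and usedHeapAfterGc are hypothetical, and the number of GC requests and the pause length are parameters to tune per JVM; the asker reports in the comments that 2 calls with a 1 second sleep were stable in his setup.

```java
class HeapProbe {

    // Requests gcRuns collections with a pause between them, then returns
    // the used heap in bytes as seen by Runtime.
    static long usedHeapAfterGc(int gcRuns, long pauseMillis) {
        for (int i = 0; i < gcRuns; i++) {
            System.gc();
            if (i < gcRuns - 1) {
                try {
                    // Grace period so pending finalizers get a chance to run
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the flag set
                    break;
                }
            }
        }
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long used = usedHeapAfterGc(2, 500);
        System.out.println("Used heap after 2 GC requests: " + used + " bytes");
    }
}
```

Being a heuristic like the snippet above, the returned value is only an approximation of the live data at the moment of the call.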

Marko Topolnik
0

1) System.gc() "suggest" a GC run, does not force it in any way, so it is not possible to rely on it for detecting memory usage peaks

That's what the spec says, but if you use OpenJDK or HotSpot it will always perform a full GC unless you turn it off (e.g. with -XX:+DisableExplicitGC).

What is usually suggested is to count object occupation

I would suggest using a commercial memory profiler. I would have the JVM start with a maximum of 8 GB and see how much it tries to use. After that I would increase or decrease it based on your judgement on whether it would like more or doesn't appear to be using it.

There is no way to easily obtain the memory occupation of a method (expressed as the peak of object memory allocation)

The only memory a method uses is on the stack. You can trace how many objects (count, classes, size) were created in a method, but those objects don't belong to that method and can be used anywhere, even after the method has returned.

If not, how do you explain the "deterministic behaviour" of that button?
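As one way to trace what a method allocates without a profiler, HotSpot-based JVMs expose a per-thread allocation counter through com.sun.management.ThreadMXBean. The sketch below assumes such a JVM with the counter enabled (other JVMs may not support the cast), and the names AllocationProbe, allocatedBytesDuring and sink are hypothetical. It reports bytes allocated while the code ran, not the peak of live memory, which is exactly the distinction made above.

```java
import java.lang.management.ManagementFactory;

class AllocationProbe {

    // Keeps the allocated array reachable so the allocation is not optimized away.
    static Object sink;

    // Returns the bytes allocated by the current thread while work runs.
    // Assumes a HotSpot-based JVM; the counter may report -1 if unsupported.
    static long allocatedBytesDuring(Runnable work) {
        com.sun.management.ThreadMXBean threads =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long id = Thread.currentThread().getId();
        long before = threads.getThreadAllocatedBytes(id);
        work.run();
        return threads.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long bytes = allocatedBytesDuring(() -> sink = new byte[1 << 20]);
        System.out.println("Allocated while running: " + bytes + " bytes");
    }
}
```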

I would put that down to subjective analysis. ;)

Ideally you should be running the JVM with 2-3x the minimum memory it needs for it to run efficiently. Trying to save a few 100 MB which cost less than $1 is not always useful. ;)

Peter Lawrey
  • 1
    Even though the first one may also be a full GC, there are still issues with finalization etc., where many objects actually need two full GC passes to get collected. – Marko Topolnik Nov 09 '12 at 10:14
  • It can be more if they refer to each other, or the finalization queue hasn't drained between GCs. ;) – Peter Lawrey Nov 09 '12 at 10:15
  • I added some system info to the question, thanks @PeterLawrey. For the method occupation I mean the heap used by means of object allocation, in this case all of that iterator. – Alessandro S. Nov 09 '12 at 10:17
  • Yes, that's consistent with my findings where I run two or even three GC's, with a grace period between them. – Marko Topolnik Nov 09 '12 at 10:17
  • @PeterLawrey I have issues with your argument based on price per GB because this definitely hurts scalability, therefore constrains use cases. A process running in 1 MB is not in the same category as the one that needs 100 MB, no matter what is the price of that amount of RAM. – Marko Topolnik Nov 09 '12 at 10:19
  • @MarkoTopolnik Does it make sense to spend $1 of your time to save $10 of memory when its reusable? Does it make sense to spend $100 of your time (cost to the company) to save $10 of memory (cost to the company)? If you don't ask this of yourself you get examples of people spending $1000s (including fixing bugs that arise) to save less than $0.005 of memory (e.g. 1 MB ;). To me that is madness. – Peter Lawrey Nov 09 '12 at 10:27
  • But that only describes one situation. What if I want to run 100 instances of the same application, on the same physical machine? Now the 10x factor is not multiplying $1 anymore. – Marko Topolnik Nov 09 '12 at 10:29
  • You want to save 1 MB on 100 instances, might be worth 50 cents in total or about five minutes including testing of one persons time on minimum wage. YMMV. If you want to save 100 MB on 100 machines it might be worth $50 which might be worth spending 7 minutes including testing properly on minimum wage. – Peter Lawrey Nov 09 '12 at 10:33
  • 2
    I generally agree with you @PeterLawrey, but my prototype is only intended to show the time/memory "feasibility" of my proposed method, so for me it is fundamental to evaluate the real peak (the naive measurement for 100MB says more than 1GB; I really cannot show that number, given that it is not the real peak). The other problem is that, given increasing schema size, I need to obtain a higher peak; if the GC does what it wants I'm not guaranteed to obtain it, and in fact it is not happening right now, so the graphs are not usable for me. By the way, my concern is not to optimize, but only to estimate memory. – Alessandro S. Nov 09 '12 at 10:34
  • How much your free time is worth, and how much a client might imagine their hardware to cost to upgrade, will vary a lot. Until you have memory profiled your application, the peak may not be very useful, i.e. it may be better to spend the time cutting consumption. What I did was move to off-heap memory with memory-mapped files. I regularly map in 500 GB or more of data, but my max heap size is 1 GB, which isn't used much. – Peter Lawrey Nov 09 '12 at 10:38
  • @PeterLawrey No, I want to save 99 MB on 100 instances, totalling 10 GB. This may mean the difference between running inconspicuously on one host and needing a whole cluster to run. The difference in the amount of setup, expertise to do it, business decisions, etc. is huge. – Marko Topolnik Nov 09 '12 at 10:38
  • @MarkoTopolnik You can run 10 GB on PC easily, (in fact a laptop) Any organization which considers running 10 GB on many machines won't be paying one person minimum wage so the scale of the problem might change but the ratio of cost of hardware to cost of your time is still an issue. – Peter Lawrey Nov 09 '12 at 10:40
  • @PeterLawrey I was just citing an example. The difference between 1 MB and 100 MB is huge because it is the scaling factor that counts. They are simply not in the same league, and this has quite little to do with the literal DRAM price. In my own company we have exactly this problem: we use a distributed architecture with many standalone Java processes communicating over JMS. The RAM waste stemming from each node being a full-fledged JVM instance is a definite pain point and a "black eye" on our architecture. – Marko Topolnik Nov 09 '12 at 10:52
-1

You can kinda force GC like this....

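import java.lang.ref.WeakReference;

// Requests GC until a throwaway weak reference has been cleared.
// A cleared reference does not prove that all garbage was collected,
// and the loop may never end if explicit GC is disabled.
private static void force_gc()
{
    Object obj = new Object();
    WeakReference<Object> ref = new WeakReference<Object>(obj);
    obj = null;
    while (ref.get() != null)
    {
        System.out.println("Forcing gc() ...");
        System.gc();
    }
}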

apart from that... i'm interested to see where this question goes.

Shark
  • 2
    I played around with this before, it is unreliable. Weak ref getting cleared doesn't guarantee a thing. Specifically, it doesn't guarantee that the referent has been collected. – Marko Topolnik Nov 09 '12 at 12:03
  • @MarkoTopolnik so is for(int i = 0; i<3; i++) { System.gc(); sleep(100); } any better? – Shark Nov 09 '12 at 13:31
  • 1
    You'd be surprised to hear that it may indeed be better in some circumstances, but it is definitely simpler, while achieving the exact same guarantees. The code involving weak refs just looks like it's doing something smart, so it's misleading. – Marko Topolnik Nov 09 '12 at 13:37
  • Do note that it is not me who placed the downvote on your answer. I merely commented with my experience. – Marko Topolnik Nov 09 '12 at 13:43
  • @MarkoTopolnik "The code involving weak refs just looks like it's doing something smart, so it's misleading." Couldn't have said it better; that's exactly why I thought it was better. I don't care about the downvote, I know the answer isn't quite relevant to the question being asked. – Shark Nov 09 '12 at 13:44
  • The way it can actually perform worse than the simple brute-force approach is that the ref will almost certainly get cleared on the first run, and a second run is very often needed to weed out all the garbage. – Marko Topolnik Nov 09 '12 at 13:46