I'm currently testing a proof-of-concept prototype that deals with XML Schema and is built around a very memory-hungry external library for tree automata (for which I have the sources). I'd like to plot the "real peak" (heap) memory consumption of the different runs with increasing schema sizes (the metric I use fits my purpose and does not affect the question), or at least a reasonable approximation of it.
To give an order of magnitude: for a run with a real peak of about 100MB (I established this by running exactly the same configuration of input/parameters several times, forcing the JVM memory with -Xmx and -Xms to decreasing values until I got Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded below 100MB, with stable and repeatable results), the same run left unconstrained occupies around 1.1GB. That is why it is extremely important for me to get the real number: the two figures differ a lot!
I've spent the last 10 days reading questions on the web and on Stack Overflow; what I actually know so far is:
System.gc() only "suggests" a GC run and does not force it in any way, so it is not possible to rely on it for detecting memory usage peaks (see the sketch after this list for the kind of naive probe I mean)
What is usually suggested is to count object occupation (I found the SizeOf project for this; I tried it and it works fine, even if it does not fit my needs). That is not feasible for me, because heavy memory allocation happens through the creation of a huge number of collection (Set, List and Map) iterators in different methods, each called a very high number of times (say millions of calls each for a 10-minute run, from what I remember), so it would be extremely difficult to track all the involved objects and sum their sizes (I have debugged many, many runs over several days with memory consumption graphs without being able to identify even a single bottleneck)
There is no way to easily obtain the memory occupation of a method (expressed as the peak of object memory allocation)
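To illustrate the first point above, this is the kind of naive probe I mean when I say a System.gc()-based reading is unreliable (HeapProbe is just an illustrative name, not an existing class):

```java
// Naive "suggest a GC, then read the heap" probe.
// Because System.gc() is only a hint, two calls at the same program
// point can report noticeably different "used" values.
public final class HeapProbe {

    // Returns the currently used heap in bytes, after *suggesting* a GC.
    public static long usedHeapAfterGcHint() {
        Runtime rt = Runtime.getRuntime();
        System.gc();              // hint only, may be ignored
        System.runFinalization(); // also just a hint
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        // Sampling twice in a row can already show the instability.
        System.out.println("used ~ " + usedHeapAfterGcHint() + " bytes");
        System.out.println("used ~ " + usedHeapAfterGcHint() + " bytes");
    }
}
```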
The fact is that I have experienced myself that System.gc() calls are not reliable (e.g. across different runs of the same configuration I read different memory values after a System.gc(), depending on whether the GC was actually run or not), yet when I press the "GC button" in JVisualVM or JConsole it never fails to run the GC or refuses to do so.
So my question is: will calling the implementation behind that button (I haven't tried it yet, but from what I've read so far it seems feasible using jconsole.jar with the Attach API) differ from calling System.gc() directly from my code, thus solving my problem? If not, how do you explain the "deterministic behaviour" of that button?
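To make the question concrete, here is what I would try first, inside the same JVM: as far as I can tell from the documentation, the button ends up invoking the gc operation of the java.lang:type=Memory MXBean over JMX, which locally is reachable via ManagementFactory.getMemoryMXBean(). This is only my reading of it, not something I have verified:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Sketch: trigger the same MXBean operation the "Perform GC" button
// appears to call (the java.lang:type=Memory MXBean), but from inside
// the measured JVM instead of over a remote JMX connection.
public final class MXBeanGcProbe {
    public static void main(String[] args) {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        memory.gc(); // the MXBean's GC operation
        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.println("heap used after MXBean gc(): " + heap.getUsed() + " bytes");
    }
}
```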
Up to now I have done some manual tests of the real memory peak for 10 increasing schema sizes (for this kind of measurement the schemas are automatically generated from a single "complexity parameter") and I have plotted the expected curve. If I cannot obtain a better solution, I plan to run my code as an external jar with -Xmx/-Xms set slightly below my prediction of the expected memory peak, catch the OutOfMemoryError in the external process's error stream, and relaunch with increased memory until a complete run is achieved (a sketch of this loop is below). If the naive memory prediction turns out not to be robust enough, I will apply appropriate machine learning techniques. I know this is not an elegant solution, but in my scenario (academia) I can afford to spend some extra time on these measurements. If you have other suggestions or improvements to this brute-force method you are (extremely) welcome to share them.
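For completeness, a minimal sketch of the brute-force loop I have in mind; target.jar, the starting heap size and the increment are placeholders for my actual setup, and the OutOfMemoryError is detected by scanning the child process output rather than by catching an exception:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

// Brute-force peak estimation: run the measured program as an external
// process with a fixed heap, and grow the heap until a run completes
// without an OutOfMemoryError. The last failing and first succeeding
// sizes bracket the "real peak".
public final class PeakFinder {
    public static void main(String[] args) throws Exception {
        int heapMb = 64;       // placeholder starting guess
        final int stepMb = 16; // placeholder increment

        while (true) {
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-Xms" + heapMb + "m", "-Xmx" + heapMb + "m",
                    "-jar", "target.jar");  // placeholder jar name
            pb.redirectErrorStream(true);   // merge stderr into stdout
            Process p = pb.start();

            boolean oom = false;
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream()))) {
                String line;
                while ((line = r.readLine()) != null) {
                    if (line.contains("java.lang.OutOfMemoryError")) {
                        oom = true;
                    }
                }
            }
            p.waitFor();

            if (!oom && p.exitValue() == 0) {
                System.out.println("completed with -Xmx" + heapMb + "m");
                break;
            }
            heapMb += stepMb; // retry with a larger heap
        }
    }
}
```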
System info (machine is a Fedora 17, 64 bit):
java version "1.7.0_04"
Java(TM) SE Runtime Environment (build 1.7.0_04-b20)
Java HotSpot(TM) 64-Bit Server VM (build 23.0-b21, mixed mode)
Thanks in advance, Alessandro