
My intention is to analyze the bytecode of a Java program and collect data about the data structures it uses. That data includes the initial capacity and how each data structure instance has grown over the runtime (its growth rate, according to the growing policy of the Java data structures).

Can I assume that the capacity is proportional to the memory taken by a particular data structure instance?

For example:

I used `RamUsageEstimator` from `com.carrotsearch.sizeof.RamUsageEstimator` to get the memory size taken by a particular data structure instance.

    import java.util.ArrayList;
    import java.util.List;

    import com.carrotsearch.sizeof.RamUsageEstimator;

    // Initial capacity of 4; adding the 5th element forces one resize.
    List<Integer> intList = new ArrayList<>(4);
    for (int i = 0; i < 5; i++) {
        intList.add(i);
        System.out.println("Size (bytes) -> " + RamUsageEstimator.sizeOf(intList));
    }

I ran this code with an `ArrayList` of an initial capacity of 4 and added 5 elements to the list in a loop. According to the growing policy of the Java `ArrayList`, it should grow by 50%, which means that when I add the 5th element after the 4th, the new capacity should be 6. I got the following results: Bytes -> 72, 88, 104, 120, 144. We can clearly see that for the first 4 elements the gap is 16 bytes, and at the 5th element it becomes 24 bytes. So it clearly shows the growth and the rate, right?
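If it is the capacity itself you want to extract, rather than an estimated byte count, a more direct (but fragile) option is to read `ArrayList`'s private backing array via reflection. This is only a sketch: it relies on the OpenJDK implementation detail that the field is named `elementData` (the class name `CapacityProbe` is just for illustration), and on recent JDKs the `setAccessible` call only succeeds when the JVM is started with `--add-opens java.base/java.util=ALL-UNNAMED`:

    import java.lang.reflect.Field;
    import java.util.ArrayList;
    import java.util.List;

    public class CapacityProbe {
        public static void main(String[] args) throws Exception {
            // "elementData" is an OpenJDK implementation detail, not public API.
            Field elementData = ArrayList.class.getDeclaredField("elementData");
            elementData.setAccessible(true);

            List<Integer> intList = new ArrayList<>(4);
            for (int i = 0; i < 5; i++) {
                intList.add(i);
                int capacity = ((Object[]) elementData.get(intList)).length;
                System.out.println("size=" + intList.size() + ", capacity=" + capacity);
            }
            // Expected on OpenJDK: capacity stays 4 for the first four adds,
            // then jumps to 6 (old capacity + old capacity / 2) on the fifth.
        }
    }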

Is it possible to achieve my task this way?

Any answer would be great! Thank you!

  • Your assumption is not correct. There are data structures that do not grow linearly. Array lists, for example, normally exhibit a doubling growth if full. This of course approximates to growing by a constant factor per insert, but the actual distribution will look different. – Turing85 Dec 23 '21 at 05:48
  • For reference: [Here is an Ideone demo](https://ideone.com/t3lpU4) showing the non-linear growth behaviour of `ArrayList`. – Turing85 Dec 23 '21 at 06:01
  • `HashMap` is another example. And note that any data structure that preallocates based on a `capacity` argument or a built-in initial size estimate doesn't grow linearly to start with. – Stephen C Dec 23 '21 at 06:08
  • Your larger goal of predicting the memory usage of an arbitrary data structure is probably unrealistic. (In fact, for some formulations, the "problem" is probably a non-computable function; e.g. it is analogous to the Halting problem.) – Stephen C Dec 23 '21 at 06:12
  • Thanks for commenting! My goal is not to predict memory usage. It is to predict or know how the particular data structure has grown over the runtime (from the initial capacity to the final capacity). Is it possible? – Janitha Nawarathna Dec 23 '21 at 06:35
  • @Turing85 `ArrayList`'s growth is 50% (or a factor of 1.5), not doubling. – Holger Dec 23 '21 at 12:25
  • @Holger the actual growth rate is an implementation detail. Whether it's 50% or 100% is irrelevant for the argument; it's not linear. – Turing85 Dec 23 '21 at 12:27
  • @Turing85 you may call it an implementation detail, yet you decided to say "doubling" in your comment. I'm not even sure whether the non-linear growth has any relevance to the OP's problem in general. The amortized growth or just the resulting capacity seems to be what the analysis is heading for, which still is proportional. – Holger Dec 23 '21 at 12:33
  • @StephenC there's always an exception to the rule: `java.util.Vector` supports [a growth by a constant](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/Vector.html#capacityIncrement) (see the sketch after this comment thread). – Holger Dec 23 '21 at 12:36
  • Thanks, @Holger! That's what I need to extract! – Janitha Nawarathna Dec 23 '21 at 16:04
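As a follow-up to the `java.util.Vector` comment above, here is a minimal sketch (the class name `VectorGrowth` is just for the example): with a non-zero `capacityIncrement`, the backing array grows by a constant instead of by a factor, and `Vector` even exposes its capacity directly through the public `capacity()` method, so no size estimation is needed at all:

    import java.util.Vector;

    public class VectorGrowth {
        public static void main(String[] args) {
            // Initial capacity 4, fixed capacityIncrement of 3: the capacity
            // grows 4 -> 7 -> 10 -> ... instead of doubling (Vector's default).
            Vector<Integer> vector = new Vector<>(4, 3);
            for (int i = 0; i < 11; i++) {
                vector.add(i);
                System.out.println("size=" + vector.size()
                        + ", capacity=" + vector.capacity());
            }
        }
    }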

1 Answer


When you create a heap dump from a running JVM that has eaten up nearly all of the heap available to it, and use a tool like Eclipse MAT to sum up the sizes of all data structures, the result is usually 10% to 250% larger than the size of the heap itself … the magical RAM increase! Or perhaps Java knows how to use a single byte of RAM multiple times …

The reason is much more mundane: the same (child) data structure is referenced by multiple parent data structures, and that is not always visible without digging deep. A good example is the old `java.util.Date` class …
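To make that concrete, here is a minimal sketch using the same `com.carrotsearch.sizeof` library as in the question (the class name `SharedChildDemo` is just for illustration). Two parent lists share one large child object; summing their individually estimated sizes counts the shared child twice, while a single `sizeOf` call over a common root counts it only once:

    import java.util.Arrays;
    import java.util.List;

    import com.carrotsearch.sizeof.RamUsageEstimator;

    public class SharedChildDemo {
        public static void main(String[] args) {
            // A big shared child makes the double counting obvious (~4 KB).
            int[] shared = new int[1_000];
            List<Object> parentA = Arrays.asList(shared, "A");
            List<Object> parentB = Arrays.asList(shared, "B");

            // Summing per-parent sizes counts the shared array twice ...
            long summed = RamUsageEstimator.sizeOf(parentA)
                    + RamUsageEstimator.sizeOf(parentB);

            // ... while walking both from one root counts it only once.
            long combined = RamUsageEstimator.sizeOf(Arrays.asList(parentA, parentB));

            System.out.println("summed   = " + summed + " bytes");
            System.out.println("combined = " + combined + " bytes"); // ~4 KB less
        }
    }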

Tools like the already mentioned `RamUsageEstimator` try to make good guesses about the size; they are not named "Estimator" without reason.

For your goal, you should compare the size of your data structures with hundreds or thousands of initial elements against their size after doubling and tripling the number of elements, to get an idea of the behaviour.

Doing this with one, two, three elements is fruitless at best, misleading at worst.
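A sketch of that approach, again with the library from the question (the class name `GrowthAtScale` is illustrative): measure at a few large element counts and look at the ratios rather than at single-element deltas, since the exact byte values will vary by JVM, pointer compression and object alignment:

    import java.util.ArrayList;
    import java.util.List;

    import com.carrotsearch.sizeof.RamUsageEstimator;

    public class GrowthAtScale {
        public static void main(String[] args) {
            long previous = 0;
            for (int n : new int[] { 1_000, 2_000, 3_000 }) {
                List<Integer> list = new ArrayList<>();
                for (int i = 0; i < n; i++) {
                    list.add(i);
                }
                long bytes = RamUsageEstimator.sizeOf(list);
                System.out.printf("%d elements -> %d bytes (x%.2f)%n",
                        n, bytes, previous == 0 ? 1.0 : (double) bytes / previous);
                previous = bytes;
            }
        }
    }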

tquadrat