I'm profiling an application of mine - it's a macro system built on top of Java. I'm using hprof to profile some basic examples, and here are the top 20 functions where time is spent:
rank self accum count trace method
1 14.73% 14.73% 453 303755 java.lang.Object.<init>
2 9.75% 24.48% 300 303754 java.lang.Object.<init>
3 7.18% 31.66% 221 303641 java.lang.Object.<init>
4 6.83% 38.49% 210 303621 java.util.ArrayList.<init>
5 6.76% 45.25% 208 303620 java.util.ArrayList.<init>
6 5.95% 51.20% 183 303833 java.lang.Character.toChars
7 4.55% 55.75% 140 303809 java.lang.Object.<init>
8 4.42% 60.18% 136 303791 java.lang.Object.<init>
9 3.77% 63.95% 116 303756 java.lang.Object.<init>
10 3.64% 67.59% 112 300509 java.lang.Object.<init>
11 2.67% 70.25% 82 303789 java.lang.Object.<init>
12 2.54% 72.79% 78 303790 java.lang.Object.<init>
13 1.69% 74.48% 52 303688 java.lang.Object.<init>
14 1.66% 76.14% 51 303644 java.lang.Object.<init>
15 1.46% 77.60% 45 305935 java.lang.Object.<init>
16 1.40% 79.00% 43 303758 java.lang.Object.<init>
17 1.20% 80.20% 37 305324 java.lang.Object.<init>
18 0.98% 81.18% 30 302559 java.lang.Object.<init>
19 0.62% 81.79% 19 300006 java.util.Arrays.copyOf
20 0.52% 82.31% 16 305214 java.lang.Object.<init>
As you can see, most of the time is spent in Object.<init>. This is somewhat obscure to me.
My intuition is that the time is taken up by memory allocation. Working in C has left me with a strong sense that dynamic memory allocation (i.e. malloc()) is inefficient. However, in Java the gospel seems to be that the JVM deals efficiently with short-lived objects, and that consequently there is nothing to be gained from patterns such as object pools.
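To sanity-check that intuition, I could run a naive micro-probe like the one below (AllocProbe is just a throwaway name; timings of this kind are unreliable because of JIT warm-up and GC, so I'd only treat the number as a rough feel for allocation cost, not a real benchmark):

```java
// A naive micro-probe of short-lived allocation cost. Each iteration
// allocates an object that dies immediately, which is exactly the
// pattern the JVM's young generation is supposed to handle cheaply.
final class AllocProbe {
    static long churn(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            int[] box = new int[] { i };  // short-lived object, dead after this iteration
            sum += box[0];
        }
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long sum = churn(10_000_000);
        long t1 = System.nanoTime();
        System.out.println("sum=" + sum + " in " + (t1 - t0) / 1_000_000 + " ms");
    }
}
```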
I should add that the most performance-hungry part of the application is the parser, which does indeed create a whole lot of short-lived objects as part of its operation.
What do you think the time spent in Object.<init> is due to? Is it indeed linked to memory allocation? Could I benefit from an object pool, or some other trick, to reduce the number of allocations?
EDIT:
In reaction to Mike Dunlavey's answer, here is the interpretation of hprof's output by JPerfAnal, giving inclusive times.
Method Times by Caller (times inclusive): 3076 ticks
1: java.lang.Object.<init>: 71,26% (2192 inclusive / 2191 exclusive)
2: com.sun.demo.jvmti.hprof.Tracker.ObjectInit: 0,03% (1 inclusive / 0 exclusive)
3: java.lang.Thread.currentThread: 0,03% (1 inclusive / 1 exclusive)
1: parser.ParseData.<init>: 34,33% (1056 inclusive / 0 exclusive)
2: parser.ParseErrors.<init>: 13,98% (430 inclusive / 1 exclusive)
3: java.lang.Object.<init>: 7,18% (221 inclusive / 221 exclusive)
3: java.util.ArrayList.<init>: 6,76% (208 inclusive / 208 exclusive)
2: java.lang.Object.<init>: 13,52% (416 inclusive / 416 exclusive)
2: java.util.ArrayList.<init>: 6,83% (210 inclusive / 0 exclusive)
3: java.util.ArrayList.<init>: 6,83% (210 inclusive / 210 exclusive)
1: parser.Matcher.parse: 34,33% (1056 inclusive / 0 exclusive)
2: parser.ParseData.<init>: 34,33% (1056 inclusive / 0 exclusive)
3: parser.ParseErrors.<init>: 13,98% (430 inclusive / 1 exclusive)
4: java.lang.Object.<init>: 7,18% (221 inclusive / 221 exclusive)
4: java.util.ArrayList.<init>: 6,76% (208 inclusive / 208 exclusive)
3: java.lang.Object.<init>: 13,52% (416 inclusive / 416 exclusive)
3: java.util.ArrayList.<init>: 6,83% (210 inclusive / 0 exclusive)
4: java.util.ArrayList.<init>: 6,83% (210 inclusive / 210 exclusive)
1: java.util.ArrayList.<init>: 28,38% (873 inclusive / 419 exclusive)
2: java.util.AbstractList.<init>: 14,76% (454 inclusive / 0 exclusive)
3: java.util.AbstractCollection.<init>: 14,76% (454 inclusive / 0 exclusive)
4: java.lang.Object.<init>: 14,76% (454 inclusive / 454 exclusive)
(JPerfAnal also generates an inverted tree, in which the children of a node are its callers. I don't reproduce it for brevity, but suffice it to say that about 40% of the Object.<init> calls are made during the initialization of ArrayList, ParseData and ParseErrors.)
Now, that doesn't really change how I think about the issue, or my questions. I could change the algorithm so that it instantiates fewer objects; but for the time being I'm looking for an orthogonal solution. So: could object pools help me?
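For concreteness, the kind of pool I have in mind is something like this minimal, single-threaded sketch (SimplePool and its acquire/release methods are names I'm making up for illustration; a real pool for ParseData or the ArrayLists inside it would also need a way to reset an instance's state before reuse):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** A minimal, single-threaded object pool sketch (illustration, not production code). */
final class SimplePool<T> {
    private final Deque<T> free = new ArrayDeque<>();
    private final Supplier<T> factory;

    SimplePool(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Reuse a pooled instance if one is available, otherwise allocate a new one. */
    T acquire() {
        T obj = free.poll();
        return (obj != null) ? obj : factory.get();
    }

    /** Return an instance to the pool; the caller must reset its state first. */
    void release(T obj) {
        free.push(obj);
    }
}
```

The parser would then acquire() a ParseData at the start of a parse and release() it afterwards, instead of constructing a fresh one each time. My question is whether this actually buys anything on a modern JVM, or whether the young-generation collector already makes the plain new just as cheap.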