Java heap analysis with OQL: count unique strings

Question

I'm doing a memory analysis of an existing Java application. Is there an equivalent of SQL's 'group by' in OQL that would let me count objects that have the same value but are different instances?

select count(*) from java.lang.String s group by s.toString()

I'd like to get a list of duplicated strings along with the number of duplicates. The purpose is to find the cases with large counts so that they can be optimized using String.intern().

Example:

"foo"    100
"bar"    99
"lazy fox"    50

etc...

Johan Kaving · Accepted Answer · 2012-02-27T21:55:10.550

The following is based on the answer by Peter Dolberg and can be used in the VisualVM OQL Console:

var counts={};
var alreadyReturned={};

filter(
  sort(
    map(heap.objects("java.lang.String"),
    function(heapString){
      if( ! counts[heapString.toString()]){
        counts[heapString.toString()] = 1;
      } else {
        counts[heapString.toString()] = counts[heapString.toString()] + 1;
      }
      return { string:heapString.toString(), count:counts[heapString.toString()]};
    }), 
    'lhs.count < rhs.count'),
  function(countObject) {
    if( ! alreadyReturned[countObject.string]){
      alreadyReturned[countObject.string] = true;
      return true;
    } else {
      return false;
    }
   }
  );

It starts with a map() call over all String instances. For each String it creates or increments an entry in the counts object (a plain JavaScript object used as a map) and returns a new object with a string field and a count field.

The resulting array contains one entry for each String instance, each with a count value one larger than the previous entry for the same String. The array is then sorted on the count field in descending order, and the result looks something like this:

{
count = 1028.0,
string = *null*
}

{
count = 1027.0,
string = *null*
}

{
count = 1026.0,
string = *null*
}

...

(in my test the String "*null*" was the most common).

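The last step is to filter this using a function that returns true only for the first occurrence of each String. Because the entries are sorted by descending count, that first occurrence carries the total count for the String. The alreadyReturned object keeps track of which Strings have already been included.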
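
The three steps can be tried outside VisualVM with a minimal sketch in plain JavaScript. Here the hypothetical `values` array stands in for `heap.objects("java.lang.String")`, and the function name `countDuplicates` is an assumption made for illustration. The sketch uses `Object.create(null)` for the lookup objects so that strings such as "constructor" do not collide with inherited properties; the original query uses `{}`.

```javascript
// Plain-JavaScript simulation of the map -> sort -> filter pipeline.
// `values` is a hypothetical stand-in for the String instances on the heap.
function countDuplicates(values) {
  var counts = Object.create(null);          // value -> running count
  var alreadyReturned = Object.create(null); // value -> true once emitted

  // Step 1 (map): one entry per instance, carrying the running count.
  var entries = values.map(function (value) {
    counts[value] = (counts[value] || 0) + 1;
    return { string: value, count: counts[value] };
  });

  // Step 2 (sort): highest running count first.
  entries.sort(function (lhs, rhs) {
    return rhs.count - lhs.count;
  });

  // Step 3 (filter): keep only the first entry seen for each value,
  // which is the one with that value's total count.
  return entries.filter(function (entry) {
    if (alreadyReturned[entry.string]) {
      return false;
    }
    alreadyReturned[entry.string] = true;
    return true;
  });
}

// Usage with made-up sample data.
var sample = ["foo", "bar", "foo", "lazy fox", "foo", "bar"];
countDuplicates(sample).forEach(function (entry) {
  console.log(entry.string + " " + entry.count);
});
```

The intermediate array still holds one entry per instance, so this mirrors the memory behaviour of the query rather than improving on it.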

Thanks, that solves the problem nicely. OQL is somewhat awkward to use, though. It all has to happen in one function... – paweloque, Feb 28 '12 at 08:38
wow, didn't know that jvisualvm is that powerful. I found high count values for some Strings - does your code exclude garbage (not referenced Strings)? — Jan, Aug 16 '12 at 20:30
It uses "heap.objects" to find all java.lang.String objects on the heap. There is no filtering to exclude non-referenced Strings. But depending on how the heap dump was generated the JVM may have performed a full GC before, in which case any non-referenced Strings should already have been removed and not included in the heap dump. — Johan Kaving, Aug 20 '12 at 08:59
@JohanKaving I also tried your OQL, but it returned only one record like "string = org.netbeans.lib.profiler.heap.InstanceDump,count = 1" when I'm filtering out the java.lang.ref.Finalizer referents. I don't know how to implement the map function here. As you said, the output contains some duplicated "*null*" strings. That's weird, because the query should print each string only once and it's not a map here. – scugxl, Jun 19 '17 at 10:02

score 9 · Answer 2 · answered Feb 24 '12 at 06:43

I would use Eclipse Memory Analyzer instead.

Palesz

I really like your proposal because it solves the problem very nicely. I hope, however, that you'll understand that the bounty goes to Johan Kaving for writing the OQL. I think there might be situations where it is useful to understand OQL. But thanks anyway! – paweloque Feb 28 '12 at 08:32
To do that use Open Query Browser -> Java Basics -> Group By Value. For objects select `java.lang.String` and for field select `value`. – kichik Jan 25 '15 at 02:25
Used to find duplicated string and nicely explained at https://alblue.bandlem.com/2016/03/duplicate-objects-mat.html – Mario.Cadiz Dec 09 '22 at 22:27

score 2 · Answer 3 · answered Feb 02 '12 at 16:23

Sadly, there isn't an equivalent to "group by" in OQL. I'm assuming you're talking about the OQL that is used in jhat and VisualVM.

There is an alternative, though. If you use pure JavaScript syntax instead of the "select x from y" syntax then you have the full power of JavaScript to work with.

Even so, the alternative way of getting the information you're looking for isn't simple. For example, here's an OQL "query" that will perform the same task as your query:

var set={};
sum(map(heap.objects("java.lang.String"),function(heapString){
  if(set[heapString.toString()]){
    return 0;
  }
  else{
    set[heapString.toString()]=true;
    return 1;
  }
}));

In this example a regular JavaScript object mimics a set (a collection with no duplicates). As the map function goes through each string, the set is used to determine whether the string has already been seen. Duplicates don't count toward the total (return 0) but new strings do (return 1), so the sum is the number of distinct string values.
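
The same set idea can be checked in plain JavaScript. This is only a sketch: the `values` array is a hypothetical stand-in for `heap.objects("java.lang.String")`, and the name `countDistinct` is made up for illustration.

```javascript
// Count distinct values by using an object as a set.
// `values` is a hypothetical stand-in for the String instances on the heap.
function countDistinct(values) {
  // Object.create(null) avoids inherited keys such as "toString".
  var seen = Object.create(null);

  // Map each value to 1 the first time it is seen and to 0 afterwards.
  var flags = values.map(function (value) {
    if (seen[value]) {
      return 0;
    }
    seen[value] = true;
    return 1;
  });

  // Sum the flags, which is the role of OQL's sum() in the query.
  return flags.reduce(function (total, flag) {
    return total + flag;
  }, 0);
}

// Usage with made-up sample data: three distinct values.
console.log(countDistinct(["foo", "bar", "foo", "lazy fox", "bar"]));
```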

Hi Peter, thanks for your query, it brings me into the direction, but I'm not yet there :) With this query I see the total number of duplicate strings. What I'd like to see is the string and repeat-number: 'foo' 10 times, 'bar' 100 times, etc.. To see that I tried to output the contents of the set, but I only get strange jscript exceptions.. Do you have an idea how to achieve what I want to see? — paweloque, Feb 21 '12 at 16:26

score 1 · Answer 4 · answered Jun 20 '17 at 17:48

A far more efficient query:

var countByValue = {};

// Iterate over all String instances and count each value
heap.forEachObject(
  function(strObject) {
    var key = strObject.toString();
    var count = countByValue[key];
    countByValue[key] = count ? count + 1 : 1;
  },
  "java.lang.String",
  false
);

// Transform the map into array
var mapEntries = [];
for (var i = 0, keys = Object.keys(countByValue), total = keys.length; i < total; i++) {
  mapEntries.push({
    count : countByValue[keys[i]],
    string : keys[i]
  });
}

// Sort the counts
sort(mapEntries, 'rhs.count - lhs.count');
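
Since the goal in the question is to find candidates for String.intern(), it may help to keep only values that occur more than once and to cap the output. The following is a sketch in plain JavaScript under stated assumptions: `values` is a hypothetical stand-in for the strings on the heap, and `topDuplicates` and `limit` are names invented here, not part of OQL.

```javascript
// Single-pass counting, then keep only duplicated values, largest first.
// `values` is a hypothetical stand-in for the String instances on the heap.
function topDuplicates(values, limit) {
  var countByValue = Object.create(null);

  // One pass: increment the count for each value.
  values.forEach(function (value) {
    countByValue[value] = (countByValue[value] || 0) + 1;
  });

  return Object.keys(countByValue)
    // Values seen only once are not duplicates.
    .filter(function (key) {
      return countByValue[key] > 1;
    })
    .map(function (key) {
      return { count: countByValue[key], string: key };
    })
    // Descending by count.
    .sort(function (lhs, rhs) {
      return rhs.count - lhs.count;
    })
    // Keep at most `limit` entries.
    .slice(0, limit);
}

// Usage with made-up sample data.
console.log(topDuplicates(["foo", "bar", "foo", "lazy fox", "foo", "bar"], 10));
```

Unlike the map-based approach, this builds one entry per distinct value rather than one per instance.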

score 0 · Answer 5 · answered Jun 20 '17 at 05:12

I'm posting my solution and what I learned while working on a similar problem, for others' reference.

var counts = {};
var alreadyReturned = {};
top(
filter(
    sort(
        map(heap.objects("java.lang.ref.Finalizer"),
            function (fobject) {
                var className = classof(fobject.referent)
                if (!counts[className]) {
                    counts[className] = 1;
                } else {
                    counts[className] = counts[className] + 1;
                }
                return {string: className, count: counts[className]};
            }),
        'rhs.count-lhs.count'),
    function (countObject) {
        if (!alreadyReturned[countObject.string]) {
            alreadyReturned[countObject.string] = true;
            return true;
        } else {
            return false;
        }
    }),
    "rhs.count > lhs.count", 10);

The code above outputs the top 10 classes referenced by java.lang.ref.Finalizer.
Tips:
1. Passing a function as the comparator to sort did NOT work on my Mac OS; an expression string did.
2. The classof function returns the class of the referent. (I first tried fobject.referent.toString(), but this returned a lot of org.netbeans.lib.profiler.heap.InstanceDump values, which cost me a lot of time.)

score 0 · Answer 6 · answered Jul 15 '19 at 19:46

Method 1

You can select all the strings and then use the terminal to aggregate them.

1. Increase the OQL limit in the VisualVM config files.
2. Restart VisualVM.
3. Run an OQL query to get all the strings.
4. Copy and paste them into vim.
5. Clean the data with vim macros so there's
6. Run sort | uniq -c to get the counts.

Method 2

1. Use a tool to dump all the fields of the class you're interested in ( https://github.com/josephmate/DumpHprofFields can do it ).
2. Use bash to select the strings you're interested in.
3. Use bash to aggregate.
