16

I'm doing a memory analysis of an existing Java application. Is there an equivalent of SQL's 'group by' in OQL, so that I can count objects that have the same value but are different instances?

select count(*) from java.lang.String s group by s.toString()

I'd like to get a list of duplicated strings along with the number of duplicates. The purpose is to find the cases with large counts so that they can be optimized using String.intern().

Example:

"foo"    100
"bar"    99
"lazy fox"    50

etc...

trincot
paweloque

6 Answers

23

The following is based on the answer by Peter Dolberg and can be used in the VisualVM OQL Console:

var counts={};
var alreadyReturned={};

filter(
  sort(
    map(heap.objects("java.lang.String"),
    function(heapString){
      // Compute the key once instead of calling toString() repeatedly
      var key = heapString.toString();
      counts[key] = (counts[key] || 0) + 1;
      return { string: key, count: counts[key] };
    }),
    // Numeric comparator expression: descending by count
    'rhs.count - lhs.count'),
  function(countObject) {
    if( ! alreadyReturned[countObject.string]){
      alreadyReturned[countObject.string] = true;
      return true;
    } else {
      return false;
    }
   }
  );

It starts with a map() call over all String instances. For each String, it creates or increments that value's entry in the counts object and returns a new object with a string field and a count field (the running count so far).

The resulting array will contain one entry for each String instance, each having a count value one larger than the previous entry for the same String. The result is then sorted in descending order on the count field and looks something like this:

{
count = 1028.0,
string = *null*
}

{
count = 1027.0,
string = *null*
}

{
count = 1026.0,
string = *null*
}

...

(in my test the String "*null*" was the most common).

The last step is to filter this using a function that returns true only for the first occurrence of each String. Because the entries are sorted by descending count, that first occurrence carries the String's total. The alreadyReturned object keeps track of which Strings have already been included.
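The three stages described above can also be tried outside VisualVM. The following is a minimal sketch in plain JavaScript, not OQL: an ordinary array stands in for heap.objects("java.lang.String"), the standard Array methods stand in for the OQL built-ins map, sort and filter, and the function name countDuplicates and the sample data are made up for illustration.

```javascript
// Sketch only: `values` stands in for the String instances on the heap.
function countDuplicates(values) {
  var counts = Object.create(null);          // value -> running count
  var alreadyReturned = Object.create(null); // values already emitted

  return values
    // Stage 1: one entry per instance, carrying the running count so far.
    .map(function (value) {
      counts[value] = (counts[value] || 0) + 1;
      return { string: value, count: counts[value] };
    })
    // Stage 2: descending by count, so each value's final total comes first.
    .sort(function (lhs, rhs) { return rhs.count - lhs.count; })
    // Stage 3: keep only the first (highest-count) entry per value.
    .filter(function (entry) {
      if (alreadyReturned[entry.string]) { return false; }
      alreadyReturned[entry.string] = true;
      return true;
    });
}

var result = countDuplicates(["foo", "bar", "foo", "lazy fox", "foo", "bar"]);
console.log(result);
```

For this sample input the surviving entries are "foo" with count 3, "bar" with count 2 and "lazy fox" with count 1, which is the shape of output asked for in the question.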

Johan Kaving
  • Thanks, that solves the problem nicely. OQL is somewhat awkward to use. It all has to happen in one function... – paweloque Feb 28 '12 at 08:38
  • Wow, I didn't know that jvisualvm was that powerful. I found high count values for some Strings. Does your code exclude garbage (unreferenced Strings)? – Jan Aug 16 '12 at 20:30
  • It uses "heap.objects" to find all java.lang.String objects on the heap. There is no filtering to exclude non-referenced Strings. But depending on how the heap dump was generated the JVM may have performed a full GC before, in which case any non-referenced Strings should already have been removed and not included in the heap dump. – Johan Kaving Aug 20 '12 at 08:59
  • @JohanKaving I also tried your OQL, but it returned only one record, like "string = org.netbeans.lib.profiler.heap.InstanceDump,count = 1", when I was filtering out the java.lang.ref.Finalizer referents. I don't know how to implement the map function here. As you said, the output contains duplicates of the string "*null*". That's weird, because the query should print each string only once and it's not a map here. – scugxl Jun 19 '17 at 10:02
9

I would use Eclipse Memory Analyzer instead.

Palesz
  • I really like your proposal because it solves the problem very nicely. I hope, however, that you'll understand that the bounty goes to Johan Kaving for writing the OQL. I think there might be situations where it is useful to understand OQL. But thanks anyway! – paweloque Feb 28 '12 at 08:32
  • To do that use Open Query Browser -> Java Basics -> Group By Value. For objects select `java.lang.String` and for field select `value`. – kichik Jan 25 '15 at 02:25
  • I used this to find duplicated strings; it is nicely explained at https://alblue.bandlem.com/2016/03/duplicate-objects-mat.html – Mario.Cadiz Dec 09 '22 at 22:27
2

Sadly, there isn't an equivalent to "group by" in OQL. I'm assuming you're talking about the OQL that is used in jhat and VisualVM.

There is an alternative, though. If you use pure JavaScript syntax instead of the "select x from y" syntax then you have the full power of JavaScript to work with.

Even so, the alternative way of getting the information you're looking for isn't simple. For example, here's an OQL "query" that will perform the same task as your query:

var set={};
sum(map(heap.objects("java.lang.String"),function(heapString){
  if(set[heapString.toString()]){
    return 0;
  }
  else{
    set[heapString.toString()]=true;
    return 1;
  }
}));

In this example a regular JavaScript object mimics a set (a collection with no duplicates). As the map function goes through each string, the set is used to determine whether the string has already been seen. Duplicates don't count toward the total (return 0) but new strings do (return 1), so the sum is the number of distinct string values.
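The same set idea can be sketched as standalone JavaScript, with an ordinary array standing in for heap.objects(...). The function name countDistinct and the sample data are illustrative and not part of OQL. One caveat the sketch hedges against: a plain {} inherits keys such as "constructor" from Object.prototype, so a string with that value would look as if it had already been seen. The sketch assumes Object.create is available in the engine and uses a prototype-less object instead.

```javascript
// Sketch only: `values` stands in for the String instances on the heap.
function countDistinct(values) {
  // Prototype-less object used as a set, so "constructor" etc. are not pre-defined.
  var seen = Object.create(null);
  return values
    .map(function (value) {
      if (seen[value]) { return 0; }   // duplicate: contributes nothing
      seen[value] = true;
      return 1;                        // first sighting: counts once
    })
    .reduce(function (total, n) { return total + n; }, 0);
}

var distinct = countDistinct(["foo", "bar", "foo", "constructor", "bar"]);
console.log(distinct);
```

Here the three distinct values are "foo", "bar" and "constructor". With a plain {} as the set, "constructor" would be treated as already seen and the total would come out one too low.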

Peter Dolberg
  • Hi Peter, thanks for your query. It points me in the right direction, but I'm not there yet :) With this query I see the total number of duplicate strings. What I'd like to see is each string and its repeat count: 'foo' 10 times, 'bar' 100 times, etc. To see that I tried to output the contents of the set, but I only get strange jscript exceptions. Do you have an idea how to achieve what I want to see? – paweloque Feb 21 '12 at 16:26
1

A far more efficient query:

var countByValue = {};

// Iterate over all String instances (subtypes excluded)
heap.forEachObject(
  function(strObject) {
    var key = strObject.toString();
    var count = countByValue[key];
    countByValue[key] = count ? count + 1 : 1;
  },
  "java.lang.String",
  false
);

// Transform the map into an array
var mapEntries = [];
for (var i = 0, keys = Object.keys(countByValue), total = keys.length; i < total; i++) {
  mapEntries.push({
    count : countByValue[keys[i]],
    string : keys[i]
  });
}

// Sort the counts
sort(mapEntries, 'rhs.count - lhs.count');
Fabrice TIERCELIN
0

I'm posting my solution and experience with a similar problem, for others' reference.

var counts = {};
var alreadyReturned = {};
top(
filter(
    sort(
        map(heap.objects("java.lang.ref.Finalizer"),
            function (fobject) {
                var className = classof(fobject.referent)
                if (!counts[className]) {
                    counts[className] = 1;
                } else {
                    counts[className] = counts[className] + 1;
                }
                return {string: className, count: counts[className]};
            }),
        'rhs.count-lhs.count'),
    function (countObject) {
        if (!alreadyReturned[countObject.string]) {
            alreadyReturned[countObject.string] = true;
            return true;
        } else {
            return false;
        }
    }),
    "rhs.count - lhs.count", 10);

The code above outputs the top 10 classes referenced by java.lang.ref.Finalizer instances.
Tips:
1. Passing a function to sort, rather than an expression string as above, did NOT work for me on Mac OS.
2. Use the classof function to get the class of the referent. (I first tried fobject.referent.toString(), but that returned a lot of org.netbeans.lib.profiler.heap.InstanceDump values, which wasted a lot of my time.)

scugxl
0

Method 1

You can select all the strings and then use the terminal to aggregate them.

  1. Increase the OQL result limit in the VisualVM config files
  2. Restart VisualVM
  3. Run an OQL query to get all the strings
  4. Copy and paste them into vim
  5. clean the data with vim macros so there's
  6. Run sort | uniq -c to get the counts.
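For readers who would rather stay in one language, step 6 can also be approximated with a short Node.js script instead of sort | uniq -c. This is only a sketch: it assumes the cleaned data ends up with one string per line, and the function name countLines and the sample input are hypothetical.

```javascript
// Sketch only: counts identical lines (as `sort | uniq -c` does), then orders by count.
function countLines(text) {
  var counts = new Map();
  text.split("\n").forEach(function (line) {
    if (line === "") { return; }            // skip blank lines
    counts.set(line, (counts.get(line) || 0) + 1);
  });
  // Convert to [line, count] pairs and sort by descending count.
  return Array.from(counts.entries()).sort(function (a, b) { return b[1] - a[1]; });
}

var pairs = countLines("foo\nbar\nfoo\nlazy fox\nfoo\nbar\n");
pairs.forEach(function (pair) { console.log(pair[1] + "\t" + pair[0]); });
```

Unlike uniq -c, which only merges adjacent identical lines and therefore needs the preceding sort, the Map collects counts regardless of input order.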

Method 2

  1. Use a tool to dump all the fields of the class you're interested in ( https://github.com/josephmate/DumpHprofFields can do it )
  2. Use bash to select the strings you're interested in
  3. Use bash to aggregate
joseph