
I'm trying to understand the contents of a heap dump generated by Google Chrome's developer tools. I understand that there is already an in-browser heap dump inspector, but I'm interested in writing a CLI that parses a JS heap dump as an exercise. I'm not able to find any docs on the structure of the contents of a heap dump. They're human-readable, but the format isn't very clear from inspecting the file.

Here's a random snippet:

"HTMLOptionElement",
"XMLHttpRequestEventTarget",
"about:blank",
"clearModifier",
"resetModifiers",
"/devtools/docs/demos/memory/example1",
"HTMLIFrameElement",
"https://www.google.com/jsapi?autoload=%7B%22modules%22%3A%5B%7B%22name%22%3A%22search%22%2C%22version%22%3A%221.0%22%2C%22callback%22%3A%22__gcse.scb%22%2C%22style%22%3A%22https%3A%2F%2Fwww.google.com%2Fcse%2Fstyle%2Flook%2Fv2%2Fdefault.css%22%2C%22language%22%3A%22en%22%7D%5D%7D",
"HTMLLinkElement",
"HTMLContentElement",
"window.__SSR = {c: 1.2808007E7 ,si:1,su:1,e:'richard@example.com',dn:'Richard Schneeman',a:'bubble',at:'AZW7SXV+1uUcQX+2WIzyelLB5UgBepsr1\\/RV+URJxwIT6BmLmrrThMH0ckzB7mLeFn1SFRtxm\\/1SD16uNnjb0qZxXct8\\x3d',ld:[,[0,12808007,[]\n,1,70]\n]\n,r:'https:\\/\\/developer.chrome.com\\/devtools\\/docs\\/demos\\/memory\\/example1',s:'widget',annd: 2.0 ,bp: {}, id:'http:\\/\\/www.google.com\\/chrome'}; document.addEventListener && document.addEventListener('DOMContentLoaded', function () {gapi.inline.tick('wdc', new Date().getTime());}, false);",
"onLoaded",
"HTMLAllCollection",
"onDocumentKeyDown",

Do docs on the structure of Chrome heap dumps exist? Is there a standard JavaScript heap dump format, or does every engine have its own proprietary format?

Schneems
    Unfortunately there's no such thing as a standard JS heap format. Quick googling for "v8 heap dump format" gives several results, none of them super-detailed. There's a Node.js extension: https://www.npmjs.com/package/heapsnapshot-parser, and the V8 source code contains the most up-to-date info: https://github.com/v8/v8/blob/master/include/v8-profiler.h – smirnoff Nov 20 '15 at 02:27
  • Thanks, I realized after posting that looking at the source was an option. I'm new to the project, appreciate the links. – Schneems Nov 20 '15 at 17:40
  • @smirnoff you should make this into an answer – jberryman Sep 15 '20 at 21:10
  • So, at the end of the day, we don't know what the data inside a V8 heap snapshot means? – Elia Apr 08 '21 at 09:13

2 Answers


Unfortunately there's no such thing as a standard JS heap format. Quick googling for "v8 heap dump format" gives several results, none of them super-detailed. There's a Node.js heapsnapshot parser extension, and the V8 source code contains the most up-to-date info: v8-profiler.h

smirnoff
  • Microsoft published a really nice overview of the V8 heap snapshot JSON format: https://learn.microsoft.com/en-us/microsoft-edge/devtools-guide-chromium/memory-problems/heap-snapshot-schema – Joe Jul 30 '23 at 21:03

After spending a couple of days writing a parser in Go for the V8 heapsnapshot JSON file, here's what I've learned:

  • The heapsnapshot file is a JSON file that represents all heap-allocated values and the edges between them.

  • The JSON format flattens the graph into arrays of integers and uses dictionary encoding: each string is stored once in the strings array and referenced everywhere else by its index.

The annotated top-level structure of a heapsnapshot file looks like:

{
    "snapshot": {
        "meta": {
            "node_fields": ["type", "name", "id", "self_size", "edge_count", "trace_node_id", "detachedness"],
            "node_types": [
                ["hidden", "array", "string", "object", "code", "closure", "regexp", "number", "native", "synthetic", "concatenated string", "sliced string", "symbol", "bigint", "object shape"],
                "string",
                "number",
                "number",
                "number",
                "number",
                "number"
            ],
            "edge_fields": ["type", "name_or_index", "to_node"],
            "edge_types": [
                ["context", "element", "property", "internal", "hidden", "shortcut", "weak"],
                "string_or_number",
                "node"
            ]
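        },
        "node_count": 2,
        "edge_count": 1,
        "trace_function_count": 0
    },
    "nodes": [
      // Each node is represented by 7 numbers, matching the length of node_fields.
      // node_types says how to read each number: the enum array in the "type" slot means the number
      // indexes into that array, "string" means it indexes into the strings array,
      // and "number" means the value is used as-is.
      1,0,66,16,1,0,0, // type=array  name=alpha id=66 self_size=16 edge_count=1 trace_node_id=0 detachedness=0
      2,1,77,16,0,0,0  // type=string name=bravo id=77 self_size=16 edge_count=0 trace_node_id=0 detachedness=0
    ],
    "edges": [
      // Each edge is represented by 3 numbers, matching the length of edge_fields.
      // Edges are stored grouped by source node, in node order, so alpha (edge_count=1) owns the first edge.
      // For element and hidden edges name_or_index is a plain index; for other edge types it indexes into strings.
      // to_node is the target's offset in the nodes array (node ordinal * 7), not its ordinal.
      1,0,7 // type=element name_or_index=0 to_node=7 (bravo), i.e. alpha[0] -> bravo
    ],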
    "trace_function_infos": [],
    "trace_tree": [],
    "samples": [],
    "locations": [],
    "strings": [
      "alpha",
      "bravo",
      "charlie"
    ]
}

Other notes:

  • To compare different heap snapshots, use the node id field. IDs are stable across snapshots taken from the same running page or process, so the same object keeps the same ID in each snapshot.
  • The Chromium HeapSnapshotLoader uses a custom parser for the nodes and edges arrays, since they are always arrays of uint32 and heap snapshots may be very large (GiB).


Joe