
I have a Node app that sends JavaScript source code as a string to a worker thread, which executes it with Node's vm API. I take a heap snapshot of the worker thread only, in order to detect any string allocations made by that source code. However, I get a lot of obscure comments stored as strings, which bloat the heap.


I originally suspected this was due to how the vm module executes code passed as a string, so I commented out the VM portion of my code, but I'm still getting these unwanted strings. Perhaps this is due to using require() and import?

My code is as follows. Again, app.js simply passes the source code as a string to my worker thread, worker.mjs. worker.mjs runs the passed string inside a VM sandbox and then writes its heap snapshot to a file.

// App.js file
const { Worker, isMainThread } = require('worker_threads');

if (isMainThread) {
    // JavaScript source code passed as String.
    let workerData = `
    var nop = unescape("%u9090%u9090");
    while (nop.length <= 0x100000/2) {nop += nop;}`;

    const worker = new Worker('./worker.mjs', { workerData });

    worker.once('message', (filename) => {
      console.log(`worker heapdump: ${filename}`);
    });
  
    // Tell the worker to create a heapdump.
    worker.postMessage('heapdump');
}

// worker.mjs
import { workerData, parentPort } from 'worker_threads';
import { createContext, runInContext } from 'vm';
import { writeHeapSnapshot } from 'v8';

parentPort.once('message', (message) => {
    if (message === 'heapdump') {
        const sandbox = {};
        const strict = '"use strict";'

        createContext(sandbox);

        runInContext(strict+workerData, sandbox, {timeout: 10000 });

        parentPort.postMessage(writeHeapSnapshot());
    }
});

My ultimate goal is to collect all strings and concatenated strings created only from within the source code string workerData. In this example, that is the value of the nop variable.


But as shown below, there's a lot of unrelated data among the concatenated strings as well.

"encodingOps.ucs2.byteLength"@29509
"encodingOps.utf16le.byteLength"@29531
"encodingOps.latin1.byteLength"@29555
"encodingOps.ascii.byteLength"@29579
"encodingOps.base64.byteLength"@29603
"encodingOps.hex.byteLength"@29627
"module.exports.getModuleFromWrap"@39059
...
...
"internal/modules/package_json_reader.js"@10997
"internal/modules/esm/translators.js"@11001
"internal/modules/esm/transform_source.js"@11011
"internal/modules/esm/resolve.js"@11021
"internal/modules/esm/module_map.js"@11025
"internal/modules/esm/module_job.js"@11029
"internal/modules/esm/loader.js"@11033
"internal/modules/esm/get_source.js"@11043
"internal/modules/esm/get_format.js"

The vm module enables compiling and running code within V8 Virtual Machine contexts. The vm module is not a security mechanism. Do not use it to run untrusted code. https://nodejs.org/api/vm.html

I understand that the Node vm module executes code in its own context. Would it be possible to retrieve the context id of the VM and then filter the heap snapshot for strings that lived in that specific context? In this case, I only want the nop variable. I'm hoping for some way to parse the snapshot JSON without using Chrome DevTools.

pairwiseseq

1 Answer


Perhaps this is due to using require() and import?

Essentially yes. You wanted all strings, you're getting all strings. JavaScript uses lots of strings. (The specific import mechanism doesn't matter. If you execute any code, you'll see its strings/etc in the heap snapshot.)
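To see which strings these are without DevTools, the snapshot file can be read as plain JSON. The following is a minimal sketch, assuming the usual V8 snapshot layout (snapshot.meta.node_fields, snapshot.meta.node_types, a flat nodes array, and a strings table). The function name listStrings is made up for illustration and is not part of any Node API.

```javascript
// Sketch: list string-like nodes from a parsed .heapsnapshot object.
// Assumption: the file follows the usual V8 layout, where `nodes` is a flat
// array of integers with one group of `node_fields.length` entries per node.
function listStrings(snapshot) {
  const { node_fields: fields, node_types: types } = snapshot.snapshot.meta;
  const stride = fields.length;
  const typeAt = fields.indexOf('type');
  const nameAt = fields.indexOf('name');
  const idAt = fields.indexOf('id');
  const sizeAt = fields.indexOf('self_size');
  const typeNames = types[typeAt]; // enum of node type names
  const wanted = new Set(['string', 'concatenated string', 'sliced string']);

  const result = [];
  for (let i = 0; i < snapshot.nodes.length; i += stride) {
    const type = typeNames[snapshot.nodes[i + typeAt]];
    if (!wanted.has(type)) continue;
    result.push({
      id: snapshot.nodes[i + idAt],
      type,
      selfSize: snapshot.nodes[i + sizeAt],
      // For string nodes, the `name` field indexes into the `strings` table.
      value: snapshot.strings[snapshot.nodes[i + nameAt]],
    });
  }
  return result;
}

// Hypothetical usage with the file name returned by writeHeapSnapshot():
// const fs = require('fs');
// const snapshot = JSON.parse(fs.readFileSync(filename, 'utf8'));
// console.log(listStrings(snapshot).filter((s) => s.type === 'concatenated string'));
```

This lists every string in the worker's heap, including the ones created by Node's own modules, which is why the output contains entries such as the internal/modules/... paths.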

Would it be possible to retrieve the context id of the VM and then filter the heap snapshot for strings that lived in that specific context?

No, there is no association between heap objects and contexts.
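Since the snapshot cannot be filtered by context, one possible workaround (a suggestion, not something the snapshot format offers directly) is to take one snapshot before runInContext and one after, then keep only the string nodes whose id appears in the second snapshot but not the first. This sketch assumes that V8 keeps node ids stable across snapshots taken in the same process, and that both files have been parsed with JSON.parse. The names stringNodes and addedStrings are hypothetical helpers.

```javascript
// Sketch: approximate "strings created by the sandboxed code" as the string
// nodes that are present in `after` but not in `before`, matched by node id.
// Assumption: node ids are stable between snapshots of the same process.
function stringNodes(snapshot) {
  const { node_fields: fields, node_types: types } = snapshot.snapshot.meta;
  const stride = fields.length;
  const typeAt = fields.indexOf('type');
  const nameAt = fields.indexOf('name');
  const idAt = fields.indexOf('id');
  const typeNames = types[typeAt];
  const wanted = new Set(['string', 'concatenated string', 'sliced string']);

  const byId = new Map(); // node id -> string value
  for (let i = 0; i < snapshot.nodes.length; i += stride) {
    if (!wanted.has(typeNames[snapshot.nodes[i + typeAt]])) continue;
    byId.set(snapshot.nodes[i + idAt], snapshot.strings[snapshot.nodes[i + nameAt]]);
  }
  return byId;
}

function addedStrings(before, after) {
  const known = stringNodes(before);
  const added = [];
  for (const [id, value] of stringNodes(after)) {
    if (!known.has(id)) added.push({ id, value });
  }
  return added;
}
```

The result is only an approximation: anything Node itself allocates between the two snapshots (for example, lazily loaded internal modules) shows up as well, and strings that were collected before the second snapshot do not.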

jmrk