I have a Node app that sends JavaScript source code as a string to a worker thread, which executes it with Node's vm API. I take a heap snapshot of the worker thread only, in order to detect any string allocations made by that source code. However, a lot of obscure comments show up as strings and bloat the heap.
I originally suspected this was due to how the vm module executes code passed as a string, so I commented out the vm portion of my code, but I'm still getting these unwanted strings. Perhaps this is due to using require() and import?
My code is as follows. Again, app.js simply passes source code as a string to my worker thread, worker.mjs. worker.mjs runs the passed string inside a vm sandbox and then writes its heap snapshot to a file.
// app.js
const { Worker, isMainThread } = require('worker_threads');

if (isMainThread) {
  // JavaScript source code passed as a string.
  let workerData = `
var nop = unescape("%u9090%u9090");
while (nop.length <= 0x100000/2) {nop += nop;}`;

  const worker = new Worker('./worker.mjs', { workerData });
  worker.once('message', (filename) => {
    console.log(`worker heapdump: ${filename}`);
  });

  // Tell the worker to create a heapdump.
  worker.postMessage('heapdump');
}
// worker.mjs
import { workerData, parentPort, threadId } from 'worker_threads';
import { createContext, runInContext } from 'vm';
import { writeHeapSnapshot, getHeapSnapshot } from 'v8';

parentPort.once('message', (message) => {
  if (message === 'heapdump') {
    const sandbox = {};
    const strict = '"use strict";';
    createContext(sandbox);
    // Run the received source inside the sandbox, then dump this worker's heap.
    runInContext(strict + workerData, sandbox, { timeout: 10000 });
    parentPort.postMessage(writeHeapSnapshot());
  }
});
My ultimate goal is to collect all strings and concatenated strings created only from within the source code string workerData. In this example, that is the value of the nop variable.
But as shown below, the concatenated strings also contain a lot of unrelated data:
"encodingOps.ucs2.byteLength"@29509
"encodingOps.utf16le.byteLength"@29531
"encodingOps.latin1.byteLength"@29555
"encodingOps.ascii.byteLength"@29579
"encodingOps.base64.byteLength"@29603
"encodingOps.hex.byteLength"@29627
"module.exports.getModuleFromWrap"@39059
...
...
"internal/modules/package_json_reader.js"@10997
"internal/modules/esm/translators.js"@11001
"internal/modules/esm/transform_source.js"@11011
"internal/modules/esm/resolve.js"@11021
"internal/modules/esm/module_map.js"@11025
"internal/modules/esm/module_job.js"@11029
"internal/modules/esm/loader.js"@11033
"internal/modules/esm/get_source.js"@11043
"internal/modules/esm/get_format.js"
The vm module enables compiling and running code within V8 Virtual Machine contexts. The vm module is not a security mechanism. Do not use it to run untrusted code. https://nodejs.org/api/vm.html
I understand that the vm module executes code in its own context. Would it be possible to retrieve the context ID of the vm context and then filter the heap snapshot for strings that live in that specific context? In this case, I only want the nop variable. I'm hoping for some way to parse the snapshot JSON without using Chrome DevTools.
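To make the "parse the snapshot JSON" part concrete, here is a minimal sketch of what I have in mind. It assumes the usual .heapsnapshot layout, where snapshot.meta.node_fields names the columns of the flat nodes array and the string values live in the strings table. It does not filter by context. Instead it swaps in a different technique, comparing against a baseline snapshot taken before runInContext, and relies on the assumption that node IDs stay stable between snapshots of the same thread. The function names extractStrings and newStringsSince are hypothetical, not part of any Node API.

```javascript
// Sketch: pull string nodes out of a parsed .heapsnapshot object.
// Assumption: snapshot.meta.node_fields lists the per-node columns and
// node_types[<index of "type">] lists the node type names.
function extractStrings(snapshot, wantedTypes = ['string', 'concatenated string']) {
  const { node_fields: fields, node_types: types } = snapshot.snapshot.meta;
  const stride = fields.length;            // number of integers per node
  const typeAt = fields.indexOf('type');
  const nameAt = fields.indexOf('name');
  const idAt = fields.indexOf('id');
  const typeNames = types[typeAt];
  const result = [];
  for (let i = 0; i < snapshot.nodes.length; i += stride) {
    const type = typeNames[snapshot.nodes[i + typeAt]];
    if (!wantedTypes.includes(type)) continue;
    result.push({
      id: snapshot.nodes[i + idAt],
      type,
      // For string nodes, "name" indexes the string's value in the strings table.
      value: snapshot.strings[snapshot.nodes[i + nameAt]],
    });
  }
  return result;
}

// Hypothetical helper: keep only strings whose node id is absent from a
// baseline snapshot (e.g. one written just before runInContext is called).
function newStringsSince(baseline, current) {
  const seen = new Set(extractStrings(baseline).map((s) => s.id));
  return extractStrings(current).filter((s) => !seen.has(s.id));
}
```

Each snapshot would be loaded with something like JSON.parse(fs.readFileSync(filename, 'utf8')), where filename is the path returned by writeHeapSnapshot(). Very large snapshots may not fit into a single string, and V8 may truncate very long string values in the snapshot, so this is only a starting point.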