18

I found a few references to people having a similar issue, where the answer was always to make sure you call window.close() when done. However, that does not seem to be working for me (Node 0.8.14 and jsdom 0.3.1).

A simple repro:

var util = require('util');
var jsdom = require('jsdom');

function doOne(i) {
  var htmlDoc = '<html><head></head><body id="' + i + '"></body></html>';
  jsdom.env(htmlDoc, null, null, function (errors, window) {
    window.close();
  });
}

for (var i = 1; i < 100000; i++) {
  doOne(i);
  if (i % 500 === 0) {
    console.log(i + ":" + util.inspect(process.memoryUsage()));
  }
}
console.log("done");

The output I get is:

500:{ rss: 108847104, heapTotal: 115979520, heapUsed: 102696768 }
1000:{ rss: 198250496, heapTotal: 194394624, heapUsed: 190892120 }
1500:{ rss: 267304960, heapTotal: 254246912, heapUsed: 223847712 }
...
11000:{ rss: 1565204480, heapTotal: 1593723904, heapUsed: 1466889432 }

At this point the fan goes wild and the test actually stops... or at least starts going very slowly.

Does anyone have any tips other than window.close() to get rid of the memory leak (or what sure looks like a memory leak)?

Thanks!

Peter

  • This doesn't solve the original problem, but you can also start your node process with more heap memory like: `node --max-old-space-size=8192 index.js` – zingi Sep 22 '20 at 12:55

5 Answers

15

I was using jsdom 0.6.0 to help scrape some data and ran into the same problem. window.close() only slowed the memory leak; it still crept up until the process got killed.

Run the script with node --expose-gc myscript.js.

Until they fix the memory leak, manually calling the garbage collector in addition to calling window.close() seems to work:

if (process.memoryUsage().heapUsed > 200000000) { // memory use is above 200MB
    global.gc();
}

I stuck that after the call to window.close(). Memory use immediately drops back to baseline (around 50MB for me) every time it gets triggered, with a barely perceptible halt.

Update: also consider calling global.gc() multiple times in succession rather than only once (i.e. global.gc(); global.gc(); global.gc(); global.gc(); global.gc();).

Calling window.gc() multiple times was more effective (based on my imperfect tests), I suspect because it possibly caused chrome to trigger a major GC event rather than a minor one. - https://github.com/cypress-io/cypress/issues/350#issuecomment-688969443
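Putting the pieces of this answer together, a minimal sketch of the workaround might look like the following (the 200MB threshold and the count of five calls are taken from the answer and its update; the helper name is my own):

```javascript
// Requires starting node with --expose-gc, otherwise global.gc is undefined.
function maybeCollect(thresholdBytes) {
  if (typeof global.gc !== 'function') return false; // flag not passed
  if (process.memoryUsage().heapUsed <= thresholdBytes) return false;
  // Several back-to-back calls encourage a full (major) collection
  for (var i = 0; i < 5; i++) global.gc();
  return true;
}

// e.g. right after window.close():
// maybeCollect(200 * 1024 * 1024); // only collect above ~200MB
```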

CheapSteaks
    In my case I had to also add a 500ms sleep after `global.gc()` for it to actually release the memory: `await new Promise(resolve => setTimeout(resolve, 500));` – Klesun Sep 26 '19 at 14:27
  • Tried setImmediate(() => global.gc()) and window.close(); neither reduced memory consumption. – MechaCode Jun 17 '20 at 05:17
7

You are not giving the program any idle time to do garbage collection. I believe you will run into the same problem with any large object graph created many times in a tight loop with no breaks.

This is substantiated by CheapSteaks's answer, which manually forces the garbage collection. There can't be a memory leak in jsdom if that works, since memory leaks by definition prevent the garbage collector from collecting the leaked memory.
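For illustration, here is one way to restructure the question's loop so the event loop gets breathing room between iterations. This is a sketch of the idea, not code from the answer; doWork stands in for the jsdom.env call, and the iteration count is arbitrary:

```javascript
var processed = 0;

function doWork(i) {
  // stand-in for the jsdom.env(...) + window.close() from the question
  processed++;
}

function runBatch(i, max, done) {
  if (i > max) return done();
  doWork(i);
  // setImmediate yields to the event loop after each iteration,
  // giving V8 idle time to run the garbage collector
  setImmediate(function () { runBatch(i + 1, max, done); });
}

runBatch(1, 1000, function () { console.log('done:', processed); });
```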

Domenic
    The comment makes sense. That said, the code is an attempt to repro an issue where memory kept on growing and where there was ample time for gc to kick in (network server). That might just indicate these are two different problems – Peter Sep 11 '13 at 11:34
  • I thought the V8 engine used by Node.js does "stop the world" garbage collection and thus shouldn't need to be given idle time. (I know this answer is old, but maybe my comment can help some users with this problem.) – Ueffes Jan 07 '19 at 15:56
4

I had the same problem with jsdom and switched to cheerio, which is much faster than jsdom and kept working even after scanning hundreds of sites. Perhaps you should try it, too. The only problem is that it doesn't have all the selectors you can use in jsdom.

Hope it works for you, too.

Daniel

BeMoreDifferent.com
  • Daniel, thanks for your reply. I will try it out, but I am not optimistic. The code is using d3, and that seems to have quite some selector stuff. – Peter Dec 17 '12 at 21:32
  • Have you found any solutions ? – Unitech Jan 11 '13 at 18:10
  • @tknew - sorry for the late reply. For now I have not unfortunately. – Peter Jan 27 '13 at 12:27
  • Tried this again on 0.5.1. Same result :( – Peter Mar 08 '13 at 20:16
  • Wow, cheerio is way faster, using way less memory too. There's no $(...).get(0) in cheerio. I had to use $(...)[0] instead. Thanks! – Henry Dec 28 '13 at 10:19
  • In my case, I have to build pages using puppeteer and JSDOM. Memory usage for building 100 pages with JSDOM was about 550MB, so I tried cheerio: its memory usage was 1.5GB. Maybe JSDOM uses less memory compared with cheerio. – MechaCode Jun 17 '20 at 05:15
  • Tried cheerio instead of JSDOM. For the same example, JSDOM was using 500-550MB, while cheerio consumed 1.5GB of memory. – MechaCode Jun 21 '20 at 02:50
  • Cheerio is great. I had to parse 7k documents in a loop to get some elements. With JSDOM, after 200 files it used 1,368,340,296 bytes (~1.3GB) and was throwing out-of-memory errors after about 300-400. With cheerio, node used 35,548,104 bytes (35MB), which was about my starting point. And (contrary to JSDOM) cheerio was not holding on to the memory: usage crept up (to about 100MB) and then was freed automatically (down to about 35MB). – Nux Aug 27 '20 at 09:38
1

With gulp: memory-usage checks, cleanup, variable deletion, and window.close().

var gb = setInterval(function () {

    // only call if memory use is above 200MB
    if (process.memoryUsage().heapUsed > 200000000) {
        global.gc();
    }

}, 10000); // 10sec


// (map and logger presumably come from plugins such as vinyl-map and
// gulp-logger; the requires are not shown here)
gulp.task('tester', ['clean:raw2'], function() {

  return gulp.src('./raw/*.html')
    .pipe(logger())
    .pipe(map(function(contents, filename) {

        var doc = jsdom.jsdom(contents);
        var window = doc.parentWindow;
        var $ = jquery(window);

        console.log( $('title').text() );

        var html = window.document.documentElement.outerHTML;

        $( doc ).ready(function() {
            console.log( "document loaded" );
            window.close();
        });

        return html;
    }))
    .pipe(gulp.dest('./raw2'))
    .on('end', onEnd);
});

I constantly had between 200MB and 300MB usage for 7k files; it took 30 minutes. It might be helpful for someone, as I googled and didn't find anything helpful.

Hontoni
1

A workaround for this is to run the jsdom-related code in a forked child_process, send back the relevant results when done, and then kill the child_process.

kyle belle