
I am running Puppeteer on a server on Kubernetes to generate images of HTML pages stored in the backend, exposed as a REST API. I initialize a browser once and reuse it for every request. The requests to this microservice come from another microservice, where each image generation call uses await and runs at regular intervals. This works fine most of the time, except that the amount of memory used by Chromium keeps growing until the pod is eventually restarted.

Here is the image generation code:

// checking whether chrome is running or not
const { execSync } = require('child_process');

const isRunning = (query) => {
    const platform = process.platform;
    let cmd = '';
    switch (platform) {
    case 'win32': cmd = 'tasklist'; break;
    case 'darwin': cmd = `ps -ax | grep ${query}`; break;
    case 'linux': cmd = 'ps -A'; break;
    default: break;
    }
    return execSync(cmd).toString('ascii').toLowerCase().indexOf(query.toLowerCase()) > -1;
};

// single browser instance, reused to avoid spawning a new browser per request
// (`browser` is a module-level global shared by the request handlers)
async function getBrowser() {
    try {
        browser = await puppeteer.launch({
            headless: true,
            args: ['--no-sandbox', '--disable-setuid-sandbox', '--single-process', '--no-zygote', '--disable-gpu', '--disable-dev-shm-usage'],
        });
        console.log('Browser launched successfully');
        return false; // launched successfully: no retry needed
    } catch (error) {
        console.log('retrying launching chrome');
        return true; // signal the caller to retry
    }
}

// wait until a browser has been launched successfully, to avoid timeouts
async function waitTillBrowser() {
    while (await getBrowser()); // getBrowser() returns true while the launch keeps failing
}
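A simpler pattern I could use instead of the retry loop is to memoize a single launch promise, so every caller shares the same in-flight launch rather than polling. A minimal sketch (`makeSharedBrowser` and `launchFn` are assumed names, with `launchFn` standing in for `() => puppeteer.launch({...})`):

```javascript
// Sketch: memoize one launch promise so concurrent callers share a single
// browser instead of racing to launch new ones. `launchFn` is an assumed
// stand-in for () => puppeteer.launch({...}).
function makeSharedBrowser(launchFn) {
    let browserPromise = null;
    return function getBrowser() {
        if (!browserPromise) {
            browserPromise = launchFn().catch((err) => {
                browserPromise = null; // clear the cache so the next call can retry
                throw err;
            });
        }
        return browserPromise;
    };
}
```

Every request handler then does `const browser = await getBrowser();` and a failed launch is retried automatically on the next request, with no busy-wait.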


// main code for image generation
.
.
.
if (!isRunning('chrome')) {
    console.log('browser was not obtained. retrying...');
    await waitTillBrowser();
}
const page = await browser.newPage();
await page.setJavaScriptEnabled(false);
await page.setViewport({ width: CONFIG.IMAGE_PARAMS.VIEWPORT.WIDTH, height: CONFIG.IMAGE_PARAMS.VIEWPORT.HEIGHT });
await page.setContent(data.html);
image = await page.screenshot({ type: CONFIG.IMAGE_PARAMS.ENCODING, quality: CONFIG.IMAGE_PARAMS.QUALITY });
await page.close();
.
.
.
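To guarantee a page is released even when rendering fails, the per-request block above could be wrapped in try...finally. A sketch (`renderImage` and the `params` shape are assumed names, mirroring my CONFIG.IMAGE_PARAMS, not my exact code):

```javascript
// Sketch: one page per request, always closed in `finally` so that a failed
// setContent() or screenshot() can never leak a page. The `params` shape is
// an assumption mirroring CONFIG.IMAGE_PARAMS from the question.
async function renderImage(browser, html, params) {
    const page = await browser.newPage();
    try {
        await page.setJavaScriptEnabled(false);
        await page.setViewport({ width: params.VIEWPORT.WIDTH, height: params.VIEWPORT.HEIGHT });
        await page.setContent(html);
        return await page.screenshot({ type: params.ENCODING, quality: params.QUALITY });
    } finally {
        await page.close(); // runs on both the success and the error path
    }
}
```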

I didn't consider pushing the requests to a queue and consuming them from there, because the image generation API isn't exposed to the user, so the request rate can be controlled. I also didn't consider other libraries (like Playwright), since they basically do the same thing and I suspect I would run into the same problems there; the same goes for puppeteer-cluster.

I am considering running a script that checks whether the memory consumed by Chrome is above a certain limit and kills the process when it is. That would work for my case, but it isn't the right way to do this. Are there any other approaches?
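One alternative I'm considering is recycling the browser after a fixed number of rendered pages instead of watching memory from the outside, which bounds Chromium's growth without a kill script. A minimal sketch (`makeRecyclingBrowser`, `launchFn`, and `maxPages` are assumed names, not part of my current code):

```javascript
// Sketch: recycle the shared browser after a fixed number of pages so
// Chromium's memory growth stays bounded. `launchFn` would be e.g.
// () => puppeteer.launch({...}); `maxPages` is an assumed tuning knob.
function makeRecyclingBrowser(launchFn, maxPages) {
    let browser = null;
    let pagesServed = 0;
    return async function getBrowser() {
        if (browser && pagesServed >= maxPages) {
            await browser.close(); // drop the old process and its accumulated memory
            browser = null;
        }
        if (!browser) {
            browser = await launchFn();
            pagesServed = 0;
        }
        pagesServed += 1;
        return browser;
    };
}
```

Since my requests arrive at a controlled rate from a single caller, a restart every N pages would cost one extra launch occasionally instead of a pod restart.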

Rinkesh P
  • `while(await getBrowser());` looks like it just spams a loop blasting out browsers. Not the way to await a promise! That'll just slow things down more than speed them up. Just call it one time, wait for the result and close the browser when done. You shouldn't have to be monitoring and killing browsers--very hacky. I'm not sure if you're using Express, but see [this answer](https://stackoverflow.com/a/67910262/6243352) for an idea of how you can share a browser with multiple routes. – ggorlen Aug 03 '22 at 04:34
  • it doesn't spam browsers, it launches exactly one browser(since the requests come at a controlled rate). Tried it on several machines and the server itself. Agreed it is not the way to await a promise, but that had to be done to fix some other issues. And yes I am using express. I did not see any zombie processes, and `ps` showed only one `chrome` running at all times. Only problem is it takes upto 4000Mi of memory and then restarts the pod(this happened once over a course of 3 days). – Rinkesh P Aug 03 '22 at 04:45
  • Although this isn't a [mcve], I see that you have the `if(!isRunning('chrome')){` check, preventing launching a new browser, so theoretically you're right, but this hack seems slow and unreliable. I wouldn't be surprised if it fails. It'd be so much easier to simply use a single promise as shown in the linked code. Anyway, the memory leak tells the truth, so even if the code here theoretically would work, the significant code smells here suggest it's probably not Puppeteer that's at fault here. Are you handling errors and closing all pages in `finally` blocks? – ggorlen Aug 03 '22 at 05:12
  • I get your point. I had come across that link before, and am initializing a global browser and sharing it while it being wrapped in a `try. . . catch`. I am not worried about how slow or fast this happens as it is a background process. I am just looking for a way to somehow reduce my memory usage. I will try to update my code with more relevant lines, but a complete reproducible example would be difficult as it involves many files. – Rinkesh P Aug 03 '22 at 05:15
  • Yes I have handled errors and closing pages in finally and catch blocks – Rinkesh P Aug 03 '22 at 05:18
  • That's good--if you're keeping your pages closed and you're guaranteeing there's one browser process, then I'm not sure what else to suggest, unfortunately. – ggorlen Aug 03 '22 at 05:28

0 Answers