
My task is to inflate the number of people shown as online on a website using a bot. Conditions:

1. The bot should open the site and stay on the page as long as possible (without dropping the connection).
2. The site may use either WebSockets or long polling to check the connection, so JavaScript must be supported.

I have a solution using a headless browser (Puppeteer) and Node.js:

const puppeteer = require('puppeteer');

async function runBot(botsCount = 10, secondsToWait = 60, interval = 1000) {
    let time = secondsToWait * 1000;
    console.log(`Starting chrome...`);
    const browser = await puppeteer.launch({
        args: [
            '--disable-gpu',
            '--no-sandbox',
            '--headless',
            '--disable-web-security',
            '--disable-dev-profile',
            '--disable-dev-shm-usage',
        ]
    })
    for (let i = 1; i <= botsCount; i++) {
        const page = await browser.newPage();
        await page.goto('https://www.example.page/');
        console.log(`Page ${i} created`);
    }
    console.log(`Awaiting for finish...`);
    const savedInterval = setInterval(async () => {
        process.stdout.write("\rTime Left:" + (time / 1000) + "       ");
        time -= interval;
        // Use <= so the countdown still ends when secondsToWait * 1000 is not a multiple of interval
        if (time <= 0) {
            clearInterval(savedInterval);
            await browser.close();
            console.log(`\nFinished`);
        }
    }, interval);
}

runBot();

But this is not a very good solution, since each browser window uses between 60 MB and 120 MB of RAM, which gets very expensive.

Has anyone run into this and knows a more efficient way to do it?

Any help appreciated

ArtemSky
  • *Why* do you want to have virtual users on your website? Depending on the purpose, different solutions will apply. Why is this question tagged `php` and `python`, when you are using neither here? – phihag Jan 24 '18 at 12:37
  • @phihag You are right. I removed it. – ArtemSky Jan 24 '18 at 12:43
  • This is similar to faking the number of people online for a streaming service like Twitch – ArtemSky Jan 24 '18 at 12:45
  • Surely, 120 MB of ram is not that expensive? If you want to avoid rendering whole page (which you are doing now) you could inspect the page and figure out how it tracks the online users, then just simulate that part ? – dkasipovic Feb 02 '18 at 14:31

1 Answer

The setRequestInterception API can help reduce memory consumption. Depending on your use case, you may not need images, fonts, or stylesheets to be counted as an online user.

The API is documented here:

https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetrequestinterceptionvalue

I ran a benchmark on my server, using your code to load Google. It used around 90 MB of RAM on average.

Benchmark before request interception

After I added request interception as shown below, RAM usage dropped by about 10 MB per thread.

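// Intercept every request so unneeded resources can be dropped
await page.setRequestInterception(true);
page.on('request', (request) => {
    // Note: 'script' is blocked here too; remove it from the list if the site
    // needs JavaScript (WebSockets, long polling) to register the visitor
    if (['image', 'stylesheet', 'font', 'script'].includes(request.resourceType())) {
        request.abort();
    } else {
        request.continue();
    }
});

await page.goto('https://www.google.com/');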
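
The snippet above also aborts 'script' requests, while the question requires JavaScript to keep running. A minimal sketch of a variant that leaves scripts alone might look like the following. The names BLOCKED_TYPES and makeRequestFilter are hypothetical helpers introduced here for illustration, and the memory savings of this variant were not benchmarked.

```javascript
// Resource types to skip. 'script' is deliberately left out so that
// JavaScript-based presence tracking (WebSockets, long polling) keeps working.
const BLOCKED_TYPES = new Set(['image', 'stylesheet', 'font']);

// Hypothetical helper: builds a handler for Puppeteer's 'request' event.
// It aborts requests whose resource type is in blockedTypes and lets
// everything else through.
function makeRequestFilter(blockedTypes = BLOCKED_TYPES) {
    return (request) => {
        if (blockedTypes.has(request.resourceType())) {
            request.abort();
        } else {
            request.continue();
        }
    };
}

// Usage with Puppeteer (assumes `page` came from browser.newPage()):
//   await page.setRequestInterception(true);
//   page.on('request', makeRequestFilter());
//   await page.goto('https://www.example.page/');
```

Keeping the handler in a separate function means the same filter can be attached to every page created in the loop, and the blocked list can be adjusted in one place.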

Benchmark after request interception

Hope it helps

yue you