I am using Puppeteer (a browser automation library) to scrape a website. After navigating to the URL, how can I load jQuery so that I can use it inside my page.evaluate() function?

All I have is a single .js file running the code below. It navigates to my URL as intended, but then page.evaluate() throws an error, apparently because jQuery is not being loaded by the call on line 7: await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'})

Any ideas how I can load jQuery correctly here, so that I can use jQuery inside my page.evaluate() function?

(async() => {
  let url = "[website url I'm scraping]"
  let browser = await puppeteer.launch({headless:false});
  let page = await browser.newPage();
  await page.goto(url, {waitUntil: 'networkidle2'});
  // code below doesn't seem to load jQuery, since I get an error in page.evaluate()
  await page.addScriptTag({url: 'https://code.jquery.com/jquery-3.2.1.min.js'})
  await page.evaluate( () => {
      // want to use jQuery here to access the DOM
      var classes = $( "td:contains('Lec')")
      classes = classes.not('.Comments')
      classes = classes.not('.Pct100')
      classes = Array.from(classes)
  });
})();
Daniel-G

2 Answers

You are on the right path.

Also I don't see any jQuery code being used in your evaluate function. There is no document.getElement function.

The best way would be to add a local copy of jQuery to avoid any cross-origin errors.
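
A minimal sketch of injecting a local copy is shown here. The helper name injectLocalJQuery and the file name jquery.js are assumptions for illustration only. Puppeteer resolves a relative path against the current working directory, so the sketch builds an absolute path first and fails early if the file is missing.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: inject a local jQuery file into a Puppeteer page.
// `page` is expected to be a Puppeteer Page; `file` is an assumed file name.
async function injectLocalJQuery(page, file = 'jquery.js', baseDir = __dirname) {
  // Build an absolute path so the result does not depend on the
  // directory the script happens to be started from.
  const absolutePath = path.resolve(baseDir, file);
  if (!fs.existsSync(absolutePath)) {
    throw new Error(`jQuery file not found: ${absolutePath}`);
  }
  // addScriptTag({path}) reads the file and injects its contents into the page.
  await page.addScriptTag({ path: absolutePath });
  return absolutePath;
}
```

It would be called as await injectLocalJQuery(page) after page.goto() has finished and before page.evaluate().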

More details can be found in the already answered question here.

UPDATE: I tried a small snippet to test jQuery, using Puppeteer version 10.4.0.

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();
    await page.goto('https://google.com', {waitUntil: 'networkidle2'});
    // Inject a local copy of jQuery (path is relative to the working directory).
    await page.addScriptTag({path: 'jquery.js'});
    await page.evaluate(() => {
        // jQuery is now available inside the page context.
        let wrapper = $('.L3eUgb');
        wrapper.css('background-color', 'red');
    });
    await page.screenshot({path: 'hello.png'});
    await browser.close();
})();

The screenshot is:

[image: puppeteer image]

So the jQuery code is definitely working.

Also check whether the host website already has its own jQuery instance. In that case you would need to use jQuery's noConflict mode:

$.noConflict();
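
The check for an existing instance could be sketched roughly as follows. The helper name ensureJQuery and the jqueryPath parameter are hypothetical. The sketch only injects jQuery when the page does not already define window.jQuery.

```javascript
// Hypothetical helper: inject jQuery only if the page has none of its own.
// `page` is expected to be a Puppeteer Page; `jqueryPath` is a local file path.
async function ensureJQuery(page, jqueryPath) {
  // Runs in the browser context and reports whether jQuery is already defined.
  const alreadyLoaded = await page.evaluate(
    () => typeof window.jQuery === 'function'
  );
  if (!alreadyLoaded) {
    await page.addScriptTag({ path: jqueryPath });
  }
  // true means the site's own jQuery was found and nothing was injected.
  return alreadyLoaded;
}
```

If you do inject a second copy next to the site's own, calling jQuery.noConflict(true) inside page.evaluate() returns the most recently loaded jQuery and restores the earlier $ and jQuery globals, so the returned value can be kept in a local variable.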
vsvanshi
  • Oops, I added the wrong code, just updated it with jQuery. I'll try adding a local copy and see if that works... – Daniel-G Oct 26 '21 at 06:36
  • I tried using await page.addScriptTag({path: require.resolve('jquery')}) but I got an error running that line for some reason. I have 'jquery.js' in the same working directory as my main .js file – Daniel-G Oct 26 '21 at 06:44
  • update: I tried await page.addScriptTag({path: 'jquery.js'}) and it worked but now I'm still getting an error at page.evaluate() for some reason. Is my jQuery not valid? The code seems to work in my local browser so not sure why it's not working here. – Daniel-G Oct 26 '21 at 06:53
  • @Daniel-G I just tested with a small snippet. Please see the updated answer – vsvanshi Oct 26 '21 at 09:44
  • The link I'm scraping is https://www.reg.uci.edu/perl/WebSoc. Does it work on your end when you run my code snippet? – Daniel-G Oct 26 '21 at 09:59

Fixed it!

I realized I had left out the code where I make some extra navigation clicks after loading my initial URL. The problem was that I added the script tag on the initial page instead of after navigating to my final destination URL, and an injected script does not survive a navigation to a new page.

I also needed to use

await page.waitForNavigation({waitUntil: 'networkidle2'})

before adding the script tag, so that the page was fully loaded first.
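
Put together, the order of operations might look like the sketch below. The helper name clickThenInjectJQuery and its selector and jqueryPath parameters are hypothetical stand-ins for the navigation clicks and the local jQuery file described above.

```javascript
// Hypothetical helper: click something that navigates, wait for the new
// page to settle, then inject jQuery into that final page.
async function clickThenInjectJQuery(page, selector, jqueryPath) {
  // Start waiting before clicking so the navigation cannot be missed.
  await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(selector),
  ]);
  // Anything injected before the navigation is gone, so inject here.
  await page.addScriptTag({ path: jqueryPath });
}
```

Only after this resolves would page.evaluate() be called with the jQuery code.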

Daniel-G