28

I know the common methods, such as evaluate, for capturing elements in Puppeteer, but I am curious why I cannot get the href attribute with a plain DOM-style approach like this:

const page = await browser.newPage();

await page.goto('https://www.example.com');

let links = await page.$$('a');
for (let i = 0; i < links.length; i++) {
  console.log(links[i].getAttribute('href'));
  console.log(links[i].href);
}
Googlebot

5 Answers

52

await page.$$('a') returns an array of ElementHandles. These objects have their own Puppeteer-specific API and do not expose the usual DOM API of HTML elements or DOM nodes. So you need to either retrieve the attributes/properties in the browser context via page.evaluate() or use the rather complicated ElementHandle API. Here is an example of both ways:

'use strict';

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    await page.goto('https://example.org/');

    // way 1
    const hrefs1 = await page.evaluate(
      () => Array.from(
        document.querySelectorAll('a[href]'),
        a => a.getAttribute('href')
      )
    );

    // way 2
    const elementHandles = await page.$$('a');
    const propertyJsHandles = await Promise.all(
      elementHandles.map(handle => handle.getProperty('href'))
    );
    const hrefs2 = await Promise.all(
      propertyJsHandles.map(handle => handle.jsonValue())
    );

    console.log(hrefs1, hrefs2);

    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
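One difference between the two ways is worth keeping in mind: getAttribute('href') returns the attribute text exactly as written in the markup (possibly a relative path), while the href property returns the URL already resolved against the page address. If you collect raw attribute values and later need full URLs, something like the following sketch could be used. It runs in plain Node.js without Puppeteer; toAbsoluteUrls and the sample values are hypothetical names made up for illustration.

```javascript
'use strict';

// Hypothetical helper (not part of Puppeteer): resolve raw href attribute
// values against the URL of the page they were collected from.
function toAbsoluteUrls(rawHrefs, pageUrl) {
  const resolved = [];
  for (const raw of rawHrefs) {
    try {
      // The WHATWG URL constructor resolves relative references against a base.
      resolved.push(new URL(raw, pageUrl).href);
    } catch {
      // Skip values that cannot be parsed as a URL.
    }
  }
  return resolved;
}

// Made-up sample data, standing in for the hrefs1 array from way 1.
const sampleHrefs = ['/about', 'contact.html', '#top'];
console.log(toAbsoluteUrls(sampleHrefs, 'https://example.org/docs/'));
// [ 'https://example.org/about',
//   'https://example.org/docs/contact.html',
//   'https://example.org/docs/#top' ]
```

In a real script the second argument would typically be page.url(), so the raw values are resolved against whatever page they were scraped from.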
vsemozhebuty
  • Thanks for a clear explanation. Using page.eval() works like a charm. – subwaymatch Jul 24 '20 at 23:20
  • @vsemozhebuty Can we redirect to the inner url link after fetching it in WAY 1 ? – ABC Jul 21 '21 at 16:51
  • @Asha Sorry, I am not sure I understand. Can you elaborate? Or maybe ask a full question? – vsemozhebuty Jul 21 '21 at 16:59
  • @vsemozhebuty const hrefs1 = await page.evaluate( () => Array.from( document.querySelectorAll('a[href]'), a => a.getAttribute('href') ) ); Here , if i want to go to hrefs1 url page ..how to achieve that? if i write page.goTo(hrefs1) its throwing page is undefined.. – ABC Jul 22 '21 at 08:22
  • @Asha Unfortunately, without more code it is hard to suggest what can be wrong. Please, ask a full question with a small code example. – vsemozhebuty Jul 22 '21 at 10:56
13
const yourHref = await page.$eval('selector', anchor => anchor.getAttribute('href'));

But if you are already working with a handle, you can pass it to page.evaluate():

const handle = await page.$('selector');
const yourHref = await page.evaluate(anchor => anchor.getAttribute('href'), handle);
Ekeuwei
9

I don't know why it's such a pain, but this is what I came up with when I ran into this a while ago.

async function getHrefs(page, selector) {
  return await page.$$eval(selector, anchors => [].map.call(anchors, a => a.href));
}
Phix
3

A type-safe way for TypeScript users to return the hrefs of the links as an array of strings, by casting each element to HTMLAnchorElement (the type of an a element; HTMLLinkElement describes link elements instead):

await page.$$eval('a', (anchors) => anchors.map((link) => (link as HTMLAnchorElement).href));
Dan Barclay
0

A simple way to get an href from an anchor element

Say you fetched an anchor element with the following

const anchorElement = await page.$('a') // or page.$<HTMLAnchorElement>('a') if using typescript

You can get the href property with the following

const href = await anchorElement.evaluate(element => element.href); // evaluate() returns a Promise, so await it
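
Note that page.$() resolves to null when nothing matches the selector, so calling evaluate() on the result directly can throw. A minimal sketch of a guard around that, assuming page is a Puppeteer Page obtained elsewhere; getHref is a hypothetical helper name, not part of Puppeteer:

```javascript
'use strict';

// Hypothetical helper: returns the resolved href of the first element
// matching `selector`, or null if no element matches.
// `page` is assumed to be a Puppeteer Page created elsewhere.
async function getHref(page, selector) {
  const handle = await page.$(selector);
  if (handle === null) {
    return null; // nothing matched the selector
  }
  try {
    // Runs in the browser context, where `element` is a real DOM node.
    return await handle.evaluate(element => element.href);
  } finally {
    // Release the handle once the value has been read.
    await handle.dispose();
  }
}
```

It would be called as const href = await getHref(page, 'a'); with a null check on the result before using it.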
Ulad Kasach