I'm running some Node.js code to scrape a website and return some text from this part of the HTML: [screenshot of div container in Chrome DevTools]

And here's the code I'm using to get it:

const fs = require('mz/fs');
const xpath = require('xpath');
const parse5 = require('parse5');
const xmlser = require('xmlserializer');
const dom = require('xmldom').DOMParser;
const axios = require('axios');

(async () => {
    // Fetch the raw HTML of the product page.
    const response = await axios.get('https://www.aritzia.com/en/product/sculpt-knit-tank-%28arjun-knit-top%29/66139.html?dwvar_66139_color=17388');
    const html = response.data;
    // Parse the HTML, re-serialize it as XHTML, then load that into an XML DOM that xpath can query.
    const document = parse5.parse(html.toString());
    const xhtml = xmlser.serializeToString(document);
    const doc = new dom().parseFromString(xhtml);
    // Bind the prefix "x" to the XHTML namespace for use in the expression.
    const select = xpath.useNamespaces({"x": "http://www.w3.org/1999/xhtml"});
    const nodes = select("//x:div[contains(@class, 'pdp-product-brand')]/*/text()", doc);
    // Print the first matching text node, or 0 if nothing matched.
    console.log(nodes.length ? nodes[0].nodeValue : nodes.length);
})();

The code above works as expected -- it prints Babaton.

But when I swap the wildcard * in the XPath above for the element name a (i.e. //x:div[contains(@class, 'pdp-product-brand')]/a/text()), it instead tells me that nodes.length === 0.

I would expect it to give the same result because the div that it's pointing to does in fact have a child anchor tag (see screenshot above). I'm just confused why it doesn't work with a and was wondering if anybody else knew the answer. Thanks!
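
One thing that may be relevant, offered as an assumption rather than a confirmed cause: in XPath 1.0 an unprefixed name test such as a only matches elements in no namespace, while * matches an element in any namespace. Since the div already has to be written as x:div, the anchor is presumably in the XHTML namespace too. The sketch below does not call the xpath package at all; matchesNameTest and the anchor object are hypothetical stand-ins that just mimic that matching rule.

```javascript
// Hypothetical helper, not part of the xpath package: mimics how an
// XPath 1.0 name test is compared against an element node.
const XHTML_NS = 'http://www.w3.org/1999/xhtml';

function matchesNameTest(element, nameTest, prefixes) {
    // "*" matches any element, whatever its namespace.
    if (nameTest === '*') return true;

    const parts = nameTest.split(':');
    const hasPrefix = parts.length === 2;
    const localName = hasPrefix ? parts[1] : parts[0];

    // An unprefixed name test means "no namespace";
    // XPath 1.0 does not apply a default namespace to it.
    const namespaceURI = hasPrefix ? prefixes[parts[0]] : null;

    return element.localName === localName &&
           element.namespaceURI === namespaceURI;
}

// Stand-in for the anchor element after parse5 + xmlserializer
// (assumption: it ends up in the XHTML namespace, like the div).
const anchor = { localName: 'a', namespaceURI: XHTML_NS };
const prefixes = { x: XHTML_NS };

const results = {
    '*': matchesNameTest(anchor, '*', prefixes),
    'a': matchesNameTest(anchor, 'a', prefixes),
    'x:a': matchesNameTest(anchor, 'x:a', prefixes),
};
console.log(results);
```

If that assumption holds for the real page, the thing to try would be //x:div[contains(@class, 'pdp-product-brand')]/x:a/text() with the same select function.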

David McNamee