Load any url content and follow XPATH in JS

Question

What I would like to do is load a page and get the content of an element through an XPath, a selector, or a JS path, and then use that value in my program. How could I do that? For instance, on this page, making a request with the page's URL and following this path (while also targeting the type somehow; here it is the class):

//*[@id="question-header"]/h1/a

Would give me 'Load any url content and follow XPATH in JS'

since I am getting the text inside this element:

<a href="/questions/54847748/load-any-url-content-and-follow-xpath-in-js" class="question-hyperlink">Load any url content and follow XPATH in JS</a>

score 1 · Answer 1 · answered Feb 24 '19 at 01:04

Well, you could use something like

document.getElementById('question-header').children[0].children[0].textContent;

It's not as flexible as XPath (note the repeated children lookups), but it should do the trick if you're facing a static structure. For Node.js, there are several libraries that could do it as well, such as libxmljs or parse5 - more on this here.

snwman

The main thing I am trying to do is get the content of a URL and then use the path. With this approach, I think we're assuming that I am working on a page or in the Chrome console, for instance. What I don't really know how to do is, using Node.js, log the content of a page in the console. That's the main part; then I would like to get something using XPath, a JS path, or anything else. – Zayonx Feb 24 '19 at 01:23
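
For the first half of that comment (getting a page's content into Node.js and logging it), a minimal sketch could look like the following. It assumes Node.js 18 or newer, where `fetch` is available as a global. The helper name `fetchHtml` and its injectable `fetchImpl` parameter are made up for illustration. Note that this only returns the HTML the server sends; anything the page builds later with client-side JavaScript will not be in it.

```javascript
'use strict';

// Hypothetical helper: download `url` and resolve with the response body as text.
// `fetchImpl` defaults to the global fetch (Node.js 18+); it is a parameter only
// so that the function can be exercised without network access.
async function fetchHtml(url, fetchImpl = globalThis.fetch) {
  const response = await fetchImpl(url);
  if (!response.ok) {
    // Treat non-2xx answers as errors instead of logging an error page.
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.text();
}

// Usage (commented out because it needs network access):
// fetchHtml('https://stackoverflow.com/questions/54847748/load-any-url-content-and-follow-xpath-in-js')
//   .then((html) => console.log(html))
//   .catch((err) => console.error(err));
```

The string that comes back is plain markup, so it still has to be parsed (for example with one of the libraries mentioned in the answer above) before a path can be applied to it.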

score 1 · Accepted Answer · answered Feb 24 '19 at 02:02

If you need the most reliable way to get data from a web page, including data generated by JavaScript running on the client side, you can use a tool that drives a headless browser. For example, the described task can be accomplished with Node.js and puppeteer in this script. Selectors and XPath are supported, as is the whole Web API, by evaluating code fragments in the browser context and exchanging data between the Node.js and browser contexts:

'use strict';

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch();
    const [page] = await browser.pages();

    await page.goto('https://stackoverflow.com/questions/54847748/load-any-url-content-and-follow-xpath-in-js');

    const data = await page.evaluate(() => {
      return document.querySelector('#question-header > h1 > a').innerText;
    });

    console.log(data);

    await browser.close();
  } catch (err) {
    console.error(err);
  }
})();
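
Since the question asks for XPath specifically, here is a hedged variant of the same idea that resolves the XPath from the question instead of a CSS selector. It relies on the standard DOM call `document.evaluate`, run inside the page through `page.evaluate`. The function names `textByXPath` and `scrapeText` are invented for this sketch, and puppeteer is assumed to be installed.

```javascript
'use strict';

// Runs inside the browser page: resolve an XPath expression to the first
// matching node and return its trimmed text, or null if nothing matches.
function textByXPath(xpath) {
  const result = document.evaluate(
    xpath,
    document,
    null,
    XPathResult.FIRST_ORDERED_NODE_TYPE,
    null
  );
  const node = result.singleNodeValue;
  return node ? node.textContent.trim() : null;
}

// Hypothetical helper: open `url` in a headless browser and evaluate `xpath` there.
async function scrapeText(url, xpath) {
  const puppeteer = require('puppeteer'); // loaded lazily; must be installed
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    // The function is serialized and executed in the page; `xpath` is passed as its argument.
    return await page.evaluate(textByXPath, xpath);
  } finally {
    // Close the browser even if navigation or evaluation throws.
    await browser.close();
  }
}

// Usage (commented out because it needs network access and puppeteer):
// scrapeText(
//   'https://stackoverflow.com/questions/54847748/load-any-url-content-and-follow-xpath-in-js',
//   '//*[@id="question-header"]/h1/a'
// ).then(console.log, console.error);
```

Keeping the page-side function separate from the Node.js-side helper makes it clear which code runs where: `textByXPath` only touches browser globals, while `scrapeText` only touches puppeteer.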
