Scraping Xpath lxml blank/empty returned list

Question

Kind of a noob here. This isn't my first time web scraping, but this one is giving me headaches:

Using lxml, I'm trying to scrape some data from a webpage. I have managed to extract data from other websites, but I'm having trouble with this one.

I'm trying to get the value "44 kg CO2-eq/m2" on this website here:

https://www.bs2.ch/energierechner/#/?d=%7B%22area%22%3A%22650%22,%22floors%22%3A%224%22,%22utilization%22%3A2,%22climate%22%3A%22SMA%22,%22year%22%3A4,%22distType%22%3A2,%22dhwType%22%3A1,%22heatType%22%3A%22air%22,%22pv%22%3A0,%22measures%22%3A%7B%22walls%22%3Afalse,%22windows%22%3Afalse,%22roof%22%3Afalse,%22floor%22%3Afalse,%22wrg%22%3Afalse%7D,%22prev%22%3A%7B%22walls%22%3Afalse,%22wallsYear%22%3A1,%22windows%22%3Afalse,%22windowsYear%22%3A1,%22roof%22%3Atrue,%22roofYear%22%3A1,%22floor%22%3Afalse,%22floorYear%22%3A1%7D,%22zipcode%22%3A%228055%22%7D&s=4&i=false

import lxml.etree
from lxml import html
import requests

# Request the page
page = requests.get('https://www.bs2.ch/energierechner/#/?d=%7B%22area%22%3A%22650%22,%22floors%22%3A%224%22,%22utilization%22%3A2,%22climate%22%3A%22SMA%22,%22year%22%3A4,%22distType%22%3A2,%22dhwType%22%3A1,%22heatType%22%3A%22air%22,%22pv%22%3A0,%22measures%22%3A%7B%22walls%22%3Afalse,%22windows%22%3Afalse,%22roof%22%3Afalse,%22floor%22%3Afalse,%22wrg%22%3Afalse%7D,%22prev%22%3A%7B%22walls%22%3Afalse,%22wallsYear%22%3A1,%22windows%22%3Afalse,%22windowsYear%22%3A1,%22roof%22%3Atrue,%22roofYear%22%3A1,%22floor%22%3Afalse,%22floorYear%22%3A1%7D,%22zipcode%22%3A%228055%22%7D&s=4&i=false')
tree = html.fromstring(page.content) 

scraped_text = tree.xpath(
    '//*[@id="bs2-main"]/div/div[2]/div/div[2]/div[4]/div/div[2]/div[3]/div[2]/div[2]/div/div[2]/div[1]')
print(scraped_text)

From the print call, I just get an empty list [] as the return value, not the value I am looking for.

I also tried the full XPath, although I know it is not optimal because it depends on the site's structure, which may change.

scraped_text = tree.xpath(
    '/html/body/div[1]/div/div[5]/main/div[3]/div/div[2]/div/div[2]/div[4]/div/div[2]/div[3]/div[2]/div[2]/div/div[2]/div[1]')
print(scraped_text)

With this XPath, print also gives me an empty list [].

I checked that the XPath is correct using "XPath Helper" in Chrome. I also tried BeautifulSoup, but without any luck, as it doesn't support XPath.

I found a similar problem on Stack Overflow here: Empty List LXML XPATH

It appears that my XPath is probably defined incorrectly. I have been trying to solve this for days; any help would be nice, thanks!

Edit: I also tried to get another XPath using ChroPath, but I got this feedback:

It might be child of svg/pseudo element/comment/iframe from different src. Currently ChroPath doesn't support for them.

I presume my XPath may be wrong.

score 0 · Answer 1 · answered Feb 26 '22 at 09:27

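You can't find the element because you are using requests, which does not execute JavaScript, and this page builds its content with JavaScript. The HTML that requests receives therefore doesn't contain the element yet. You need a tool that drives a real browser, such as Selenium WebDriver.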
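A minimal sketch of that approach, under a few assumptions: Selenium 4 and a local Chrome install are available, and the XPath from the question still matches the live page. The names `EXAMPLE_URL`, `url_seen_by_server` and `scrape_with_selenium` are made up for illustration, and `EXAMPLE_URL` is a shortened stand-in for the long URL in the question. The first helper also shows why the parameters never reach the server: everything after `#` is a URL fragment, which an HTTP client does not send.

```python
from urllib.parse import urldefrag

# Shortened stand-in for the long URL in the question (illustration only).
# Everything after "#" is a fragment that only client-side JavaScript reads.
EXAMPLE_URL = "https://www.bs2.ch/energierechner/#/?d=%7B%22area%22%3A%22650%22%7D&s=4&i=false"


def url_seen_by_server(url):
    """Return the URL without its fragment, i.e. what an HTTP client requests."""
    return urldefrag(url).url


def scrape_with_selenium(url, xpath, timeout=20):
    """Load the page in a real browser, wait for the node, and return its text."""
    # Imported lazily so the helper above works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumption: Chrome is installed locally
    try:
        driver.get(url)
        # Wait until JavaScript has inserted the element into the DOM.
        element = WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, xpath))
        )
        return element.text
    finally:
        driver.quit()


print(url_seen_by_server(EXAMPLE_URL))
```

Calling `scrape_with_selenium` with the full URL and the `//*[@id="bs2-main"]/...` XPath from the question should then return the element's text, provided the site's structure has not changed. If the value is filled in after the element appears, a longer or more specific wait condition may be needed.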

mikebrucks

OK, thanks. It seems to work with Selenium WebDriver. – Circark Feb 26 '22 at 10:53
You can also try the approach from this answer to another question: https://stackoverflow.com/a/54056631/17774714 – mikebrucks Feb 26 '22 at 11:18
Thanks for the great info: lxml with XPath works like a charm with r.html.render() – Circark Feb 26 '22 at 15:23
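
For reference, `r.html.render()` in the last comment looks like the requests-html API, so the working version was probably along these lines. This is only a sketch under that assumption, not the asker's actual code; `first_text` and `scrape_with_requests_html` are hypothetical names, and the first call to `render()` downloads a Chromium build.

```python
def first_text(nodes):
    """Return the stripped text of the first XPath match, or None if there is none."""
    if not nodes:
        return None  # the empty-list case from the question
    node = nodes[0]
    # lxml returns elements for node tests and plain strings for text()/@attr.
    text = node if isinstance(node, str) else node.text_content()
    return text.strip()


def scrape_with_requests_html(url, xpath, sleep=2):
    """Render the page with requests-html, then query the result with lxml."""
    # Imported lazily so first_text() works without these packages installed.
    from lxml import html
    from requests_html import HTMLSession

    session = HTMLSession()
    try:
        response = session.get(url)
        # Runs the page's JavaScript in headless Chromium; `sleep` gives the
        # scripts a moment to fill in the values before the HTML is captured.
        response.html.render(sleep=sleep)
        tree = html.fromstring(response.html.html)
        return first_text(tree.xpath(xpath))
    finally:
        session.close()
```

Returning `None` for an empty match makes it easier to tell "the page was not rendered yet" apart from "the element is there but has no text".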
