
Kind of a noob here. It's not my first time web scraping, but this one is giving me headaches:

Using lxml, I'm trying to scrape some data from a webpage. I've managed to extract data from other websites, but I'm having trouble with this one.

I'm trying to get the value "44 kg CO2-eq/m2" from this page:

https://www.bs2.ch/energierechner/#/?d=%7B%22area%22%3A%22650%22,%22floors%22%3A%224%22,%22utilization%22%3A2,%22climate%22%3A%22SMA%22,%22year%22%3A4,%22distType%22%3A2,%22dhwType%22%3A1,%22heatType%22%3A%22air%22,%22pv%22%3A0,%22measures%22%3A%7B%22walls%22%3Afalse,%22windows%22%3Afalse,%22roof%22%3Afalse,%22floor%22%3Afalse,%22wrg%22%3Afalse%7D,%22prev%22%3A%7B%22walls%22%3Afalse,%22wallsYear%22%3A1,%22windows%22%3Afalse,%22windowsYear%22%3A1,%22roof%22%3Atrue,%22roofYear%22%3A1,%22floor%22%3Afalse,%22floorYear%22%3A1%7D,%22zipcode%22%3A%228055%22%7D&s=4&i=false
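Everything after the # in this URL is a fragment. Browsers keep fragments on the client side, and HTTP clients such as requests do not send them to the server, so the request only asks for /energierechner/. The calculator's inputs sit in that fragment as URL-encoded JSON (the d parameter), which suggests, though the page itself doesn't confirm it, that the site's JavaScript reads them and computes the result in the browser. A short standard-library sketch to decode them:

```python
import json
from urllib.parse import parse_qs, urlsplit

# The calculator URL from the question, split into string pieces for readability
url = (
    "https://www.bs2.ch/energierechner/#/?d="
    "%7B%22area%22%3A%22650%22,%22floors%22%3A%224%22,"
    "%22utilization%22%3A2,%22climate%22%3A%22SMA%22,%22year%22%3A4,"
    "%22distType%22%3A2,%22dhwType%22%3A1,%22heatType%22%3A%22air%22,%22pv%22%3A0,"
    "%22measures%22%3A%7B%22walls%22%3Afalse,%22windows%22%3Afalse,"
    "%22roof%22%3Afalse,%22floor%22%3Afalse,%22wrg%22%3Afalse%7D,"
    "%22prev%22%3A%7B%22walls%22%3Afalse,%22wallsYear%22%3A1,"
    "%22windows%22%3Afalse,%22windowsYear%22%3A1,%22roof%22%3Atrue,"
    "%22roofYear%22%3A1,%22floor%22%3Afalse,%22floorYear%22%3A1%7D,"
    "%22zipcode%22%3A%228055%22%7D&s=4&i=false"
)

parts = urlsplit(url)
# The '?' only appears after '#', so the real query string is empty:
# the server is asked for the path alone, whatever the calculator settings are
print(parts.path)         # /energierechner/
print(repr(parts.query))  # ''

# The fragment looks like "/?d=...&s=4&i=false"; parse the part after '?'
fragment_params = parse_qs(parts.fragment.split("?", 1)[1])
settings = json.loads(fragment_params["d"][0])
print(settings["area"], settings["floors"], settings["zipcode"])  # 650 4 8055
```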

import lxml.etree
from lxml import html
import requests

# Request the page
page = requests.get('https://www.bs2.ch/energierechner/#/?d=%7B%22area%22%3A%22650%22,%22floors%22%3A%224%22,%22utilization%22%3A2,%22climate%22%3A%22SMA%22,%22year%22%3A4,%22distType%22%3A2,%22dhwType%22%3A1,%22heatType%22%3A%22air%22,%22pv%22%3A0,%22measures%22%3A%7B%22walls%22%3Afalse,%22windows%22%3Afalse,%22roof%22%3Afalse,%22floor%22%3Afalse,%22wrg%22%3Afalse%7D,%22prev%22%3A%7B%22walls%22%3Afalse,%22wallsYear%22%3A1,%22windows%22%3Afalse,%22windowsYear%22%3A1,%22roof%22%3Atrue,%22roofYear%22%3A1,%22floor%22%3Afalse,%22floorYear%22%3A1%7D,%22zipcode%22%3A%228055%22%7D&s=4&i=false')
tree = html.fromstring(page.content) 

scraped_text = tree.xpath(
    '//*[@id="bs2-main"]/div/div[2]/div/div[2]/div[4]/div/div[2]/div[3]/div[2]/div[2]/div/div[2]/div[1]')
print(scraped_text)

The print statement just outputs an empty list [] instead of the value I'm looking for.

I also tried the full absolute XPath, although I know it isn't ideal because it breaks whenever the site's structure changes.

scraped_text = tree.xpath(
    '/html/body/div[1]/div/div[5]/main/div[3]/div/div[2]/div/div[2]/div[4]/div/div[2]/div[3]/div[2]/div[2]/div/div[2]/div[1]')
print(scraped_text)

With this XPath, print also outputs an empty list [].

I verified the XPath with the "XPath Helper" extension in Chrome. I also tried BeautifulSoup, without any luck, since it doesn't support XPath.
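Browser extensions such as XPath Helper evaluate the expression against the live DOM, after the page's JavaScript has run, while lxml only sees the HTML that the server returned. The sketch below illustrates the difference on two hypothetical snippets (the real bs2.ch markup, including the class names and the app.js script, is invented here): the same XPath returns [] on an empty app shell and finds the value once it has been rendered. It selects by text rather than by a long positional path, which is less sensitive to layout changes, though that too assumes the value appears as "... CO2-eq/m2" in a single text node.

```python
from lxml import html

# Hypothetical server response: an empty container plus the script that fills it
shell_html = (
    '<html><body><div id="bs2-main"></div>'
    '<script src="app.js"></script></body></html>'
)

# Hypothetical DOM after the JavaScript has run (class names are invented)
rendered_html = (
    '<html><body><div id="bs2-main"><div class="result">'
    '<span class="value">44 kg CO2-eq/m2</span>'
    "</div></div></body></html>"
)

# Select text nodes by their content instead of by their position in the layout
query = '//text()[contains(., "CO2-eq")]'

shell_result = html.fromstring(shell_html).xpath(query)
rendered_result = html.fromstring(rendered_html).xpath(query)
print(shell_result)     # []
print(rendered_result)  # ['44 kg CO2-eq/m2']
```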

I found a similar question on Stack Overflow: Empty List LXML XPATH

It seems my XPath is probably defined incorrectly. I've been trying to solve this for days; any help would be appreciated, thanks!

Edit: I also tried generating the XPath with ChroPath, but got this message:

It might be child of svg/pseudo element/comment/iframe from different src. Currently ChroPath doesn't support for them.

I presume my XPath may be wrong.

Circark

1 Answer


You can't find the element because requests only downloads the raw HTML and does not execute JavaScript, while this page builds its content with JavaScript. You need to switch to Selenium WebDriver.
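A minimal sketch of that approach, assuming Selenium 4 and a local Chrome install: load the page in a real browser, wait until some element's text mentions "CO2-eq" (a heuristic based on the value quoted in the question; the unit label could in principle render before the number is computed), then hand driver.page_source to lxml as before. The helper names and XPaths are illustrative, not taken from the site.

```python
from lxml import html

# Element with a text node mentioning "CO2-eq", ignoring inline <script> code.
# Assumes the rendered result keeps the unit text quoted in the question.
RESULT_ELEMENT_XPATH = '//*[not(self::script)][text()[contains(., "CO2-eq")]]'


def extract_co2_values(page_source: str) -> list[str]:
    """Return the stripped text nodes mentioning "CO2-eq", outside <script> tags."""
    tree = html.fromstring(page_source)
    nodes = tree.xpath('//text()[contains(., "CO2-eq") and not(ancestor::script)]')
    return [str(node).strip() for node in nodes]


def fetch_rendered_html(url: str, timeout: float = 20) -> str:
    """Open url in Chrome, wait for the result to render, return the live DOM as HTML."""
    # Imported lazily so extract_co2_values can be used without Selenium installed
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # recent Selenium 4 releases can locate chromedriver themselves
    try:
        driver.get(url)
        # Block until the JavaScript has inserted the result text (or time out)
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.XPATH, RESULT_ELEMENT_XPATH))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Usage: extract_co2_values(fetch_rendered_html(url)), with url set to the calculator link from the question. If the number and the unit end up in separate elements, only the unit part will match and the XPath needs adjusting; if the value sits inside an iframe (one of the possibilities ChroPath mentioned), driver.page_source won't include it and you would call driver.switch_to.frame(...) before reading the page.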

mikebrucks