Kind of a noob here. This isn't my first time web scraping, but this one is giving me headaches:
Using lxml, I'm trying to scrape some data from a webpage. I managed to extract data from other websites, but I'm having trouble with this one.
I'm trying to get the value "44 kg CO2-eq/m2" from this website:
import lxml.etree
from lxml import html
import requests
# Request the page
page = requests.get('https://www.bs2.ch/energierechner/#/?d=%7B%22area%22%3A%22650%22,%22floors%22%3A%224%22,%22utilization%22%3A2,%22climate%22%3A%22SMA%22,%22year%22%3A4,%22distType%22%3A2,%22dhwType%22%3A1,%22heatType%22%3A%22air%22,%22pv%22%3A0,%22measures%22%3A%7B%22walls%22%3Afalse,%22windows%22%3Afalse,%22roof%22%3Afalse,%22floor%22%3Afalse,%22wrg%22%3Afalse%7D,%22prev%22%3A%7B%22walls%22%3Afalse,%22wallsYear%22%3A1,%22windows%22%3Afalse,%22windowsYear%22%3A1,%22roof%22%3Atrue,%22roofYear%22%3A1,%22floor%22%3Afalse,%22floorYear%22%3A1%7D,%22zipcode%22%3A%228055%22%7D&s=4&i=false')
tree = html.fromstring(page.content)
scraped_text = tree.xpath(
'//*[@id="bs2-main"]/div/div[2]/div/div[2]/div[4]/div/div[2]/div[3]/div[2]/div[2]/div/div[2]/div[1]')
print(scraped_text)
The print call just outputs an empty list [] instead of the value I am looking for.
I also tried the full (absolute) XPath, although I know it is not optimal because it depends on the site's structure, which may change.
scraped_text = tree.xpath(
'/html/body/div[1]/div/div[5]/main/div[3]/div/div[2]/div/div[2]/div[4]/div/div[2]/div[3]/div[2]/div[2]/div/div[2]/div[1]')
print(scraped_text)
With this XPath, the print call also outputs an empty list [].
I verified the XPath using the "XPath Helper" extension in Chrome. I also tried BeautifulSoup, but without any luck, as it doesn't support XPath.
I found a similar problem on Stack Overflow here: Empty List LXML XPATH
So it appears that my XPath is probably defined incorrectly. I have been trying to solve this for days; any help would be appreciated, thanks!
Edit: I also tried to get another XPath using ChroPath, but I got this feedback:
It might be child of svg/pseudo element/comment/iframe from different src. Currently ChroPath doesn't support for them.
I presume my XPath may be wrong.
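One more thing I noticed while debugging, in case it helps: all the calculator parameters in my URL come after the `#`, which makes them part of the URL fragment. As far as I understand, the fragment is handled by the browser and is not sent to the server, so `requests` would receive the same base page no matter what the parameters are. If the result is then filled in by JavaScript (an assumption on my part, I haven't confirmed it), the XPath might be fine and the element simply isn't in the HTML that lxml parses. A minimal sketch of the check, using a shortened version of my URL (same structure, fewer parameters):

```python
from urllib.parse import urlsplit, unquote

# Shortened version of the URL from the question (same structure, fewer parameters)
url = 'https://www.bs2.ch/energierechner/#/?d=%7B%22area%22%3A%22650%22%7D&s=4&i=false'

parts = urlsplit(url)

# The path is all that identifies the resource on the server side
print(parts.path)

# The "?d=..." sits after the "#", so the real query string is empty
print(repr(parts.query))

# Everything after "#" is the fragment, which stays on the client
print(unquote(parts.fragment))
```

If that is what is happening, checking whether "CO2-eq" appears anywhere in `page.text` should tell me whether the value is in the downloaded HTML at all.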