How can I check if either xpath exists and then return the value if text is present?

Question

I'm having trouble with the second r.html.xpath request. When there is a special deal on an item, the second XPath changes from

//*[@id="priceblock_ourprice"]

to

//*[@id="priceblock_dealprice"]

This causes the script to fail because the expected XPath no longer matches anything. How can I handle this second XPath, which only shows up occasionally? I would like to check whether either XPath exists and, if so, return its text; otherwise return N/A. The first URL that is searched has the ourprice XPath and the second URL has the dealprice XPath. What am I missing here?

from requests_html import HTMLSession
import pandas as pd

urls = ['http://amazon.com/dp/B01KZ6V00W',
'http://amazon.com/dp/B089FBPFHS'
          ]

def getPrice(url):
    s = HTMLSession()
    r = s.get(url)
    r.html.render(sleep=1,timeout=20)
    product = {
        'title': str(r.html.xpath('//*[@id="productTitle"]', first=True).text),
        'price': str(r.html.xpath('//*[@id="priceblock_ourprice"]', first=True).text),
        'details': str(r.html.xpath('//*[@id="detailBulletsWrapper_feature_div"]', first=True).text)
    }
    res = {}
    for key in list(product):
        res[key] = product[key].replace('\n',' ')

    print(res)
    return res

prices = []
for url in urls:
    prices.append(getPrice(url))


df = pd.DataFrame(prices)
print(df.head(15))
df.to_csv("testfile.csv",index=False)
print(len(prices))

traceback

  'price': str(r.html.xpath('//*[@id="priceblock_ourprice"]', first=True).text),
AttributeError: 'NoneType' object has no attribute 'text'

Check if this question helps you : https://stackoverflow.com/questions/3737906/xpath-how-to-check-if-an-attribute-exists — Cagri, Dec 15 '20 at 21:13

score 2 · Answer 1 · answered Dec 15 '20 at 21:19

# xpath() without first=True returns a list; an empty list is falsy.
if r.html.xpath('//*[@id="priceblock_ourprice"]'):
    productprice = str(r.html.xpath('//*[@id="priceblock_ourprice"]', first=True).text)
elif r.html.xpath('//*[@id="priceblock_dealprice"]'):
    productprice = str(r.html.xpath('//*[@id="priceblock_dealprice"]', first=True).text)
else:
    # Neither price element is on the page.
    productprice = 'N/A'

product = {
        'title': str(r.html.xpath('//*[@id="productTitle"]', first=True).text),
        'price': productprice,
        'details': str(r.html.xpath('//*[@id="detailBulletsWrapper_feature_div"]', first=True).text)
    }

Something like that. I am not exactly sure if the syntax is totally correct.
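The same idea can be written as a small helper that tries each XPath in order and falls back to N/A. This is only a sketch: first_text and PRICE_XPATHS are hypothetical names, and the demo uses the standard library's xml.etree.ElementTree on hand-written stand-in markup (not real Amazon pages) so that it runs without a network connection.

```python
import xml.etree.ElementTree as ET


def first_text(find, xpaths, default="N/A"):
    """Return the text of the first XPath that matches and has text.

    `find` is any callable that takes an XPath string and returns an
    element with a `.text` attribute, or None when nothing matches.
    """
    for xp in xpaths:
        element = find(xp)
        text = getattr(element, "text", None)  # None when nothing matched
        if text and text.strip():
            return text.strip().replace("\n", " ")
    return default


# ElementTree needs relative paths, hence the leading "." (assumption for
# this demo only; the question's //*[@id="..."] form is used with r.html).
PRICE_XPATHS = [
    './/*[@id="priceblock_ourprice"]',
    './/*[@id="priceblock_dealprice"]',
]

# Hand-written stand-ins for the three cases: regular price, deal, neither.
regular = ET.fromstring('<div><span id="priceblock_ourprice">$19.99</span></div>')
deal = ET.fromstring('<div><span id="priceblock_dealprice">$14.99</span></div>')
neither = ET.fromstring('<div><span id="something_else">text</span></div>')

print(first_text(regular.find, PRICE_XPATHS))  # $19.99
print(first_text(deal.find, PRICE_XPATHS))     # $14.99
print(first_text(neither.find, PRICE_XPATHS))  # N/A
```

With requests_html, the callable would presumably be lambda xp: r.html.xpath(xp, first=True), passed together with the two //*[@id="..."] expressions from the question.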

score 1 · Accepted Answer · edited Nov 13 '21 at 11:17

Why don't you use a try/except block to check whether the value exists? You get the error because the XPath matches no element, so xpath(..., first=True) returns None, and None has no text attribute.

I haven't got requests_html, but I will show the code using the selenium module.

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from time import sleep
import pandas as pd

urls = ['http://amazon.com/dp/B01KZ6V00W', 'http://amazon.com/dp/B089FBPFHS']

webdriver = webdriver.Chrome()
old_price = ""


def getPrice(url):
    global old_price
    global webdriver

    webdriver.get(url)

    # Give the page time to load before querying it.
    sleep(5)

    # find_element("xpath", ...) works in Selenium 3 and 4; the older
    # find_element_by_xpath helper is gone from recent Selenium 4 releases.
    title = webdriver.find_element("xpath", "/html/body/div[2]/div[2]/div[7]/div[5]/div[4]/div[1]/div/h1/span").text

    try:
        # Old price: this XPath is expected to exist only for deal items.
        old_price = webdriver.find_element("xpath", "/html/body/div[2]/div[2]/div[7]/div[5]/div[4]/div[10]/div[1]/div/table/tbody/tr[1]/td[2]/span[1]").text
        price = webdriver.find_element("xpath", "/html/body/div[2]/div[2]/div[7]/div[5]/div[1]/div[5]/div/div/div/div/div/form/div/div/div/div/div[1]/div/span[1]").text
        if old_price[1:] == price[1:]:
            deal_type = "normal"
        else:
            deal_type = "deal"

    except NoSuchElementException:
        # No old-price element found: treat the item as a normal product.
        price = webdriver.find_element("xpath", "/html/body/div[2]/div[2]/div[7]/div[5]/div[1]/div[5]/div/div/div/div/div/form/div/div/div/div/div[1]/div/span[1]").text
        deal_type = "normal"

    print(old_price)
    print(title)
    print(price)
    print(deal_type)

    return price

prices = []

for url in urls:
    prices.append(getPrice(url))

print(prices)

df = pd.DataFrame(prices)
print(df.head(15))
df.to_csv("testfile.csv",index=False)
print(len(prices))

Let me explain:

The first 4 lines import the necessary modules, such as selenium and pandas. The next line stores the URLs. Then webdriver = webdriver.Chrome() starts a Chrome browser session.

In getPrice, we open the URL using webdriver.get(url).

Then we read the title from its XPath.

The try block checks whether the XPath that shows the deal exists. If it does, it gets the old and new prices and marks the product as a deal when they differ. If the XPath for a deal does NOT exist, execution moves on to the except block and marks the product as a normal one.

It then prints the price, title and deal type.

Finally, it runs the function for every URL, and saves it to a CSV file.

I explained the code so that you could turn it into requests_html.
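For what it's worth, here is a rough sketch of how the try/except idea might look on the requests_html side. The helper names text_or_none and pick_price are made up for illustration, and the SimpleNamespace objects are stand-ins for whatever r.html.xpath(..., first=True) returns, so nothing here touches the network.

```python
from types import SimpleNamespace


def text_or_none(element):
    """Return the element's cleaned text, or None if there is nothing to read."""
    try:
        text = element.text.replace("\n", " ").strip()
    except AttributeError:
        # element is None (the XPath matched nothing) or its text is None
        return None
    return text or None


def pick_price(our_price_el, deal_price_el, default="N/A"):
    """Prefer the regular price, then the deal price, then the default."""
    return text_or_none(our_price_el) or text_or_none(deal_price_el) or default


# Stand-ins for matched elements; None plays the role of "no match".
regular_el = SimpleNamespace(text="$19.99")
deal_el = SimpleNamespace(text="$14.99")

print(pick_price(regular_el, None))  # $19.99
print(pick_price(None, deal_el))     # $14.99
print(pick_price(None, None))        # N/A
```

In the question's getPrice, the two arguments would be the results of r.html.xpath('//*[@id="priceblock_ourprice"]', first=True) and the same call with priceblock_dealprice, and the return value would go into the 'price' key of the product dict.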

Can you provide an example where the command would be placed? — mjbaybay7, Dec 15 '20 at 21:27
Hi, your code does not run. I get this error: `selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"/html/body/div[2]/div[2]/div[7]/div[5]/div[4]/div[1]/div/h1/span"}` — mjbaybay7, Dec 16 '20 at 01:23
Perhaps the XPath is wrong. Go to the website (e.g. Amazon), inspect the title, and right-click the element in the inspector panel. Choose Copy, then Copy full XPath. Paste the XPath into the title variable in Python. Hope it helps! — The Pilot Dude, Dec 16 '20 at 16:07
