I have a Python list containing these five weblinks:
https://en-ae.namshi.com/brands/buy-a-little-lovely-company-little-angel-light-w1030269a.html
https://en-ae.namshi.com/brands/buy-a-little-lovely-company-rose-bud-teething-ring-w1030273a.html
https://en-ae.namshi.com/brands/buy-a-little-lovely-company-cloud-projector-light-w869154a.html
https://en-ae.namshi.com/brands/buy-a-little-lovely-company-bunny-projector-light-w869153a.html
https://en-ae.namshi.com/brands/buy-a-little-lovely-company-little-fairy-light-w1030270a.html
I am trying to loop through the links and extract certain elements from each page. The extraction works fine for most of the elements, but I am unable to get the "RATING" and "NUM_REVIEWS" values from the webpage to fill in the corresponding columns. Can someone please help me get these? Thanks.
Working code:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from lxml import html
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.82 Safari/537.36'}
list_urls = [
    'https://en-ae.namshi.com/brands/buy-a-little-lovely-company-little-angel-light-w1030269a.html',
    'https://en-ae.namshi.com/buy-american-eagle-straight-dark-wash-jeans-w925887a.html',
    'https://en-ae.namshi.com/brands/buy-a-little-lovely-company-rose-bud-teething-ring-w1030273a.html',
    'https://en-ae.namshi.com/brands/buy-a-little-lovely-company-cloud-projector-light-w869154a.html',
]
all_data = []
for lnk in list_urls:
    # Fetch each page once with the browser-like headers, then parse it with both lxml and BeautifulSoup
    page = requests.get(lnk, headers=headers)
    tree = html.fromstring(page.content)
    hun = BeautifulSoup(page.text, 'html.parser')
    product_name = hun.find("h1", {"class": "product__name"}).text.replace('\n', "")
    brand_name = hun.find("h2", {"class": "product__brandname"}).text.replace('\n', "")
    price = str(tree.xpath('//*[@id="content"]/div/div[3]/section[1]/div/div[1]/div/div[2]/header/div/p[1]/span[1]/text()')[0])
    reduced_price = str(tree.xpath('//*[@id="content"]/div/div[3]/section[1]/div/div[1]/div/div[2]/header/div/p[1]/span[2]/text()')[0])
    # These two are the values I cannot extract; note that str() is applied to the whole XPath result list
    rating = str(tree.xpath('/html/body/div[1]/div[7]/div/div[3]/section[1]/div/div[2]/div/div[1]/text()'))
    num_reviews = str(tree.xpath('/html/body/div[1]/div[7]/div/div[3]/section[1]/div/div[2]/div/div[1]/div[1]/span[2]/text()'))
    sub_cat_1 = str(tree.xpath('//*[@id="content"]/div/div[3]/ul/li[3]/a/text()')[0])
    sub_cat_2 = str(tree.xpath('//*[@id="content"]/div/div[3]/ul/li[4]/a/text()')[0])
    sub_cat_3 = str(tree.xpath('//*[@id="content"]/div/div[3]/ul/li[5]/a/text()')[0])
    row = {"Product_Name": product_name, "Brand_Name": brand_name, "Original_Price": price,
           "Discounted_Price": reduced_price, "Rating": rating, "Num_Reviews": num_reviews,
           "Sub_cat_1": sub_cat_1, "Sub_cat_2": sub_cat_2, "Sub_cat_3": sub_cat_3}
    all_data.append(row)
df = pd.DataFrame(all_data)
print(df.head(5))
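A side note on the two failing lines: `tree.xpath(...)` returns a list, so wrapping it in `str()` stores the text `[]` in the column whenever nothing matches. Below is a minimal sketch of a helper that returns the first non-empty text or a default instead. The name `first_text` is hypothetical (it is not part of lxml), and the helper only makes a missing value easier to spot; it does not explain why the XPath finds nothing on these pages.

```python
def first_text(results, default=None):
    """Return the first non-blank XPath result as a stripped string.

    `results` is the list returned by tree.xpath(...); `default` is
    returned when the list is empty or contains only whitespace.
    """
    for item in results:
        text = str(item).strip()
        if text:
            return text
    return default


# Hypothetical usage inside the loop, where rating_xpath holds the
# XPath string for the rating:
#   rating = first_text(tree.xpath(rating_xpath), default="N/A")
```

With this, an empty match shows up as `None` (or the chosen default) in the DataFrame rather than as the string `[]`.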
Can you please help me get the ratings and the number of reviews as well? Thanks in advance.
Expected values in both columns: