I'm trying to parse an Amazon product page. About half the time I run the code it works and returns the information; the other half my request gets redirected to an Amazon page that seems designed to block automated requests. When I print the URL of the response it shows my original input URL, not the URL of that Amazon page. From what I've read, setting a User-Agent header should solve this, but again it only works for roughly half of the requests, which is quite strange. Is there any way to ensure I always get a real response?
Below is the code:
import requests
from bs4 import BeautifulSoup as soup

#constants
url = "https://www.amazon.com/Zephyrus-GeForce-i7-9750H-Windows-GX531GW-AB76/dp/B07QN3683G/ref=sr_1_12?dchild=1&keywords=zephyrus+g15&qid=1586732721&sr=8-12"

#Amazon data class
class items:
    def __init__(self, url):
        self.url = url

    #parses page and returns info
    def data(self):
        headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.163 Safari/537.36"}
        #get html response
        try:
            html = requests.get(self.url, headers=headers).content
        except Exception:
            print("Could not retrieve page")
        else:
            #parse the page
            pagesoup = soup(html, "html5lib")
            #get price, name
            try:
                price = pagesoup.find("span", id="priceblock_ourprice").get_text().strip()
            except Exception:
                print("Price could not be extracted")
                price = None
            try:
                name = pagesoup.find("span", id="productTitle").get_text().strip()
            except Exception:
                print("Product name could not be extracted")
                name = None
            return price, name

#test
item_1 = items(url)
print(item_1.data())
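
To clarify what I mean by "a real response", here is a rough sketch of a retry wrapper I've been considering. It assumes the block page can be recognised by phrases like "Robot Check" or "To discuss automated access" in the returned HTML (those markers are my guess, not anything Amazon documents), and it backs off between attempts. I don't know whether this is the right approach, which is partly why I'm asking.

import time
import requests

def get_with_retry(url, headers, max_attempts=5, delay=3):
    #fetch url, retrying when the response looks like the robot-check page
    #the detection strings below are assumptions; Amazon may serve other variants
    session = requests.Session()   #reuse one session so cookies persist across attempts
    session.headers.update(headers)
    for attempt in range(max_attempts):
        html = session.get(url).text
        if "Robot Check" in html or "To discuss automated access" in html:
            time.sleep(delay * (attempt + 1))   #back off a little longer each time
            continue
        return html
    return None   #gave up: every attempt was blocked

The idea would be to replace the plain requests.get call in data() with something like html = get_with_retry(self.url, headers) and bail out if it comes back None, but I'm not sure whether retrying like this is enough, or whether Amazon keeps blocking once it has flagged the client.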