I'm trying to scrape an e-commerce store, but I'm getting `AttributeError: 'NoneType' object has no attribute 'get_text'`. This happens whenever I iterate over the products through their product links. I'm not sure whether I'm running into JavaScript rendering, a CAPTCHA, or something else. Here's my code:
import requests
from bs4 import BeautifulSoup

baseurl = 'https://www.jumia.com'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36'
}

productlinks = []
for x in range(1,51):
    r = requests.get(f'https://www.jumia.com.ng/ios-phones/?page={x}#catalog-listing/')
    soup = BeautifulSoup(r.content, 'lxml')
    productlist = soup.find_all('article', class_='prd _fb col c-prd')
    for product in productlist:
        for link in product.find_all('a', href=True):
            productlinks.append(baseurl + link['href'])

for link in productlinks:
    r = requests.get(link, headers = headers)
    soup = BeautifulSoup(r.content, 'lxml')
    name = soup.find('h1', class_='-fs20 -pts -pbxs').get_text(strip=True)
    amount = soup.find('span', class_='-b -ltr -tal -fs24').get_text(strip=True)
    review = soup.find('div', class_='stars _s _al').get_text(strip=True)
    rating = soup.find('a', class_='-plxs _more').get_text(strip=True)
    features = soup.find_all('li', attrs={'style': 'box-sizing: border-box; padding: 0px; margin: 0px;'})
    a = features[0].get_text(strip=True)
    b = features[1].get_text(strip=True)
    c = features[2].get_text(strip=True)
    d = features[3].get_text(strip=True)
    e = features[4].get_text(strip=True)
    f = features[5].get_text(strip=True)
    print(f"Name: {name}")
    print(f"Amount: {amount}")
    print(f"Review: {review}")
    print(f"Rating: {rating}")
    print('Key Features')
    print(f"a: {a}")
    print(f"b: {b}")
    print(f"c: {c}")
    print(f"d: {d}")
    print(f"e: {e}")
    print(f"f: {f}")
    print('')

Here's the error message:
Traceback (most recent call last):
  File "c:\Users\LP\Documents\jumia\jumia.py", line 32, in <module>
    name = soup.find('h1', class_='-fs20 -pts -pbxs').get_text(strip=True)
AttributeError: 'NoneType' object has no attribute 'get_text'
PS C:\Users\LP\Documents\jumia>

[...] here [...] (heading) element with a class of "-fs20 -pts -pbxs" then.
– Nathan Mills Oct 24 '22 at 05:00

[...] with its class attribute "-fs20 -pts -pbxs". It's there for sure. @Nathan Mills
– Miracle Oct 24 '22 at 05:10

[...] on that page in Edge Devtools, the only result is `iOS Phones`. I tried searching for the same tag in Firefox Devtools but its search is bad. Does your script still give an error if you change line 32 to `name = soup.find('h1', class_='-fs20 -m -elli -phs').get_text(strip=True)`?
– Nathan Mills Oct 24 '22 at 05:30

[...] IPhone X 3GB RAM+64GB(Renewed) -Black ==$0. Any idea what it means @Nathan Mills
– Miracle Oct 24 '22 at 08:04

[...] with `-fs20 -pts -pbxs` class now. Are you sure the `soup` variable contains the HTML from the right page at line 32? Perhaps the indentation of your code is incorrect, which can cause `soup` to be a different variable than you expect, since you seem to have multiple `soup` variables. About the `==$0`, that's just something Chrome adds to the element you select, see https://stackoverflow.com/questions/36999739/what-does-0-double-equals-dollar-zero-mean-in-chrome-developer-tools
– Nathan Mills Oct 24 '22 at 22:53
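
Following up on the point about `soup` possibly holding a different page than expected: `soup.find(...)` returns `None` when nothing matches, so chaining `.get_text()` onto it raises exactly this `AttributeError`. Two things in the posted code could plausibly lead to non-product pages, though I have not verified either against the live site: `product.find_all('a', href=True)` collects every link inside each article, and `baseurl` is `https://www.jumia.com` while the listing is fetched from `https://www.jumia.com.ng`. Here is a minimal sketch of a guard that reports the offending link instead of crashing. The `text_or_none` helper, the two HTML strings, and the paths are made up for illustration and are not Jumia's real markup; `html.parser` is used only so the snippet runs without lxml.

```python
from bs4 import BeautifulSoup


def text_or_none(soup, name, class_):
    """Return the stripped text of the first matching tag, or None if absent."""
    tag = soup.find(name, class_=class_)
    return tag.get_text(strip=True) if tag is not None else None


# Hypothetical stand-ins for two fetched pages (not real Jumia markup):
# one has the <h1> the scraper expects, the other has a different heading.
product_html = '<h1 class="-fs20 -pts -pbxs">Example Phone</h1>'
other_html = '<h1 class="-fs20 -m -elli -phs">iOS Phones</h1>'

# Hypothetical link -> HTML mapping standing in for requests.get(link).content
pages = {
    "/example-product.html": product_html,
    "/ios-phones/": other_html,
}

names = {}
for link, html in pages.items():
    soup = BeautifulSoup(html, "html.parser")
    name = text_or_none(soup, "h1", "-fs20 -pts -pbxs")
    if name is None:
        # Report the link so it is clear which page lacked the heading,
        # then move on instead of raising AttributeError.
        print(f"No product heading found, skipping: {link}")
        continue
    names[link] = name
    print(f"Name: {name}")
```

In the original loop, the same check would go right after `soup = BeautifulSoup(r.content, 'lxml')`; printing `link` and `r.status_code` at that point should show whether the request landed on a product page at all.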