So I'm fairly new to Python and I tried using this Flipkart scraper.

I tried to add a 'price' field, but it keeps giving me the error 'IndexError: list index out of range'.

My goal for this scraper is to scrape product info, rating, price, specs, image URL, etc. from Flipkart. It has been a challenging goal for me so far, but I think I can do it if I get the right help and understand Python better.

import requests
from urllib.request import urlopen as req
from bs4 import BeautifulSoup as soup

filename = "mobiles.csv"
f = open(filename, "w")
headers = "product_name, specs, rating, price\n"
f.write(headers)


for i in range(0, 200):
    url = 'https://www.flipkart.com/search?q=phones&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off'+'&page='+str(i)
    print(url)
    client = req(url)
    html = client.read()
    client.close()
    page_soup = soup(html, "html.parser")
    containers = page_soup.findAll("div",{"class":"col col-7-12"})
    for container in containers:
        
        
        price_container = container.findAll('div',  {"class":"_1vC4OE _2rQ-NK"})

        price = price_container[0].text

        name_container = container.findAll("div", {"class":"_3wU53n"})
        product_name = name_container[0].text
        
        rate_container = container.findAll("div", {"class":"hGSR34"})
        if(not(rate_container)):
            rating = "none"
        else:
            rating = rate_container[0].text

        specs_container = container.findAll("ul", {"class":"vFw0gD"})
        specs = specs_container[0].text

        f.write(product_name.replace(",", "|") + ","  +specs + "," +rating + "," +price + "\n")
f.close()

Which prints the following:

https://www.flipkart.com/search?q=phones&otracker=search&otracker1=search&marketplace=FLIPKART&as-show=on&as=off&page=0
Traceback (most recent call last):
  File "C:\Users\HOLES\Desktop\flipkart_web_scraper-master\flipkart_web_scraper-master\flipkart.py", line 24, in <module>
    price = price_container[0].text
IndexError: list index out of range

1 Answer

The problem lies in how the containers are selected in the following line:

containers = page_soup.findAll("div",{"class":"col col-7-12"})

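If you print containers[0] and search for _1vC4OE _2rQ-NK inside it, you won't find it. The price <div> is not inside the col col-7-12 block, so price_container is an empty list and price_container[0] raises the IndexError. You can fix this by selecting a broader <div>, like this one: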

containers = page_soup.findAll("div",{"class":"_1UoZlX"})
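
As a side note, the same IndexError can come back for any container that happens to lack one of the elements, so it may be worth guarding every lookup the way your rating check already does. Here is a minimal sketch, assuming a hypothetical helper named first_text (it is not part of BeautifulSoup). The SimpleNamespace objects only stand in for the tags that findAll returns, so the snippet runs without hitting Flipkart:

```python
from types import SimpleNamespace


def first_text(matches, default="none"):
    """Return the text of the first match, or `default` if nothing matched.

    `matches` is expected to be a list like the one findAll returns;
    this helper is hypothetical, not a BeautifulSoup function.
    """
    return matches[0].text if matches else default


# Stand-ins for findAll results: one hit, and one empty result.
found = [SimpleNamespace(text="Rs. 9,999")]
missing = []

print(first_text(found))    # text of the first match
print(first_text(missing))  # falls back to the default
```

In the scraper, this would presumably be called as price = first_text(container.findAll('div', {"class": "_1vC4OE _2rQ-NK"})), and likewise for the name and specs lookups.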
Anwarvic