
This refers to the same problem as the question "Scraped images is corrupt".

In my case, I am trying to scrape images from a site whose HTML looks like this:

[Screenshot of the page's HTML]

There are 100 images. The first 67 are saved fine as .jpg, but from image 68 onward every file is corrupt: Windows says it doesn't support this file format.
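
A useful first check (a suggestion, not something the question reports doing) is to look at what the "corrupt" files actually contain. A file named .jpg may really hold WebP, PNG, GIF, or an HTML/SVG page, and Windows will refuse it as a JPEG. The sketch below reads the leading "magic" bytes of each saved file; the names `SIGNATURES`, `sniff_format` and `report` are hypothetical helpers, and the signature list covers only a few common formats.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes of a few common image formats (not an exhaustive list).
SIGNATURES = [
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]

def sniff_format(data: bytes) -> Optional[str]:
    """Return the format implied by the first bytes of a file, or None if unknown."""
    for magic, name in SIGNATURES:
        if data.startswith(magic):
            return name
    # WebP is a RIFF container: b"RIFF" + 4-byte size + b"WEBP"
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    # Text starting with "<" is markup (an HTML error page, SVG, ...), not a JPEG
    if data.lstrip()[:1] == b"<":
        return "markup"
    return None

def report(folder: str) -> None:
    """Print the real format of every .jpg file in `folder`."""
    for path in sorted(Path(folder).glob("*.jpg")):
        with path.open("rb") as f:
            head = f.read(32)
        print(path.name, sniff_format(head) or "unknown")
```

For example, `report("C:/data/images/<folder_name>")` would show whether files 68 onward are a different format rather than truly damaged.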

Unlike in the other Stack Overflow question, my `<img>` tags have no `data-src` attribute; the markup is shown in the screenshot above.
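
A pattern where the first N images work and the rest fail is often caused by lazy loading: later `<img>` tags carry a placeholder in `src` (frequently a `data:` URI) and keep the real URL in another attribute. The question says there is no `data-src`, so this is only an assumption to verify against the actual markup. The hypothetical helper below tries a list of candidate attribute names (also an assumption; adjust it to the site), skips `data:` placeholders, and resolves relative URLs with `urllib.parse.urljoin`.

```python
from typing import Optional
from urllib.parse import urljoin

# Attribute names some lazy-loading scripts use; an assumed list, check the real markup.
CANDIDATE_ATTRS = ("data-src", "data-lazy-src", "data-original", "srcset", "src")

def pick_image_url(attrs: dict, page_url: str) -> Optional[str]:
    """Return an absolute URL for the real image, skipping inline placeholders."""
    for name in CANDIDATE_ATTRS:
        value = attrs.get(name)
        if not value:
            continue
        if name == "srcset":
            # "a.jpg 480w, b.jpg 800w" -> URL of the last entry (usually the largest)
            parts = value.split(",")[-1].split()
            if not parts:
                continue
            value = parts[0]
        if value.startswith("data:"):
            continue  # inline base64 placeholder, not the real image
        return urljoin(page_url, value)
    return None
```

With BeautifulSoup, `item.attrs` is a plain dict, so the loop could call `pick_image_url(item.attrs, url)` instead of reading `item["src"]`.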

The request script is:

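```python
import os

import pandas as pd
import requests
from bs4 import BeautifulSoup

parent_dir = "C:/data/images/"
frames = []  # one DataFrame per page, concatenated after the loop

for url in urls:  # urls: list of page URLs, defined elsewhere
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36',
        'Referer': url
    })
    soup = BeautifulSoup(response.text, "html.parser")
    folder_name = url.rsplit('/', 1)[-1]  # last path segment names the folder and ID prefix
    image_info = []
    for img_no, item in enumerate(soup.find_all('img')):
        src = item.get("src")
        if not src:  # skip <img> tags with no src instead of raising KeyError
            continue
        image_info.append((src, f"{folder_name}_{img_no}"))
    df = pd.DataFrame(image_info, columns=['imageURL', 'imageID'])
    df['category'] = folder_name
    frames.append(df)
    path = os.path.join(parent_dir, folder_name)
    os.makedirs(path, exist_ok=True)  # os.mkdir raises if the folder already exists
    for info in image_info:
        download_image(info, folder_name)  # download helper from the question, not shown

# DataFrame.append was deprecated and removed in pandas 2.0; pd.concat replaces it
df_full = pd.concat(frames)
```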
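
Since `download_image` is not shown, it may write whatever bytes the server returns under a .jpg name. A hedged sketch of a safer save step: check the HTTP status and `Content-Type` before writing, and pick the extension from the content type instead of always using .jpg. `save_image_response` and `EXTENSIONS` are hypothetical names; the function accepts any object with `.status_code`, `.headers` and `.content`, which a `requests.Response` provides.

```python
from typing import Optional

# Content-Type -> file extension; extend if the site serves other formats.
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
}

def save_image_response(resp, base_path: str) -> Optional[str]:
    """Write resp.content to base_path plus the extension matching its Content-Type.

    Returns the written path, or None if the response is not a usable image
    (non-200 status, or a type such as text/html from an error page).
    """
    if resp.status_code != 200:
        return None
    # "image/webp; charset=..." -> "image/webp"
    content_type = resp.headers.get("Content-Type", "").split(";")[0].strip().lower()
    ext = EXTENSIONS.get(content_type)
    if ext is None:
        return None  # don't save an HTML page or unknown type as .jpg
    path = base_path + ext
    with open(path, "wb") as f:
        f.write(resp.content)
    return path
```

Inside the loop this could be used as `save_image_response(requests.get(src, headers={...}), os.path.join(path, image_id))`, logging any image for which it returns None.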

How can this issue be fixed?

hanzgs
