This refers to the same problem as the question "Scraped images is corrupt".
In my case, I am trying to scrape images from a site with
There are 100 images. The first 67 are saved fine as .jpg, but from image 68 onward all the images are corrupted; Windows says it doesn't support this file format.
As per the other Stack Overflow question, I don't have a data-src attribute; the content is shown in the image above.
The request script is:
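    import os

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    # urls, df_full and download_image are defined elsewhere (not shown here)
    parent_dir = "C:/data/images/"

    for url in urls:
        response = requests.get(url, headers={
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36',
            'Referer': url
        })
        soup = BeautifulSoup(response.text, "html.parser")

        # The last path segment of the URL names both the folder and the image IDs
        folder_name = url.rsplit('/', 1)[-1]
        imagePrefix = folder_name + "_"

        # Collect (src, imageID) pairs, skipping <img> tags without a src attribute
        image_info = []
        for imgNo, item in enumerate(soup.find_all('img', src=True)):
            image_info.append((item["src"], imagePrefix + str(imgNo)))

        df = pd.DataFrame(image_info, columns=['imageURL', 'imageID'])
        df['category'] = folder_name
        # DataFrame.append was removed in pandas 2.0; pd.concat does the same job
        df_full = pd.concat([df_full, df])

        path = os.path.join(parent_dir, folder_name)
        # Do not fail if the folder already exists from an earlier run
        os.makedirs(path, exist_ok=True)

        for info in image_info:
            download_image(info, folder_name)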
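To narrow down what the "corrupted" files actually contain, a check along these lines could be run on the saved bytes. This is only a diagnostic sketch: sniff_image_type is a hypothetical helper that is not part of the script above, and it only recognises a few common formats by their leading bytes.

```python
from typing import Optional

# Leading "magic" bytes of a few common image formats.
_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data: bytes) -> Optional[str]:
    """Return a format name if data starts like a known image, else None.

    Hypothetical helper for diagnosis only; it does not validate the
    whole file, just the first few bytes.
    """
    for signature, name in _SIGNATURES.items():
        if data.startswith(signature):
            return name
    # WebP files start with "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None
```

Reading the first 16 bytes of a saved file (open(path, "rb").read(16)) and passing them to this function shows whether a file saved as .jpg really is a JPEG. A result of None would suggest the server returned something other than image data, for example an HTML page or a placeholder.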
How can this issue be fixed?