I am trying to download images from hyperlinks (example). To accomplish this I am using the following function:
import os
import requests

def download_logos(lst):
    # lst is one row: [team name, image URL, competition]
    image_url = lst[1]
    img_data = requests.get(image_url).content
    df.append([lst[0], img_data, lst[2]])  # df is defined elsewhere in my script
    filename = 'logos/{}/{}.png'.format(lst[2], lst[0])
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    with open(filename, 'wb') as f:
        f.write(img_data)
The variable lst is a row in a matrix that contains the team name, the link to the image, and the competition in which the team plays. When I run this function for all of my data (543 teams), it seems to skip a lot of images: only 200-300 images are downloaded.
To see whether the script was failing to access the links and download the image data, I tried performing the action in two steps: first download the image data for all teams, then save the data to disk. To my surprise, image data was present for all 543 teams, so I expected all images to be present after saving. However, only around 500 images were saved this time, which was still an improvement.
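For reference, here is a minimal sketch of the two-step version, not my exact script. The names build_path, save_all and fetch are made up for this illustration, and fetch stands in for whatever returns the image bytes for a URL. The sketch also counts how many rows map to each output path, since two rows that produce the same path would overwrite each other on disk.

```python
import os
from collections import Counter


def build_path(row, root="logos"):
    # row is [team name, image URL, competition], as in download_logos
    team, _url, competition = row
    return os.path.join(root, competition, "{}.png".format(team))


def save_all(rows, fetch, root="logos"):
    # Step 1: download the image data for every row into memory.
    downloaded = [(row, fetch(row[1])) for row in rows]

    # Step 2: write each image to disk.
    for row, data in downloaded:
        path = build_path(row, root)
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)

    # Count how many rows were written to each path; a count above 1
    # means a later row overwrote an earlier file.
    return Counter(build_path(row, root) for row, _ in downloaded)
```

With requests, this could be called as save_all(rows, lambda url: requests.get(url).content), where rows is assumed to be the list of all 543 rows. Comparing len(rows) with the number of distinct paths in the returned counter shows how many files could exist on disk at most.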
I am unable to find out what is causing this problem, so I am hoping someone can point out where I made a mistake and/or how I can fix it.