I am trying to download images from a web page, but the script finds nothing. It prints "Found 0 links" and then "Downloaded 0 files".
Here's the code:
import urllib.request
import re
import os
# directory where the images are saved
DIRECTORY = "book"
# URL of the HTML page that contains the images
URL = "https://www.inaturalist.org/taxa/56061-Alliaria-petiolata/browse_photos"
# regex that extracts the image URLs from the HTML page
# (raw string so that \d is not treated as a string escape)
REGEX = r'(?<=<a href=")http://\d.bp.inaturalist.org/[^"]+'
# prefix of the image file names
PREFIX = 'page_'
if not os.path.isdir(DIRECTORY):
    os.mkdir(DIRECTORY)
contents = urllib.request.urlopen(URL).read().decode('utf-8')
links = re.findall(REGEX, contents)
print("Found {} links".format(len(links)))
print("Starting download...")
page_number = 1
total = len(links)
downloaded = 0
for link in links:
    filename = "{}/{}{}.jpg".format(DIRECTORY, PREFIX, page_number)
    if not os.path.isfile(filename):
        urllib.request.urlretrieve(link, filename)
        downloaded = downloaded + 1
        print("done: {} ({}/{})".format(filename, downloaded, total))
    else:
        downloaded = downloaded + 1
        print("skip: {} ({}/{})".format(filename, downloaded, total))
    page_number = page_number + 1
print("Downloaded {} files".format(total))
How can I change the script so that it finds and downloads the images?
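
To narrow down where the zero comes from, here is a minimal sketch that runs the regex from the script against a small HTML string and compares it with a looser pattern. The sample markup, the name find_image_links and the looser pattern are all assumptions made for illustration; they are not taken from the real page, whose HTML may look quite different. The regex above only matches http:// links on a single-digit bp.inaturalist.org host that directly follow <a href=", so an https link or an <img src="..."> tag would be missed.

```python
import re

# Hypothetical sample markup, only for testing the patterns offline.
SAMPLE_HTML = '''
<a href="https://static.example.org/photos/1/large.jpg">one</a>
<img src="http://static.example.org/photos/2/medium.jpeg?123">
<a href="/taxa/56061">not an image</a>
'''

# The pattern from the script above, unchanged apart from the raw-string prefix.
ORIGINAL_REGEX = r'(?<=<a href=")http://\d.bp.inaturalist.org/[^"]+'

# Looser pattern (an assumption, not the site's real URL scheme): any quoted
# http or https URL ending in a common image extension, with an optional
# query string.
IMAGE_URL = re.compile(
    r'"(https?://[^"\s]+?\.(?:jpe?g|png|gif)(?:\?[^"\s]*)?)"',
    re.IGNORECASE,
)


def find_image_links(html):
    """Return the image URLs found in html, in order, without duplicates."""
    links = []
    for url in IMAGE_URL.findall(html):
        if url not in links:
            links.append(url)
    return links


print("original pattern:", len(re.findall(ORIGINAL_REGEX, SAMPLE_HTML)))
print("looser pattern:  ", len(find_image_links(SAMPLE_HTML)))
for url in find_image_links(SAMPLE_HTML):
    print(url)
```

If even a loose pattern like this finds nothing in the real contents string, the image URLs are probably not in the downloaded HTML at all (for example because the page fills them in with JavaScript), and no regex would help. Printing contents[:500] or checking whether ".jpg" occurs in contents is a quick way to tell.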