I have a web scraping program that downloads a web page a few times every hour. About one attempt out of every 15 or 20 fails with one of these errors:
[Errno 10054] An existing connection was forcibly closed by the remote host
or
[Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
Is there a better approach than:
import time
import urllib2

def get_page(url):
    def get_page_once(url):
        # Single download attempt; returns '' on any failure.
        try:
            page = opener.open(url).read()
        except Exception as e:
            print('Failed to download %s: %s' % (url, e))
            page = ''
        return page

    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0')]

    page = get_page_once(url)
    if page == '':      # first attempt failed, so wait briefly and retry once
        time.sleep(2)
        page = get_page_once(url)
    return page
I could do more than one retry, but am worried about spending too much time in this function.
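If I did go to multiple retries, something like the rough sketch below is what I have in mind: exponential backoff with a cap on the total time spent sleeping. The function name and the retries / initial_delay / max_total_wait parameters are placeholders I made up, the 30-second timeout is an arbitrary guess, and I have not tested this.

import time
import urllib2

def get_page_with_retries(url, retries=4, initial_delay=1, max_total_wait=10):
    # Rough sketch (untested): retry with exponential backoff, but never
    # sleep more than max_total_wait seconds in total across all attempts.
    opener = urllib2.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0')]

    delay = initial_delay
    waited = 0
    for attempt in range(1, retries + 1):
        try:
            # timeout (in seconds) is an arbitrary value for this sketch
            return opener.open(url, timeout=30).read()
        except Exception as e:
            print('Attempt %d failed for %s: %s' % (attempt, url, e))
        if attempt == retries or waited + delay > max_total_wait:
            break               # no point sleeping if there is no next attempt
        time.sleep(delay)
        waited += delay
        delay *= 2              # back off: 1s, 2s, 4s, ...
    return ''

With the defaults above the sleeps add up to at most 1 + 2 + 4 = 7 seconds, so the extra time spent in the function stays bounded even when every attempt fails (beyond whatever the per-attempt timeout costs). Is there a better approach than this?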