
I am using the code below to batch download JSON files from a text list of URLs. The links aren't standardized: they can be https or http, and they may or may not end with '.json'.

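import os
import urllib2

def save_json(url):
  # Flatten the URL into a filename: drop '/', ':', '?', '=', '&' and every dot
  # except the one in a '.json' / '.JSON' extension.
  filename = (url.replace('/','').replace(':','')
              .replace('.','|').replace('|json','.json').replace('|JSON','.json')
              .replace('|','').replace('?','').replace('=','').replace('&',''))
  path = "U:/location/json"
  fullpath = os.path.join(path, filename)
  response = urllib2.urlopen(url)
  webContent = response.read()
  response.close()
  # Binary mode, so the downloaded bytes are written unchanged on Windows
  with open(fullpath, 'wb') as f:
    f.write(webContent)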


with open('U:/location/index_dl.txt') as f:
  p = f.read()
url_list = p.split('\n') #here's where \n is the line break delimiter that can be changed
for url in url_list:
  url = url.strip()
  if url: # skip blank lines, such as the one left by a trailing newline
    save_json(url)

Every so often I get the error:

Errno 10054 An existing connection was forcibly closed by the remote host.


Question: Does anyone know of another way to batch download a list of json links from the web, or have a way to handle this error as it happens?
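One way the error could be handled as it happens is to catch it and retry the same URL after a short pause. The following is only a minimal sketch, not a tested fix for this particular server: `fetch_with_retry`, `attempts`, `delay` and `fetcher` are hypothetical names introduced for illustration, and the `urllib.request` fallback is an assumption added so the sketch also runs on Python 3.

```python
import errno
import socket
import time

try:
    import urllib2  # Python 2, as in the code above
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback (assumption)

# 10054 is the Winsock code from the error message; ECONNRESET is the
# platform's own "connection reset" code.
RESET_CODES = (10054, errno.ECONNRESET)


def fetch(url):
    """Download one URL and return the raw response body."""
    response = urllib2.urlopen(url)
    try:
        return response.read()
    finally:
        response.close()


def fetch_with_retry(url, attempts=3, delay=2.0, fetcher=fetch):
    """Call fetcher(url), retrying only when the connection was reset.

    attempts, delay and fetcher are hypothetical parameters for this sketch.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetcher(url)
        except EnvironmentError as exc:
            # urllib2.URLError wraps the socket error in .reason;
            # a bare socket.error carries errno itself.
            reason = getattr(exc, 'reason', exc)
            code = getattr(reason, 'errno', None)
            if code not in RESET_CODES or attempt == attempts:
                raise
            time.sleep(delay * attempt)  # wait a little longer each time
```

In `save_json`, the `urlopen`/`read` pair would then become `webContent = fetch_with_retry(url)`. Errors other than a connection reset (for example an HTTP 404) are re-raised immediately, so a bad URL still fails loudly instead of being retried.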

Thanks in advance! SJB

DimaSan
samtana
  • Print the URL on screen and try it in a browser - maybe you are creating an incorrect URL. Or maybe you need some HTTP headers (like `user-agent`) because the server checks them. – furas Oct 01 '16 at 18:19
  • I think urllib2 is attempting another request over the connection that was reset on the server. Sending HTTP header "Connection: Keep-Alive" should help. Check this question for how to do this with urllib2 - http://stackoverflow.com/questions/385262/how-do-i-send-a-custom-header-with-urllib2-in-a-http-request – AnishT Oct 01 '16 at 18:31
  • You need to provide a *fake* user agent to a web server to deceive it. – Jeon Oct 01 '16 at 19:43
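The header suggestions in the comments above can be tried with `urllib2.Request`, which accepts a `headers` dictionary. This is a rough sketch only: the header values are placeholders rather than values known to satisfy the server in question, `build_request` and `fetch_with_headers` are hypothetical helper names, and the `urllib.request` fallback is an assumption added so the sketch also runs on Python 3.

```python
try:
    import urllib2  # Python 2, as in the question
except ImportError:
    import urllib.request as urllib2  # Python 3 fallback (assumption)

# Placeholder values for this sketch; substitute whatever the server expects.
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (compatible; json-batch-downloader)',
    'Accept': 'application/json',
}


def build_request(url, headers=HEADERS):
    """Return a Request object that carries the custom headers."""
    return urllib2.Request(url, headers=headers)


def fetch_with_headers(url):
    """Download one URL, sending the custom headers with the request."""
    response = urllib2.urlopen(build_request(url))
    try:
        return response.read()
    finally:
        response.close()
```

Note that urllib2's HTTP handler sets `Connection: close` on every request itself, so a `Connection: Keep-Alive` header passed this way is likely to be overridden. Keeping a connection alive across downloads would probably need a different HTTP client.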

0 Answers