I’m a Python newbie, so apologies if this is a very dumb question, but I’ve spent a lot of time trying to answer it myself without success. I’m using the following script to download XML files from a website with urllib2:
import os
import urllib2

# work from the folder the XML files should be saved to
# (raw string so the backslashes in the Windows path aren't treated as escapes)
os.chdir(r'C:\Users\AB\Documents')

site = "http://www.example.com/ab/cdef/1324"
# browser-like headers so the site doesn't reject the request
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}

req = urllib2.Request(site, headers=hdr)
try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.fp.read()
    raise  # stop here rather than carry on with no page

content = page.read()

filename = "1324.xml"
file_ = open(filename, 'wb')  # 'wb' so Windows doesn't translate line endings in the download
file_.write(content)
file_.close()
What I want to do is download a series of XML files from the same site using one script. The URL sequence couldn’t be simpler: the number after “http://www.example.com/ab/cdef/” just goes up by one each time, so the next page to be downloaded would be “http://www.example.com/ab/cdef/1325” and the resulting file would be called “1325.xml”.
I’ve tried many different for loops without any success. How can I work through a series of web pages (say “…/cdef/1324” through to “…/cdef/1340”) and download a different, differently named XML file (“1324.xml” through to “1340.xml” in this example) for each one, please?
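To make it concrete, here is a rough sketch of the kind of loop I’m picturing, based on the script above; the string formatting for the number and the continue on an HTTP error are just my guesses at how it ought to work, and my own attempts along these lines haven’t succeeded:

import os
import urllib2

os.chdir(r'C:\Users\AB\Documents')

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}

# 1324 up to and including 1340
for number in range(1324, 1341):
    site = "http://www.example.com/ab/cdef/%d" % number
    req = urllib2.Request(site, headers=hdr)
    try:
        page = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print e.fp.read()
        continue  # skip this number and move on to the next one
    content = page.read()
    filename = "%d.xml" % number
    file_ = open(filename, 'wb')
    file_.write(content)
    file_.close()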