Python urllib2 query – download separate xml file for each webpage in series of urls

Question

I’m a Python newbie, so apologies if this is a very dumb question, but I’ve spent a lot of time trying to answer it myself without success. I’m using the following script to download xml files from a website using urllib2:

import os
os.chdir('C:\Users\AB\Documents')
import urllib2
site= "http://www.example.com/ab/cdef/1324"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',}
req = urllib2.Request(site, headers=hdr)
try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.fp.read()
content = page.read()
filename = "1324.xml"
file_ = open(filename, 'w')
file_.write(content)
file_.close()

What I want to do is download a series of xml files from the same site using one script. The url sequence couldn’t be simpler – the number after the “http://www.example.com/ab/cdef/” just goes up by one each time, so the next page to be downloaded would be “http://www.example.com/ab/cdef/1325” and the resulting file would be called “1325.xml”.

I’ve tried many different FOR loops, without any success. How can I go through a series of webpages – say “…/cdef/1324” through to “…/cdef/1340” and download a different (and differently named) xml file (“1324.xml” through to “1340.xml” in this example) for each one please?

score 0 · Answer 1 · answered Oct 05 '16 at 07:36

Probably it should work

import os
import urllib2

os.chdir('C:\Users\AB\Documents')

site= "http://www.example.com/ab/cdef/" # without 1324

hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',}

for number in range(1324, 1341):
    url = site + str(number)
    req = urllib2.Request(url, headers=hdr)
    try:
        page = urllib2.urlopen(req)
    except urllib2.HTTPError, e:
        print e.fp.read()       
    content = page.read()
    filename = str(number) + ".xml"
    file_ = open(filename, 'w')
    file_.write(content)
    file_.close()

You are a wonderful, wonderful human being. Worked perfectly. You've made a very, very frustrated man very, very happy. All the best. — RussellDada, Oct 05 '16 at 08:10

Python urllib2 query – download separate xml file for each webpage in series of urls

1 Answers1