
I'd like to download all of the files in a particular directory at a known URL. The names of the files won't necessarily be known, but their names will all contain a common keyword, and will have the same extension (.xml).

Is there an equivalent of "os.walk" for urllib2, such that I can simply walk through whatever files exist in the directory and open them for parsing?

The only examples of this I have seen online involve a file of known name which contains a list of all the filenames in the directory. I do NOT want to do this...

Other possibly relevant info: the files are on an Apache server, and they are publicly accessible.

  • Contact the site owner and ask if they'd be willing to give you the data – dm03514 Jun 21 '12 at 18:20
  • It's actually my data. I just want to be able to pull it down with my script automatically without having to update a list of files in the directory every time it updates... maybe I'll have to do it anyway. – user1472893 Jun 25 '12 at 19:38

1 Answer


This is impossible without knowing the filenames - you'd have to blindly try every possible name, because the only way to know whether a file with a given name exists is to request its URL and see if you get a response. But you can let the Apache web server generate a directory index for you (with mod_autoindex) and parse that index to get the filenames.
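For example, here is a minimal sketch with urllib2, assuming directory listing is enabled for that URL (mod_autoindex / Options +Indexes); the URL and keyword below are placeholders for your own:

    import re
    import urllib2

    # Hypothetical values - replace with your own directory URL and keyword.
    BASE_URL = "http://example.com/data/"
    KEYWORD = "keyword"

    # Fetch the auto-generated directory index page.
    index_html = urllib2.urlopen(BASE_URL).read()

    # Pull the href targets out of the listing, skipping sort links and
    # subdirectories, and keep only the .xml files containing the keyword.
    filenames = [name for name in re.findall(r'href="([^"?/]+\.xml)"', index_html)
                 if KEYWORD in name]

    # Download each matching file for parsing.
    for name in filenames:
        xml_data = urllib2.urlopen(BASE_URL + name).read()
        # ... parse xml_data here ...
        print name, len(xml_data)

A proper HTML parser (e.g. HTMLParser or BeautifulSoup) would be more robust than a regex if the index markup is more complex, but for Apache's plain mod_autoindex listing this is usually enough.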

l4mpi