
I'd like to download all of the files in a particular directory at a known URL. The names of the files won't necessarily be known, but their names will all contain a common keyword, and will have the same extension (.xml).

Is there an equivalent of "os.walk" for urllib2, such that I can simply walk through whatever files exist in the directory and open them for parsing?

The only examples of this I have seen online involve a file of known name which contains a list of all the filenames in the directory. I do NOT want to do this...

Other possibly relevant info: the files are on an Apache server, and they are publicly accessible.

  • Contact the site owner and ask if they'd be willing to give you the data – dm03514 Jun 21 '12 at 18:20
  • It's actually my data. I just want to be able to pull it down with my script automatically without having to update a list of files in the directory every time it updates... maybe I'll have to do it anyway. – user1472893 Jun 25 '12 at 19:38

1 Answer


This is impossible without knowing the filenames - you'd have to blindly try every possible name, because the only way to know whether a file with a given name exists is to request its URL and see if you get a response. But you can let the Apache web server generate a directory index for you (with mod_autoindex) and parse that index to get the filenames.
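For example, here is a minimal sketch with urllib2, assuming directory listing is enabled for that URL (mod_autoindex / Options +Indexes); the URL and keyword below are placeholders for your own:

    import re
    import urllib2

    # Hypothetical values - replace with your own directory URL and keyword.
    BASE_URL = "http://example.com/data/"
    KEYWORD = "keyword"

    # Fetch the auto-generated directory index page.
    index_html = urllib2.urlopen(BASE_URL).read()

    # Pull the href targets out of the listing, skipping sort links and
    # subdirectories, and keep only the .xml files containing the keyword.
    filenames = [name for name in re.findall(r'href="([^"?/]+\.xml)"', index_html)
                 if KEYWORD in name]

    # Download each matching file for parsing.
    for name in filenames:
        xml_data = urllib2.urlopen(BASE_URL + name).read()
        # ... parse xml_data here ...
        print name, len(xml_data)

A proper HTML parser (e.g. HTMLParser or BeautifulSoup) would be more robust than a regex if the index markup is more complex, but for Apache's plain mod_autoindex listing this is usually enough.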

l4mpi