
Python's zipfile module works with local files. Is there a library for working with zip files on remote web server storage in a lazy-evaluation fashion? For example, I want to enumerate all files in http://host/file.zip without retrieving the whole file over HTTP. Then I would like to get all *.txt files from the remote zip file, which should also require only partial HTTP retrieval. So, I need some kind of lazy evaluation between the HTTP layer and the unzip routines. Is there any known attempt to make this possible in Python?
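One way to sketch this (I'm not aware of a packaged library; this is a hypothetical helper, not an existing API): give `zipfile.ZipFile` a seekable file-like object that translates `read()` calls into HTTP Range requests. `zipfile` then downloads only the central directory when listing names, and only a member's bytes when you extract it. This assumes the server honors `Range` requests and reports `Content-Length` on a HEAD request.

```python
import io
import urllib.request
import zipfile


class HttpRangeFile(io.RawIOBase):
    """Read-only, seekable file-like object backed by HTTP Range requests
    (hypothetical helper; assumes the server supports Range and HEAD)."""

    def __init__(self, url):
        self.url = url
        self.pos = 0
        # A HEAD request tells us the total size without downloading the body.
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req) as resp:
            self.size = int(resp.headers["Content-Length"])

    def _range(self, start, end):
        # Fetch bytes [start, end] inclusive via a Range request.
        req = urllib.request.Request(
            self.url, headers={"Range": "bytes=%d-%d" % (start, end)})
        with urllib.request.urlopen(req) as resp:
            return resp.read()

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self.pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self.pos = offset
        elif whence == io.SEEK_CUR:
            self.pos += offset
        else:  # io.SEEK_END
            self.pos = self.size + offset
        return self.pos

    def read(self, n=-1):
        if n < 0:
            n = self.size - self.pos
        if n <= 0 or self.pos >= self.size:
            return b""
        end = min(self.pos + n, self.size) - 1
        data = self._range(self.pos, end)
        self.pos += len(data)
        return data


# Usage sketch (http://host/file.zip is a placeholder):
# zf = zipfile.ZipFile(HttpRangeFile("http://host/file.zip"))
# txt_names = [n for n in zf.namelist() if n.endswith(".txt")]
```

Each `read()` is one HTTP round trip, so a production version would want to buffer (e.g. fetch the last 64 KB eagerly, since the central directory lives at the end of the archive).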

  • I don't know if one exists, but I think it could be done by reading the central directory file header, as described [here](http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers). For that you would only need to read the last few bytes of the file, since the central directory sits at the end of the archive. I'm not sure whether it would be possible to retrieve single files from within the ZIP without downloading the full archive; I don't know how flexible the HTTP protocol is for seeking within a resource, and it would probably be limited by server configuration. I like the idea of such a library, though. – El Barto Mar 11 '12 at 22:06
  • Yes, of course it's possible, but I don't like the idea of writing my own library before I know whether it has already been done :) –  Mar 11 '12 at 22:09
  • Just a thought: if HTTP were mapped into a filesystem and accessed via the UNIX open(), fseek(), and read() functions, then the whole task could be done by the plain unzip utility... –  Mar 11 '12 at 22:12
  • Hmm, something like that, maybe: http://httpfs.sourceforge.net/ –  Mar 11 '12 at 22:16
  • Take a look at this: http://stackoverflow.com/questions/7829311/is-there-a-library-for-retrieving-a-file-from-a-remote-zip – El Barto Mar 11 '12 at 22:16
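Following the central-directory idea from the comments, the listing can also be done by hand with exactly two partial reads: fetch the tail of the archive to find the End of Central Directory record, then fetch only the central directory itself. A minimal sketch (plain zip only, no ZIP64; byte-range access is abstracted as a `fetch_range(start, end)` callable so it can be backed by HTTP Range requests — `http_range` and `list_zip_entries` are hypothetical names):

```python
import struct
import urllib.request


def http_range(url, start, end):
    """Fetch bytes [start, end] of url via an HTTP Range request
    (requires server-side Range support)."""
    req = urllib.request.Request(
        url, headers={"Range": "bytes=%d-%d" % (start, end)})
    with urllib.request.urlopen(req) as resp:
        return resp.read()


def list_zip_entries(fetch_range, size):
    """List member names of a zip archive using only two partial reads.

    fetch_range(start, end) must return bytes [start, end] inclusive;
    size is the total archive size in bytes.
    """
    # The End of Central Directory record sits at the end of the file:
    # 22 fixed bytes plus an optional comment of up to 65535 bytes.
    tail_len = min(size, 22 + 65535)
    tail = fetch_range(size - tail_len, size - 1)
    pos = tail.rfind(b"PK\x05\x06")
    if pos < 0:
        raise ValueError("End of Central Directory record not found")
    (_, _, _, _, n_entries, cd_size, cd_offset, _) = struct.unpack(
        "<4s4H2LH", tail[pos:pos + 22])
    # Second partial read: only the central directory itself.
    cd = fetch_range(cd_offset, cd_offset + cd_size - 1)
    names, off = [], 0
    for _ in range(n_entries):
        # Each central directory entry has a 46-byte fixed header followed
        # by the file name, extra field, and comment.
        fields = struct.unpack("<4s6H3L5H2L", cd[off:off + 46])
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        names.append(cd[off + 46:off + 46 + name_len].decode("utf-8"))
        off += 46 + name_len + extra_len + comment_len
    return names


# names = list_zip_entries(lambda s, e: http_range(url, s, e), total_size)
```

Each entry's header also carries the compressed size and the local-header offset (fields 8 and 16 above), so a single *.txt member could likewise be Range-fetched without downloading its neighbors.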

0 Answers