10
a.zip---
      -- b.txt
      -- c.txt
      -- d.txt

Methods to process zip files with Python:

I could expand the zip file to a temporary directory, then process each txt file one by one.

What I would rather know is whether Python provides a way to treat the zip file as a specialized folder and process each txt file directly, so that I don't have to expand the archive manually.

q0987
  • All of these are duplicates: http://stackoverflow.com/search?q=python+zipfile – S.Lott Sep 23 '11 at 19:22
  • possible duplicate of [How do I read selected files from a remote Zip archive over HTTP using Python?](http://stackoverflow.com/questions/94490/how-do-i-read-selected-files-from-a-remote-zip-archive-over-http-using-python) – S.Lott Sep 23 '11 at 19:24
  • Or maybe a duplicate of this: http://stackoverflow.com/questions/4890860/make-in-memory-copy-of-a-zip-by-iterrating-over-each-file-of-the-input – S.Lott Sep 23 '11 at 19:25

2 Answers

29

The Python standard library helps you.

Doug Hellmann writes very informative posts about selected modules: https://pymotw.com/3/zipfile/

To comment on David's post: from Python 2.7 on, the ZipFile object can be used as a context manager, so the recommended way would be:

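import zipfile

# Open the archive read-only; the with block closes it on exit.
with zipfile.ZipFile("zipfile.zip", "r") as f:
    for name in f.namelist():
        data = f.read(name)  # the member's full contents, as bytes
        print(name, len(data), repr(data[:10]))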

The close method is called automatically when the with block exits, even if an exception is raised. This is especially important if you write to the file, because the archive's essential records are only written when it is closed.
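To illustrate the point about writing, here is a minimal sketch that creates an archive inside a with block and then reopens it to list the members. The helper name write_and_list and the member names are only assumptions for the example, not anything from the question.

```python
import os
import tempfile
import zipfile


def write_and_list(archive_path, members):
    """Write a {name: text} mapping into a new zip, then return its member names.

    Hypothetical helper, for illustration only.
    """
    # Leaving the with block calls close(), which finalizes the archive.
    with zipfile.ZipFile(archive_path, "w") as zf:
        for name, text in members.items():
            zf.writestr(name, text)

    # Reopen read-only to confirm the archive can be listed.
    with zipfile.ZipFile(archive_path, "r") as zf:
        return zf.namelist()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "a.zip")
        print(write_and_list(path, {"b.txt": "hello", "c.txt": "world"}))
```

Both the write and the read go through with, so neither needs an explicit close() call.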

Daniel Griscom
rocksportrocker
7

Yes, you can process each file by itself. Take a look at the tutorial here. For your needs you can do something like this example from that tutorial:

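import zipfile

# Named "archive" so the variable does not shadow Python 2's built-in "file".
archive = zipfile.ZipFile("zipfile.zip", "r")
for name in archive.namelist():
    data = archive.read(name)  # the member's full contents, as bytes
    print(name, len(data), repr(data[:10]))
archive.close()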

This will iterate over each file in the archive and print out its name, length and the first 10 bytes.

The comprehensive reference documentation is here.
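If the goal is to handle only the txt members and read them as text rather than raw bytes, something along these lines might work. It is a sketch, not the tutorial's code: the function name iter_text_lines, the ".txt" suffix filter, and the UTF-8 encoding are all assumptions.

```python
import io
import zipfile


def iter_text_lines(archive_path, suffix=".txt", encoding="utf-8"):
    """Yield (member_name, line) pairs for every member ending in suffix.

    Hypothetical helper; suffix and encoding are assumed defaults.
    """
    with zipfile.ZipFile(archive_path, "r") as zf:
        for name in zf.namelist():
            if not name.endswith(suffix):
                continue
            # zf.open() returns a binary file-like object, so wrap it
            # in TextIOWrapper to decode it line by line.
            with io.TextIOWrapper(zf.open(name), encoding=encoding) as text:
                for line in text:
                    yield name, line.rstrip("\n")


if __name__ == "__main__":
    # Assumes an archive named a.zip exists in the current directory.
    for member, line in iter_text_lines("a.zip"):
        print(member, line)
```

Unlike read(), which loads a whole member into memory, open() streams it, which may matter for large files.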

David Heffernan