Using Python 2.4 and the built-in ZipFile library, I cannot read very large zip files (greater than 1 or 2 GB) because it tries to hold the entire uncompressed contents of a member in memory. Is there another way to do this (either with a third-party library or some other hack), or must I "shell out" and unzip it that way (which, obviously, isn't as cross-platform)?
Asked by Marc Novakowski (edited by Mazdak)
2 Answers
Here's an outline of how to decompress large files.
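import os
import struct
import zipfile
import zlib

doc = "large.zip"  # path to the archive (placeholder)
src = open(doc, "rb")
zf = zipfile.ZipFile(src)
for m in zf.infolist():
    # Examine the central-directory entry
    print("%s %d %d %r %r" % (m.filename, m.header_offset, m.compress_size, m.extra, m.comment))
    if m.filename.endswith("/"):
        continue  # directory entry, nothing to decompress
    if m.compress_type != zipfile.ZIP_DEFLATED:
        continue  # this outline only handles deflate-compressed members
    # Read the 30-byte local file header; the file name and extra field
    # lengths are the 2-byte little-endian fields at offsets 26 and 28.
    src.seek(m.header_offset)
    header = src.read(30)
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    nm = src.read(name_len)
    ex = src.read(extra_len)  # may differ from m.extra (central directory)
    # The local header has no comment field; we are now at the data.
    # Build a raw-deflate decompression object (-15: no zlib header)
    decomp = zlib.decompressobj(-15)
    parent = os.path.dirname(m.filename)
    if parent and not os.path.isdir(parent):
        os.makedirs(parent)
    out = open(m.filename, "wb")
    # Feed the compressed data in blocks so it is never all in memory
    remaining = m.compress_size
    while remaining > 0:
        block = src.read(min(remaining, 64 * 1024))
        if not block:
            break  # truncated archive
        remaining -= len(block)
        out.write(decomp.decompress(block))
    out.write(decomp.flush())
    out.close()
zf.close()
src.close()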
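The same idea as a self-contained function, as a sketch that assumes Python 3 and a deflate-compressed member (stored members and encrypted archives are not handled). Besides reading the compressed input in blocks, it passes max_length to Decompress.decompress() so that a single call never produces more than chunk bytes of output; input that zlib could not process yet comes back in unconsumed_tail and is fed in again on the next round. The names iter_member_chunks and CHUNK are hypothetical, and the 64 KiB block size is an arbitrary choice.

```python
import io
import struct
import zipfile
import zlib

CHUNK = 64 * 1024  # block size for reads and for each piece of output (arbitrary)


def iter_member_chunks(src, info, chunk=CHUNK):
    """Yield the uncompressed bytes of one deflated ZIP member in pieces.

    src is the archive opened in binary mode; info is the member's ZipInfo,
    taken from the central directory (e.g. ZipFile.infolist()).
    """
    if info.compress_type != zipfile.ZIP_DEFLATED:
        raise ValueError("this sketch only handles deflate-compressed members")
    # Fixed 30-byte local header; name and extra lengths are at offset 26.
    src.seek(info.header_offset)
    header = src.read(30)
    if header[:4] != b"PK\x03\x04":
        raise ValueError("bad local file header signature")
    name_len, extra_len = struct.unpack("<HH", header[26:30])
    src.seek(name_len + extra_len, io.SEEK_CUR)  # now at the compressed data

    decomp = zlib.decompressobj(-15)  # negative wbits: raw deflate stream
    remaining = info.compress_size
    pending = b""
    while remaining > 0 or pending:
        if not pending:
            pending = src.read(min(chunk, remaining))
            if not pending:
                raise EOFError("archive ended before the member's data did")
            remaining -= len(pending)
        # max_length caps the output of this call; input that was not
        # processed yet is returned in unconsumed_tail for the next round.
        piece = decomp.decompress(pending, chunk)
        pending = decomp.unconsumed_tail
        if piece:
            yield piece
    tail = decomp.flush()
    if tail:
        yield tail
```

A caller can write each piece straight to disk, e.g. `for piece in iter_member_chunks(src, info): out.write(piece)`, so memory use depends on the block size rather than on the size of the member.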

Answered by S.Lott
@s-lott What do `ex= src.read( len(m.extra) )` and `cm= src.read( len(m.comment) )` do, and what do you use the variables `ex` and `cm` for? What do you mean by "it's good to use struct to unpack this"? And what is the magic number `30` used for? – Jonathan Jun 08 '17 at 13:18
The header for each file contains the name of the file at a relative offset of 30 bytes; see https://en.wikipedia.org/wiki/Zip_(file_format). The extra and comment fields are not relevant, other than that we have to read those bytes to move ahead to the right position. – Benjamin Feb 24 '20 at 03:19
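To expand on the struct remark: the fixed part of a ZIP local file header is 30 bytes, and struct can unpack all of it in one call. The sketch below assumes Python 3; read_local_header, FIELDS and the field names are hypothetical, chosen to follow the field order of the ZIP specification. Note that the local header has no comment field (file comments are stored only in the central directory), and that the extra field stored here is not guaranteed to match ZipInfo.extra, which comes from the central directory.

```python
import struct

# Fixed part of a ZIP local file header, little-endian, 30 bytes in total:
# signature, version needed, flags, method, mod time, mod date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")
assert LOCAL_HEADER.size == 30

FIELDS = ("signature", "version", "flags", "method", "mod_time",
          "mod_date", "crc32", "compress_size", "file_size",
          "name_len", "extra_len")


def read_local_header(src, info):
    """Parse one member's local header; return (fields, name, data_offset).

    Leaves src positioned at the start of the member's compressed data.
    If flag bit 3 is set, the CRC and size fields here are zero and the
    real values follow the data, so prefer info.compress_size.
    """
    src.seek(info.header_offset)
    fields = dict(zip(FIELDS, LOCAL_HEADER.unpack(src.read(LOCAL_HEADER.size))))
    if fields["signature"] != b"PK\x03\x04":
        raise ValueError("not a ZIP local file header")
    name = src.read(fields["name_len"])  # file name, as raw bytes
    src.read(fields["extra_len"])        # skip the local extra field
    data_offset = (info.header_offset + LOCAL_HEADER.size
                   + fields["name_len"] + fields["extra_len"])
    return fields, name, data_offset
```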
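As of Python 2.6, you can use ZipFile.open() to open a file handle on a member of the archive and copy its contents efficiently to a target file of your choosing (the code below opens both files in a single with statement, which requires Python 2.7 or later):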
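import errno
import os
import shutil
import zipfile

doc = 'large.zip'  # path to the archive (placeholder)
TARGETDIR = '/foo/bar/baz'

def makedirs_exist_ok(path):
    # os.makedirs raises if the directory already exists; ignore only that
    try:
        os.makedirs(path)
    except (OSError, IOError) as err:
        if err.errno != errno.EEXIST:
            raise

with open(doc, "rb") as zipsrc:
    zfile = zipfile.ZipFile(zipsrc)
    for member in zfile.infolist():
        target_path = os.path.join(TARGETDIR, member.filename)
        if target_path.endswith('/'):  # folder entry, create
            makedirs_exist_ok(target_path)
            continue
        # Not every archive has explicit folder entries, so make sure the
        # parent directory exists before writing the file
        makedirs_exist_ok(os.path.dirname(target_path))
        with open(target_path, 'wb') as outfile, zfile.open(member) as infile:
            shutil.copyfileobj(infile, outfile)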
This uses shutil.copyfileobj() to read data from the open archive member in fixed-size chunks and write each chunk to the output file, so the uncompressed member never has to fit in memory all at once.
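For readers curious what copyfileobj() does here, this is a roughly equivalent loop written out by hand, as a sketch assuming Python 3. copy_member and its bufsize default are hypothetical; the point is that only one block of the decompressed member is held by Python at any time.

```python
def copy_member(zfile, member, dest, bufsize=64 * 1024):
    """Stream one archive member into the binary file object dest.

    zfile is an open zipfile.ZipFile, member a name or ZipInfo.
    Works like shutil.copyfileobj(infile, dest, bufsize): memory use is
    bounded by the block size, not by the member's size.
    Returns the number of uncompressed bytes written.
    """
    total = 0
    with zfile.open(member) as infile:
        while True:
            block = infile.read(bufsize)
            if not block:  # empty bytes means the end of the member
                break
            dest.write(block)
            total += len(block)
    return total
```

In practice shutil.copyfileobj() is the simpler choice; writing the loop yourself only helps if you want to do something per block, such as reporting progress.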

Answered by Martijn Pieters