
I'm trying to use Python's zipfile module to extract a split ZIP archive by concatenating all of the parts and then unzipping the result, but I keep getting a "Bad magic number for file header" error.

I'm writing a Python script that will normally receive a single ZIP file but will very rarely receive a ZIP file split into multiple parts (for example, foo.zip.001, foo.zip.002, etc.). From what I can tell, there's no easy way to handle this when the script has to be bundled with its dependencies into a Docker container. However, I stumbled across this SO answer, which explains that you can concatenate the parts into a single ZIP file and treat it as such. So my battle plan is to concatenate all of the parts into one big ZIP file and then unzip that. I created a test case in the macOS Terminal by splitting a video file into 5 MB parts with the following command:

$ zip -s 5m test ch4_3.mp4

Here's my code to concatenate all files together:

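import os
import zipfile

# Parts in the order `zip -s` numbers them; test.zip holds the last segment.
split_files = ['test.z01', 'test.z02', 'test.z03', 'test.zip']

# Append the raw bytes of every part to a single output file.
with open('test_video.zip', 'wb') as f:
    for file in split_files:
        with open(file, 'rb') as zf:
            f.write(zf.read())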
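
Instead of hard-coding the list, the parts can be discovered and streamed into one file. This is a minimal sketch, assuming Info-ZIP's naming scheme (numbered `.z01`, `.z02`, ... segments followed by the final `.zip`); `split_parts` and `concatenate_parts` are hypothetical helper names. It produces the same byte-for-byte concatenation as the loop above, so on its own it does not change the extraction error.

```python
import glob
import os
import shutil


def split_parts(base):
    """Return the segments of a split archive in disk order.

    Assumes Info-ZIP naming: base.z01, base.z02, ... and finally base.zip,
    which holds the last segment (including the central directory).
    Only two-digit segment numbers are matched by this pattern.
    """
    numbered = glob.glob(glob.escape(base) + '.z[0-9][0-9]')
    # Sort on the numeric suffix so that .z10 comes after .z09.
    numbered.sort(key=lambda p: int(p.rsplit('.z', 1)[1]))
    return numbered + [base + '.zip']


def concatenate_parts(parts, out_path):
    """Stream each part into out_path without reading whole files into memory."""
    with open(out_path, 'wb') as out:
        for part in parts:
            with open(part, 'rb') as src:
                shutil.copyfileobj(src, out)
```

Usage in the question's setup would be `concatenate_parts(split_parts('test'), 'test_video.zip')`.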

If I go to my terminal and run unzip test_video.zip, this is the output:

$ unzip test_video.zip
Archive:  test_video.zip
warning [test_video.zip]:  zipfile claims to be last disk of a multi-part archive;
  attempting to process anyway, assuming all parts have been concatenated
  together in order.  Expect "errors" and warnings...true multi-part support
  doesn't exist yet (coming soon).
warning [test_video.zip]:  15728640 extra bytes at beginning or within zipfile
  (attempting to process anyway)
file #1:  bad zipfile offset (local header sig):  15728644
  (attempting to re-compensate)
  inflating: ch4_3.mp4
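
The numbers in this output line up with the split size. `-s 5m` asks for 5 MiB segments (5 × 1024 × 1024 = 5,242,880 bytes), and three `.zNN` parts precede `test.zip`. Assuming each of them is a full segment, the 15728640 "extra bytes" are exactly the bytes contributed by `test.z01` through `test.z03`, and the reported offset 15728644 is that figure plus 4. The sketch below checks the arithmetic and offers a way to compare it against the real files. `extra_prefix_bytes` is a hypothetical helper.

```python
import os

SPLIT_SIZE = 5 * 1024 * 1024  # "-s 5m" in the zip command: 5 MiB per segment


def extra_prefix_bytes(parts):
    """Total size of every segment that comes before the final .zip part."""
    return sum(os.path.getsize(p) for p in parts[:-1])


# Pure arithmetic: three full 5 MiB segments precede test.zip.
print(3 * SPLIT_SIZE)      # 15728640, the "extra bytes" unzip reports
print(3 * SPLIT_SIZE + 4)  # 15728644, the "bad zipfile offset" unzip reports

# Against the real files (run in the directory that holds them):
# extra_prefix_bytes(['test.z01', 'test.z02', 'test.z03', 'test.zip'])
```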

It seems to hit a few bumps along the way, but the extraction does succeed. However, when I try to run the following code:

if not os.path.exists('output'):
    os.mkdir('output')
with zipfile.ZipFile('tester/test_video.zip', 'r') as z:
    z.extractall('output')

I get the following error:

---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
<ipython-input-60-07a6f56ea685> in <module>()
      2     os.mkdir('output')
      3 with zipfile.ZipFile('tester/test_video.zip', 'r') as z:
----> 4     z.extractall('output')

~/anaconda3/lib/python3.6/zipfile.py in extractall(self, path, members, pwd)
   1499 
   1500         for zipinfo in members:
-> 1501             self._extract_member(zipinfo, path, pwd)
   1502 
   1503     @classmethod

~/anaconda3/lib/python3.6/zipfile.py in _extract_member(self, member, targetpath, pwd)
   1552             return targetpath
   1553 
-> 1554         with self.open(member, pwd=pwd) as source, \
   1555              open(targetpath, "wb") as target:
   1556             shutil.copyfileobj(source, target)

~/anaconda3/lib/python3.6/zipfile.py in open(self, name, mode, pwd, force_zip64)
   1371             fheader = struct.unpack(structFileHeader, fheader)
   1372             if fheader[_FH_SIGNATURE] != stringFileHeader:
-> 1373                 raise BadZipFile("Bad magic number for file header")
   1374 
   1375             fname = zef_file.read(fheader[_FH_FILENAME_LENGTH])

BadZipFile: Bad magic number for file header
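
For context, the traceback shows `ZipFile.open()` reading a member's local file header and comparing its first four bytes with the expected signature (`PK\x03\x04`); "Bad magic number" means something else sits at the offset `zipfile` computed. A minimal diagnostic sketch that prints what is actually stored at each member's `ZipInfo.header_offset`; `local_header_signatures` is a hypothetical helper name.

```python
import zipfile

LOCAL_FILE_HEADER_SIG = b'PK\x03\x04'


def local_header_signatures(path):
    """Return (filename, header_offset, first 4 bytes found there) per member."""
    results = []
    with zipfile.ZipFile(path) as zf, open(path, 'rb') as raw:
        for info in zf.infolist():
            # header_offset is the position zipfile itself will seek to.
            raw.seek(info.header_offset)
            results.append((info.filename, info.header_offset, raw.read(4)))
    return results


# Example against the concatenated archive:
# for name, offset, sig in local_header_signatures('test_video.zip'):
#     print(name, offset, sig, sig == LOCAL_FILE_HEADER_SIG)
```

For a healthy archive every entry reports `PK\x03\x04`; for the concatenated file, ch4_3.mp4 would be expected to show some other bytes.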

If I instead concatenate the parts with test.zip first:

split_files = ['test.zip', 'test.z01', 'test.z02', 'test.z03']

with open('test_video.zip', 'wb') as f:
    for file in split_files:
        with open(file, 'rb') as zf:
            f.write(zf.read())

with zipfile.ZipFile('test_video.zip', 'r') as z:
    z.extractall('output')

Here's the output:

---------------------------------------------------------------------------
BadZipFile                                Traceback (most recent call last)
<ipython-input-14-f7aab706dbed> in <module>()
      1 if not os.path.exists('output'):
      2     os.mkdir('output')
----> 3 with zipfile.ZipFile('test_video.zip', 'r') as z:
      4     z.extractall('output')

~/anaconda3/lib/python3.6/zipfile.py in __init__(self, file, mode, compression, allowZip64)
   1106         try:
   1107             if mode == 'r':
-> 1108                 self._RealGetContents()
   1109             elif mode in ('w', 'x'):
   1110                 # set the modified flag so central directory gets written

~/anaconda3/lib/python3.6/zipfile.py in _RealGetContents(self)
   1173             raise BadZipFile("File is not a zip file")
   1174         if not endrec:
-> 1175             raise BadZipFile("File is not a zip file")
   1176         if self.debug > 1:
   1177             print(endrec)

BadZipFile: File is not a zip file
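
This second error happens earlier, in the constructor. `zipfile` finds an archive by looking for the end of central directory record (signature `PK\x05\x06`) at the tail of the file: in the last 22 bytes, or within up to 64 KiB of trailing archive comment. With `test.zip` moved to the front, the file now ends with the tail of `test.z03`, which presumably contains no such record. Below is a simplified sketch of that tail search. It skips ZIP64 handling and comment-length validation, so it is not a reimplementation of `zipfile`; `find_eocd` is a hypothetical helper.

```python
import os

EOCD_SIG = b'PK\x05\x06'
EOCD_MIN_SIZE = 22    # fixed-size part of the end of central directory record
MAX_COMMENT = 0xFFFF  # the archive comment length is a 16-bit field


def find_eocd(path):
    """Return the absolute offset of the last EOCD signature near the end, or None."""
    size = os.path.getsize(path)
    window = min(size, EOCD_MIN_SIZE + MAX_COMMENT)
    with open(path, 'rb') as fh:
        fh.seek(size - window)
        tail = fh.read(window)
    pos = tail.rfind(EOCD_SIG)
    return None if pos == -1 else size - window + pos
```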

Using the answer from this SO question, I've worked out that the header is b'PK\x07\x08', but I don't know why. The testzip() method also points straight at the culprit: ch4_3.mp4.
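
For reference, `PK\x07\x08` (0x08074b50 stored little-endian) is the signature PKWARE's APPNOTE assigns to the data descriptor. The same value is also specified as a marker written as the first four bytes of the first segment of a split or spanned archive. In the first ordering above, that segment is `test.z01`, so the concatenated file would start with it. The sketch below labels the first four bytes of each part. The signature table comes from the ZIP format, and `first_signature` is a hypothetical helper.

```python
SIGNATURES = {
    b'PK\x03\x04': 'local file header',
    b'PK\x01\x02': 'central directory file header',
    b'PK\x05\x06': 'end of central directory record',
    b'PK\x07\x08': 'data descriptor / split-archive marker',
}


def first_signature(path):
    """Read the first 4 bytes of a file and name the ZIP record they match, if any."""
    with open(path, 'rb') as fh:
        head = fh.read(4)
    return head, SIGNATURES.get(head, 'not a known ZIP signature')


# Example, run next to the split parts:
# for part in ['test.z01', 'test.z02', 'test.z03', 'test.zip']:
#     print(part, *first_signature(part))
```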

The ZIP file in question is available at this link. Any ideas on what to do?

Edited by Dharman; asked by Nick de Silva.