
I'm having problems with an archive that I built using zipfile in Python. I'm iterating over all the files in a directory and writing them to an archive. When I attempt to extract them afterward, I get an exception related to the path separator.

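import os
import zipfile
import cStringIO

the_path = "C:\\path\\to\\folder"
zipped_cache = cStringIO.StringIO()
zf = zipfile.ZipFile(zipped_cache, "w", zipfile.ZIP_DEFLATED)
for dirname, subdirs, files in os.walk(the_path):
    for filename in files:
        # Store each file under its path relative to the_path
        zf.write(os.path.join(dirname, filename), os.path.join(dirname[1+len(the_path):], filename))
zf.close()
# Reopen the in-memory archive for reading before extracting
zf = zipfile.ZipFile(zipped_cache, "r")
zf.extractall("C:\\destination\\path")
zf.close()
zipped_cache.close()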

Here's the exception:

zipfile.BadZipfile: File name in directory "env\index" and header "env/index" differ.

Update: I replaced the string buffer cStringIO.StringIO() with a temporary file (tempfile.mkstemp("temp.zip")), and now it works. Something seems to corrupt the archive when the zipfile module writes to the buffer, but I'm not sure what the problem is.

The issue was that I was reading from and writing to files opened in text mode ("r"/"w") instead of binary mode ("rb"/"wb"). This isn't a problem on Linux, but on Windows text mode translates line endings, which corrupted the binary archive data. Solved.
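For anyone hitting the same thing on Python 3, here is a minimal sketch of the binary-mode fix, assuming the archive is built in memory first. io.BytesIO stands in for the Python 2 cStringIO.StringIO buffer, and the helper names build_zip_bytes and save_and_reload are hypothetical; the point is only that the archive bytes are written with "wb" and read with "rb", so no line-ending translation can touch them.

```python
import io
import os
import tempfile
import zipfile

def build_zip_bytes(files):
    """Build a ZIP archive in memory from a {arcname: bytes} mapping."""
    buf = io.BytesIO()  # a binary buffer, never a text-mode stream
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for arcname, data in files.items():
            zf.writestr(arcname, data)
    return buf.getvalue()

def save_and_reload(data, path):
    """Write the archive bytes to disk and read them back, both in binary mode."""
    with open(path, "wb") as f:  # "wb": no newline translation on Windows
        f.write(data)
    with open(path, "rb") as f:  # "rb": the bytes come back exactly as written
        return f.read()

if __name__ == "__main__":
    payload = build_zip_bytes({"env/index": b"line one\r\nline two\n"})
    path = os.path.join(tempfile.mkdtemp(), "temp.zip")
    print(save_and_reload(payload, path) == payload)  # True
    with zipfile.ZipFile(path) as zf:
        print(zf.read("env/index") == b"line one\r\nline two\n")  # True
```

Opening the same file with "w"/"r" on Windows would let the C runtime rewrite line endings inside the compressed data, which is the kind of silent corruption described above.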

Asked by Cat; edited by Dharman.
  • I gave those only as examples; the paths are correctly formed, with escaped backslashes ('C:\\path\\to\\folder'). None of the replies answer the question, though. The exception is "zipfile.BadZipfile: File name in directory "env\index" and header "env/index" differ." – Cat May 19 '11 at 19:46
  • +1 for support against random, unexplained downvotes. – jedwards May 19 '11 at 19:47
  • I updated my answer -- this should address your issue. – jedwards May 19 '11 at 20:00
  • What happens if you use `zf.write(os.path.join(dirname, filename))`? – Velociraptors May 19 '11 at 20:12
  • Well, if I don't give the `write` function a second argument, it simply stores the entire directory structure in the archive, which I don't need, e.g. `useless\\directory\\structure\\up\\to\\relevant\\directory` instead of `relevant\\directory`. I think the issue is related to `cStringIO.StringIO()`. – Cat May 19 '11 at 20:17
  • When you edit your original post to reflect your changes you should really leave in your "bad" code, so the answers stay relevant for people who come across this in the future. – jedwards May 19 '11 at 22:45
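On the second argument to `write`: a minimal sketch of computing the archive name with os.path.relpath instead of slicing dirname[1+len(the_path):]. The helper name zip_directory is hypothetical. Slicing assumes the_path has no trailing separator; with "C:\\folder\\" it would drop the first character of every subdirectory name, whereas relpath handles both forms.

```python
import os
import tempfile
import zipfile

def zip_directory(root, archive_path):
    """Zip every file under root, storing each entry relative to root."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirname, subdirs, files in os.walk(root):
            for filename in files:
                full_path = os.path.join(dirname, filename)
                # relpath strips the root prefix whether or not root ends in a separator
                arcname = os.path.relpath(full_path, root)
                zf.write(full_path, arcname)

if __name__ == "__main__":
    # Build a throwaway tree: <tmp>/env/index
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "env"))
    with open(os.path.join(root, "env", "index"), "wb") as f:
        f.write(b"data")
    # Keep the archive outside the tree being zipped
    archive_path = os.path.join(tempfile.mkdtemp(), "out.zip")
    zip_directory(root + os.sep, archive_path)  # a trailing separator is fine
    with zipfile.ZipFile(archive_path) as zf:
        print(zf.namelist())  # ['env/index']
```

The archive is written to a separate temporary directory so the walk never picks up the archive itself.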

4 Answers

You should consider adding an r prefix to make the string a raw string; otherwise the backslashes in the path are interpreted as escape sequences.

The following code:

#!/usr/bin/env python
print(r"C:\destination\path")
print(r"C:\path\to\folder")
print("C:\destination\path")
print("C:\path\to\folder")

produces the following output:

C:\destination\path
C:\path\to\folder
C:\destination\path
C:\path o
         older

Note that the \t and \f are interpreted as tab and formfeed in the last line.

Interestingly, you could also change the backslashes to forward slashes (e.g. open("C:/path/to/folder")), which would also work.

Or escape the backslashes with ... backslashes (e.g. open("C:\\path\\to\\folder")).

IMO, the clearest and easiest solution is to simply add an r.
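A quick way to convince yourself that these spellings are interchangeable is to compare them directly. This small sketch is not part of the original answer; it uses the standard-library ntpath module (the Windows implementation of os.path, importable on any OS) so the forward-slash check also runs on Linux.

```python
import ntpath

raw = r"C:\path\to\folder"        # raw string: backslashes kept as-is
escaped = "C:\\path\\to\\folder"  # every backslash escaped by hand
forward = "C:/path/to/folder"     # forward slashes, which Windows also accepts

print(raw == escaped)                   # True
print(ntpath.normpath(forward) == raw)  # True: normpath turns "/" into backslashes

# The pitfall: without r or doubling, "\t" is a tab and "\f" a form feed
print("\to\folder" == "\t" + "o" + "\f" + "older")  # True
```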


Edit: It looks like you need to go with the second solution, forward slashes. The zipfile library is apparently rather strict, and given that this is a Windows-only bug, it probably slipped through (see Python issue 6839).
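If you build archive names yourself, a minimal way to follow that advice is to convert the separators before calling write/writestr. to_zip_name is a hypothetical helper, and the sketch builds the archive in an io.BytesIO buffer so it runs anywhere. Python 3's zipfile already replaces os.sep with "/" when it creates a ZipInfo on Windows, so on current versions the explicit conversion mainly documents intent (and also covers backslash names on non-Windows systems, where no conversion happens).

```python
import io
import zipfile

def to_zip_name(relative_path):
    """Turn a Windows-style relative path into a ZIP entry name using "/"."""
    return relative_path.replace("\\", "/")

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    # Illustrative entry name taken from the error message in the question
    zf.writestr(to_zip_name("env\\index"), b"index data")

buf.seek(0)
with zipfile.ZipFile(buf) as zf:
    print(zf.namelist())  # ['env/index']
    print(zf.testzip())   # None, meaning every member's CRC checks out
```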

Answered by jedwards

Found the answer to my question here: http://www.penzilla.net/tutorials/python/scripting.

I'm pasting the two functions that are relevant to zipping up a directory. The problem was not the string buffer, nor the slashes, but the way I was iterating over the directory and writing to the zipfile. These two functions (zippy recurses through the tree with os.listdir) fix the problem. Iterating over the entire tree of subdirectories with os.walk is not a good way to write the archive.

import os
from zipfile import ZipFile, ZIP_DEFLATED

def zippy(path, archive):
    paths = os.listdir(path)
    for p in paths:
        p = os.path.join(path, p)  # Build the full path of this entry
        if os.path.isdir(p):  # Recursive case: descend into the subdirectory
            zippy(p, archive)
        else:
            archive.write(p)  # Write the file to the zipfile

def zipit(path, archname):
    # Create a ZipFile object primed to write ("a" to append, "r" to read)
    archive = ZipFile(archname, "w", ZIP_DEFLATED)
    # Recurse or not, depending on what path is
    if os.path.isdir(path):
        zippy(path, archive)
    else:
        archive.write(path)
    archive.close()
    return "Compression of \"" + path + "\" was successful!"

Answered by Cat

You need to escape the backslashes in your paths.

Try changing the following:

  • the_path= "C:\path\to\folder" to the_path = "C:\\path\\to\\folder", and
  • zf.extractall("C:\destination\path") to zf.extractall("C:\\destination\\path").

Answered by Pär Wieslander

You can use forward slashes as path separators, even on Windows. I suggest trying that when you create the zip file.

Answered by Steve Howard