
This is motivated by pathfile issues (unfortunately this doesn't seem to be true in my case).

I have a zipfile that I am trying to extract with Python. The zipfile appears to have been created on Windows. The code I use to extract the files from the zipfile looks like this:

import os
import zipfile


def unzip_file(zipfile_path):
    z = zipfile.ZipFile(zipfile_path)
    # get pathname without extension
    directory = os.path.splitext(zipfile_path)[0]
    print(directory)
    if not os.path.exists(directory):
        os.makedirs(directory)
    # This line doesn't work: it tries to extract "Foobar\\baz.quux" to directory
    # and complains that the directory doesn't exist.
    # z.extractall(directory)
    for name in z.namelist():
        # actual dirname we want is this
        # (dirname, filename) = os.path.split(name)
        # I've tried to be cross-platform (see above), but apparently zipfiles save
        # filenames as Foobar\filename.log, so I need this for cygwin
        dir_and_filename = name.split('\\')
        if len(dir_and_filename) > 1:
            dirname = dir_and_filename[0:-1]
            filename = dir_and_filename[-1]
        else:
            dirname = ['']
            filename = dir_and_filename[0]

        out_dir = os.path.join(directory, *dirname)
        print("Decompressing " + name + " on " + out_dir)
        if not os.path.exists(out_dir):
            os.makedirs(out_dir)
        # extract() still writes the file under its full member name inside out_dir
        z.extract(name, out_dir)
    return directory

While this seems overly complicated, it is an attempt to work around some bugs I've found. One member of the zipfile is Foobar\\filename.log. On trying to extract it, Python complains that the directory doesn't exist. I need a way to use a method like so:

zipfile.extract_to(member_name, directory_name, file_name_to_write)

where member_name is the name of the member to be read (in this example Foobar\\filename.log), directory_name is the name of the directory that we want to write to, and file_name_to_write is the name of the file that we want to write (in this case filename.log). This does not seem to be supported. Does anyone have any other ideas on how to get a cross-platform way of extracting this kind of zip archive, whose members have nested paths?

According to this answer, the zipfile I have may not meet the zipfile specification. Section 4.4.17 of the specification says:

All slashes MUST be forward slashes '/' as opposed to backwards slashes '\' for compatibility with Amiga and UNIX file systems etc.

How do I solve this problem?
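For reference, the behaviour wanted from the hypothetical `extract_to` can be approximated by reading each member with `ZipFile.open` and writing the output file by hand, so the member name and the path on disk are decoupled. The following is only a sketch under assumptions: the function name `extract_normalized` is made up for illustration, and it assumes every backslash in a member name is meant as a path separator.

```python
import os
import shutil
import zipfile


def extract_normalized(zip_path, dest_dir):
    """Extract all members, treating '\\' in member names as a separator.

    Hypothetical helper (not part of the zipfile module). Returns the list
    of file paths written.
    """
    dest_root = os.path.abspath(dest_dir)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Assumption: backslashes are separators, never part of a filename.
            normalized = info.filename.replace("\\", "/")
            parts = [p for p in normalized.split("/") if p not in ("", ".")]
            if not parts:
                continue
            target = os.path.abspath(os.path.join(dest_root, *parts))
            # Refuse members such as "../x" that would land outside dest_dir.
            if not target.startswith(dest_root + os.sep):
                raise ValueError("unsafe member name: %r" % info.filename)
            if normalized.endswith("/"):
                # Directory entry: create it and move on.
                if not os.path.isdir(target):
                    os.makedirs(target)
                continue
            parent = os.path.dirname(target)
            if not os.path.isdir(parent):
                os.makedirs(parent)
            # Read by the original member name, write to the normalized path.
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

With this sketch, a member named `Foobar\filename.log` would be written to `<dest_dir>/Foobar/filename.log` on any platform, because the archive is read by the stored name while the output path is built with `os.path.join`.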

Mike H-R
  • If the issue is the direction of any slash characters, couldn't you just replace any backslashes in the pathfile with forward slashes before using it? – martineau Feb 12 '15 at 18:19
  • @martineau That's what I would like to do, but there are two different concepts: the member name (which states where the data is to be found in the zipfile, and is `foobar\\filename.log` in this example) and the filepath that I would like to write out. I cannot find a method that allows me to specify both, though. With `zipfile.extract` I can specify only the member name, which becomes the filename written out, and the directory it should be written to. – Mike H-R Feb 12 '15 at 22:56
  • A `unzip_file(r'whatever\testzip.zip')` call to your code prints `Decompressing Foobar\filename.log on whatever\testzip\Foobar` and tries to `z.extract(r'Foobar\filename.log', r'whatever\testzip\Foobar')` which likely means it will try to create the latter named subdirectory. On my Windows system, zipfiles are automatically treated as subdirectories by the OS, which may think you're trying to create a subdirectory that already exists since there's logically one by that name _in_ the zip file which is in the same upper level directory. Try overriding that as an experiment and see what happens. – martineau Feb 13 '15 at 02:36

1 Answer

I solved this by simply shelling out to unzip. We need to accept an exit code of 0 or 1, because unzip returns an exit code of 1 for this archive: the zipfile is malformed, and the message given is something like "warning: zipfile appears to contain backslashes as path separators".

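#!/bin/bash
# $1 is the zipfile, $2 is the destination directory (quoted to survive spaces)
unzip "$1" -d "$2"
exit_code=$?
# we accept exit codes < 2 because the zipfiles are malformed and unzip warns with 1
if [ "$exit_code" -lt 2 ]
then exit 0
else exit "$exit_code"
fi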
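
If the caller is itself a Python script, the same exit-code check could be done without the wrapper script. This is a rough sketch, assuming `unzip` is on the PATH; the function name `unzip_tolerating_warnings` and its `run` parameter are hypothetical and exist only so the command runner can be swapped out.

```python
import subprocess


def unzip_tolerating_warnings(zip_path, dest_dir, run=subprocess.call):
    """Run `unzip zip_path -d dest_dir`, accepting exit codes 0 and 1.

    Hypothetical helper. `run` defaults to subprocess.call and must return
    the command's exit code.
    """
    exit_code = run(["unzip", zip_path, "-d", dest_dir])
    # Exit code 1 means unzip finished but printed warnings, which is what
    # happens for archives that use backslashes as path separators.
    if exit_code > 1:
        raise RuntimeError("unzip failed with exit code %d" % exit_code)
    return exit_code
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.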
Mike H-R