
The zipfile module is very handy for managing .zip files with Python.

However, if the .zip file was created on Linux or macOS, the path separator is of course '/', and working with that file on Windows can be a problem because the separator there is '\'. For example, to determine the root directory stored in the .zip file, we might write something like:

from zipfile import ZipFile, is_zipfile
import os

if is_zipfile(filename):

    with ZipFile(filename, 'r') as zip_ref:
        packages_name = [member.split(os.sep)[0] for member in zip_ref.namelist()
                         if (len(member.split(os.sep)) == 2 and not
                                                       member.split(os.sep)[-1])]

But in this case, on Windows, we always get packages_name = [], because os.sep is "\" whereas the archive was compressed on Linux and its paths look like 'foo1/foo2'.

In order to handle all cases (compression on Linux and use on Windows, or the opposite), I want to use:

from zipfile import ZipFile, is_zipfile
import os

if is_zipfile(filename):

    with ZipFile(filename, 'r') as zip_ref:

        if all([True if '/' in el else
                False for el in zip_ref.namelist()]):
            packages_name = [member.split('/')[0] for member in zip_ref.namelist()
                             if (len(member.split('/')) == 2 and not
                                                       member.split('/')[-1])]

        else:
            packages_name = [member.split('\\')[0] for member in zip_ref.namelist()
                             if (len(member.split('\\')) == 2 and not
                                                           member.split('\\')[-1])]

What do you think of this? Is there a more direct or more Pythonic way to do the job?

Dharman
servoz
  • [This Q&A](https://stackoverflow.com/questions/8176953/python-zipfile-path-separators) suggests that the separator will always be `'/'`, if I understand it correctly. – snakecharmerb Oct 17 '20 at 16:17
  • That was also what I thought. But I realised that it didn't work in our project because the CI tests were failing. I looked for the reason and came to the conclusion I describe above. Since I switched to the little piece of code proposed in my post, the integration tests are no longer failing! – servoz Oct 17 '20 at 16:36

1 Answer


Thanks to @snakecharmerb's comment and the link he proposed, I have finally understood. Thank you @snakecharmerb for showing me the way... As described in that link, zipfile internally uses only '/', independently of the OS used. As I like to see things concretely, I ran this little test:

  • On Windows, I created with the usual graphical tools of the OS (not on the command line) a file testZipWindows.zip containing this tree structure:

    • testZipWindows
      • foo1.txt
      • InFolder
        • foo2.txt
  • I did the same thing on Linux (again without using the command line) for the testZipFedora.zip archive:

    • testZipFedora
      • foo1.txt
      • InFolder
        • foo2.txt

This is the result:

$ python3
Python 3.7.9 (default, Aug 19 2020, 17:05:11) 
[GCC 9.3.1 20200408 (Red Hat 9.3.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from zipfile import ZipFile
>>> with ZipFile('/home/servoz/Desktop/test/testZipWindows.zip', 'r') as WinZip:
...  WinZip.namelist()
... 
['testZipWindows/', 'testZipWindows/foo1.txt', 'testZipWindows/InFolder/', 'testZipWindows/InFolder/foo2.txt']
>>> with ZipFile('/home/servoz/Desktop/test/testZipFedora.zip', 'r') as fedZip:
...  fedZip.namelist()
... 
['testZipFedora/', 'testZipFedora/foo1.txt', 'testZipFedora/InFolder/', 'testZipFedora/InFolder/foo2.txt']

So it all becomes clear! We must indeed use os.sep (or os.path.sep) for file system paths to work properly across platforms, but when dealing with member names from the zipfile library it is absolutely necessary to use '/' as the separator, not os.sep. That was my mistake!

So the cross-platform code for the example in my question is just:
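The check above can also be reproduced without any files on disk. The following is only a minimal sketch: the archive is built in memory, the member names (`testZip/...`) are hypothetical and merely mirror the test trees above, and `top_level_dirs` is an illustrative helper that applies the same filter as in the question with the separator passed as a parameter.

```python
import io
from zipfile import ZipFile


def top_level_dirs(names, sep):
    """Same filter as in the question, with the separator as a parameter."""
    return [name.split(sep)[0] for name in names
            if len(name.split(sep)) == 2 and not name.split(sep)[-1]]


# Build a small archive in memory (hypothetical names mirroring the test trees).
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as archive:
    archive.writestr('testZip/', '')            # explicit directory entry
    archive.writestr('testZip/foo1.txt', 'one')
    archive.writestr('testZip/InFolder/', '')
    archive.writestr('testZip/InFolder/foo2.txt', 'two')

with ZipFile(buffer, 'r') as archive:
    names = archive.namelist()

print(top_level_dirs(names, '/'))    # ['testZip']
print(top_level_dirs(names, '\\'))   # []
```

Splitting on a backslash, which is what os.sep amounts to on Windows, finds nothing, while splitting on '/' finds the root directory.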

from zipfile import ZipFile, is_zipfile

if is_zipfile(filename):

    with ZipFile(filename, 'r') as zip_ref:
        # Member names in a zip archive use '/', whatever the OS.
        packages_name = [member.split('/')[0] for member in zip_ref.namelist()
                         if (len(member.split('/')) == 2 and not
                             member.split('/')[-1])]

And none of the useless things I had imagined...
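As for the "more Pythonic" part of the question, one possible variant is sketched below. The function name `root_directories` is hypothetical, and the behaviour is deliberately a little different from the code above: it collects the first path component of every member that contains a '/', so it also works for archives that store no explicit 'dir/' entries, and it returns each root only once.

```python
from zipfile import ZipFile, is_zipfile


def root_directories(filename):
    """Return the sorted top-level directory names found in a zip archive.

    Assumption: any member name containing '/' lives under a top-level
    directory, whether or not the archive has an explicit entry for it.
    """
    if not is_zipfile(filename):
        return []
    with ZipFile(filename, 'r') as zip_ref:
        # A set removes duplicates; split only once, on the first '/'.
        return sorted({name.split('/', 1)[0]
                       for name in zip_ref.namelist()
                       if '/' in name})
```

If only explicit directory entries should count, as in the original list comprehension, the filter from the code above can be kept instead.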

servoz