-5

Error when reading a zip file in Python

I need to iterate over a zip archive and read the files inside it.

I am getting an error while reading one of the text files from the zipped folder.

with zipfile.ZipFile(file_name) as zipped:
        for filenames in zipped.namelist():
            if not os.path.isdir(filenames):
                print(filenames)
                with open(filenames,"r",encoding="utf8") as file1:
                    print(file1)

When I run this code, I get an error saying that the file xxxx-005.txt was not found.

I have the zip file in the same folder as the code.

I have also tried the approach below:

import zipfile
import os

def read_file(file_name):
    docs1 = []
    doc = []
    with zipfile.ZipFile(file_name) as zipped:
        for filenames in zipped.namelist():
            if not os.path.isdir(filenames):
                # print(filenames)
                with zipped.open(filenames) as file1:
                    print(file1)

read_file('xxxx.zip')

It printed the following error: NotImplementedError: compression type 9 (deflate64)
E_net4
Advait
  • If the code and the zipped file are on the same level, wouldn't your file path need to be `zip_file_name/xxxx-0005.txt` instead of just `xxxx-005.txt`? Since the error says file not found, you need to figure out what the correct path is. – shriakhilc May 22 '19 at 06:39
  • I can access the first four files but not anything after xxxx-005.txt. How do I add the exact path to the file? – Advait May 22 '19 at 06:44

3 Answers

1

The issue is probably due to directories inside your zip archive. os.path.isdir checks the local filesystem, not the archive, so it returns False for directory entries and this check lets them through:

if not os.path.isdir(filenames):

Instead, check whether the last character of the filename is /.

import zipfile, os
with zipfile.ZipFile(file_name) as zipped:
    for filenames in zipped.namelist():
        if filenames[-1] != '/':
            print(filenames)

(It feels kind of ugly. Maybe someone else knows a better method?)

pktl2k
1

The zipfile module in Python's standard library does not support Deflate64 compression. Your error message states this explicitly, and the compression method was intentionally left unsupported because of copyright issues.

An older question, "Extracting large files with zipfile", was answered with the same disappointing conclusion.

There is apparently a package on PyPI that monkey-patches zipfile to provide this functionality, but I have not tried it yet. (https://pypi.org/project/zipfile-deflate64/)
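To see which entries would trigger the error before trying to read them, you can inspect the compress_type of each ZipInfo. This is only a sketch: the helper name unsupported_entries and the SUPPORTED set are my own, and whether bzip2 and LZMA really work depends on your Python build having those modules.

```python
import io
import zipfile

# Compression methods the standard zipfile module knows how to decompress.
# (Assumption: bz2 and lzma are available in this Python build.)
SUPPORTED = {
    zipfile.ZIP_STORED,
    zipfile.ZIP_DEFLATED,
    zipfile.ZIP_BZIP2,
    zipfile.ZIP_LZMA,
}

def unsupported_entries(archive):
    """Return (name, compress_type) for entries zipfile cannot decompress."""
    with zipfile.ZipFile(archive) as zipped:
        return [
            (info.filename, info.compress_type)
            for info in zipped.infolist()
            if info.compress_type not in SUPPORTED
        ]

# Small in-memory archive for demonstration; pass a path such as 'xxxx.zip' instead.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("a.txt", "hello")

print(unsupported_entries(buf))
```

For an archive like the one in the question, the Deflate64 entries should show up with compress_type 9, which matches the number in the error message.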

Incidentally, by iterating over ZipInfo objects using the infolist() method, you could check each ZipInfo instance to determine if the entry is a directory with its is_dir() method. (os.path.isdir only relates to local files, not those contained in a Zip archive).
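A minimal sketch of that infolist() / is_dir() check, assuming Python 3.6 or later (where ZipInfo.is_dir() is available). The function name file_entries and the in-memory archive are only for illustration.

```python
import io
import zipfile

def file_entries(archive):
    """Return the names of entries that are files, skipping directory entries."""
    with zipfile.ZipFile(archive) as zipped:
        return [info.filename for info in zipped.infolist() if not info.is_dir()]

# Build a tiny archive containing one directory entry and one file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("folder/", "")            # directory entry: name ends with "/"
    zf.writestr("folder/a.txt", "hello")

print(file_entries(buf))  # prints ['folder/a.txt']
```

This avoids both os.path.isdir and manual string checks on the name.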

Ben Y
0

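Use the ZipFile.open method rather than the built-in open function. The built-in open looks for the file on disk, not inside the archive, which is why you get "file not found". ZipFile.open doesn't let you specify an encoding though, and I'm not sure if that is important to you.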

with zipfile.ZipFile(file_name) as zipped:
        for filenames in zipped.namelist():
            if not os.path.isdir(filenames):
                print(filenames)
                with zipped.open(filenames,"r") as file1:
                    print(file1)

Also, I noticed that namelist also contains zip_file/ as one of the names, and it gave a False value for os.path.isdir too. So you may need to take care of that case specifically.
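On the encoding point: ZipFile.open returns a binary file object, and one way to get decoded text is to wrap it in io.TextIOWrapper. The following is a sketch under assumptions: the helper name read_text_entries is made up, and it assumes every file in the archive is UTF-8 text, as in the question.

```python
import io
import zipfile

def read_text_entries(archive, encoding="utf-8"):
    """Return {entry name: decoded text} for every file entry in the archive."""
    texts = {}
    with zipfile.ZipFile(archive) as zipped:
        for info in zipped.infolist():
            if info.is_dir():
                continue  # skip directory entries such as 'zip_file/'
            # zipped.open() yields bytes; TextIOWrapper decodes them.
            with zipped.open(info) as raw, io.TextIOWrapper(raw, encoding=encoding) as text:
                texts[info.filename] = text.read()
    return texts

# Small in-memory archive for demonstration; pass a path such as 'xxxx.zip' instead.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("zip_file/", "")
    zf.writestr("zip_file/a.txt", "héllo")

print(read_text_entries(buf))
```

This still fails with NotImplementedError on entries stored with an unsupported compression method such as Deflate64.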

shriakhilc
  • Please add this in the question itself, so that you can format it better. – shriakhilc May 22 '19 at 07:13
  • I have added the following in the question – Advait May 22 '19 at 07:17
  • According to [the docs](https://docs.python.org/3/library/zipfile.html#zipfile.ZipFile), this is because it cannot recognize your Zip compression method. If it is a normal zip file, try using the `ZIP_DEFLATED` compression and see if that works out. – shriakhilc May 22 '19 at 07:17
  • I have another zipped file, downloaded from the internet. When I try to read it, it reads the files up to xxxx-005 and then stops. – Advait May 22 '19 at 07:19
  • Even with the new code? Then I'm not sure what your problem might be – shriakhilc May 22 '19 at 07:21
  • As far as I can tell, you didn't exclude the false positive of os.path.isdir when checking the zip_file/ folder. Please try that. – pktl2k May 22 '19 at 07:25
  • I have added "compression=zipfile.ZIP_DEFLATED" to my zipfile.ZipFile line and I am still getting the same error. – Advait May 22 '19 at 07:25
  • @pktl2k what do you mean by your answer? I am new and have no good knowledge about python – Advait May 22 '19 at 07:35
  • If there are directories inside the zip archive, they are not recognized as directories. I.e. os.path.isdir() will return False if the directory is inside the zip archive. So you need to exclude these directories by other means, for example by checking if the last character of the filename is '/'. – pktl2k May 22 '19 at 07:58