I got the following code from an earlier question of mine about converting a tar.gz file to a zip file.

import tarfile, zipfile
tarf = tarfile.open(name='sample.tar.gz', mode='r|gz' )
zipf = zipfile.ZipFile.open( name='myzip.zip', mode='a', compress_type=ZIP_DEFLATED )
for m in tarf.getmembers():
    f = tarf.extractfile( m )
    fl = f.read()
    fn = m.name
    zipf.writestr( fn, fl )
tarf.close()
zipf.close()

But when I run it, I get the following error:

NameError: name 'ZIP_DEFLATED' is not defined

What should I change in the code to make it work?
Geet

1 Answer

ZIP_DEFLATED is a name defined by the zipfile module; reference it from there:

zipf = zipfile.ZipFile(
    'myzip.zip', mode='a',
    compression=zipfile.ZIP_DEFLATED)

Note that ZipFile.open() is the wrong call here: that method opens a member inside an archive. You are writing to the archive itself, so create an instance of the ZipFile class instead.

Also, the correct ZipFile class signature names the 3rd argument compression. compress_type is only used as an attribute on ZipInfo objects and for the ZipFile.writestr() method. The first argument is not named name either; it's file, but you normally would just pass in the value as a positional argument.
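To make the difference between the two keywords concrete, here is a minimal sketch. It writes to an in-memory buffer instead of 'myzip.zip', and the member names deflated.txt and stored.txt are made up for illustration; mode 'w' is also an assumption chosen so the example always starts from an empty archive.

```python
import io
import zipfile

buf = io.BytesIO()  # in-memory stand-in for a file such as 'myzip.zip'

# `compression` is the ZipFile constructor keyword: the archive-wide default.
with zipfile.ZipFile(buf, mode='w', compression=zipfile.ZIP_DEFLATED) as zipf:
    # No per-member setting, so this entry uses the archive default.
    zipf.writestr('deflated.txt', b'a' * 1000)
    # `compress_type` is the writestr() keyword: it overrides the default
    # for this one member only.
    zipf.writestr('stored.txt', b'a' * 1000,
                  compress_type=zipfile.ZIP_STORED)

# Read the archive back and record how each member was stored.
with zipfile.ZipFile(buf) as zipf:
    info = {i.filename: i.compress_type for i in zipf.infolist()}
```

After this runs, info maps each member name to the compression method recorded on its ZipInfo object.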

Next, the tar file is opened in stream mode ('r|gz'), which can't seek backwards, so you'll have trouble reading member data after calling tarf.getmembers(). That method has to scan the whole stream to build its list of members, and afterwards you can't go back to read the file data.

Instead, iterate directly over the tarfile object. You'll get the member objects in order, each at a point where its file data can still be read:

for m in tarf:
    if not m.isfile():
        continue  # skip directories, links and other non-regular members
    f = tarf.extractfile(m)
    fl = f.read()
    fn = m.name
    zipf.writestr(fn, fl)
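Putting the pieces together, the whole conversion might look like the sketch below. The function name tar_gz_to_zip is hypothetical, and two choices are assumptions rather than part of the original code: mode 'w' (which starts a fresh archive instead of appending with 'a') and skipping anything that is not a regular file, since directories and links have no file data to copy.

```python
import tarfile
import zipfile


def tar_gz_to_zip(tar_path, zip_path):
    """Copy every regular file from a .tar.gz archive into a new .zip.

    Hypothetical helper for illustration only.
    """
    # 'r|gz' reads the tar as a forward-only stream, so members must be
    # handled in the order they appear.
    with tarfile.open(tar_path, mode='r|gz') as tarf, \
         zipfile.ZipFile(zip_path, mode='w',
                         compression=zipfile.ZIP_DEFLATED) as zipf:
        for m in tarf:
            if not m.isfile():
                continue  # assumption: skip directories, links, devices
            f = tarf.extractfile(m)
            # Read the data now, before the stream moves to the next member.
            zipf.writestr(m.name, f.read())
```

It would be called as tar_gz_to_zip('sample.tar.gz', 'myzip.zip'). Each member is read fully into memory before being written, which is fine for small archives but may need rethinking for very large files.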
Martijn Pieters
  • Thanks. When I did that I got a new error: "TypeError: unbound method open() must be called with ZipFile instance as first argument (got nothing instead)" – Geet Sep 01 '16 at 15:09
  • @Geet: updated; you were using a reference to the `ZipFile().open()` method, not creating an instance of the `ZipFile` class. – Martijn Pieters Sep 01 '16 at 15:11
  • @Martjin Pieters: is there anything wrong with the argument 'compress_type'? Now, I am getting "TypeError: __init__() got an unexpected keyword argument 'compress_type' – Geet Sep 01 '16 at 15:21
  • @Geet: ah, indeed, there is no such keyword; you can just pass in the arguments by position, but the correct name is `compression`. – Martijn Pieters Sep 01 '16 at 15:22
  • Got it. I also changed "r|gz" to "r:gz" to solve the new error "StreamError: seeking backwards is not allowed" Thanks a lot! – Geet Sep 01 '16 at 15:33
  • @Geet: updated; `tarf.getmembers()` scans through the file. But since this involves decompressing the gzip-stream, you can't then go back to load file contents. Use iteration directly instead. – Martijn Pieters Sep 01 '16 at 15:42
  • Yeah, I realised that with your earlier comment and updated it accordingly. You rock @MartjinPieters. Thanks again! – Geet Sep 01 '16 at 15:43
  • Oops, I didn't understand your explanation about tarf.getmembers(). – Geet Sep 01 '16 at 15:45
  • @Geet: `tarf.getmembers()` produces a list object with all member objects. But to read the information needed for each object, you need to read through the whole stream. The stream contains alternating blocks of metadata (filename, etc.) and file data; to produce the `getmembers()` list, all the file data blocks must be skipped. – Martijn Pieters Sep 01 '16 at 15:46
  • @Geet: by using `for m in tarf:`, what happens instead is that each member object is produced *one by one*. The stream can be read to find metadata, then your loop can read the filedata, then the next iteration metadata can be read again, etc. until the whole file is processed. – Martijn Pieters Sep 01 '16 at 15:47
  • Sorry for being so novice. :( Should I keep it or change that line of the code? – Geet Sep 01 '16 at 15:55
  • @Geet: you use `for m in tarf:` instead of `for m in tarf.getmembers():`. – Martijn Pieters Sep 01 '16 at 15:56