
I am attempting to unzip files of various sizes (some 4 GB or larger) using Python, but I have noticed that on several occasions, especially when the files are extremely large, the file fails to unzip. When I open the resulting file it is empty. Below is the code I am using. Is there anything wrong with my approach?

    import gzip

    inF = gzip.open(localFile, 'rb')
    localFile = localFile[:-3]   # strip the '.gz' extension for the output name
    outF = open(localFile, 'wb')
    outF.write(inF.read())       # reads the entire decompressed file into memory
    inF.close()
    outF.close()
godzilla
  • No error message or exceptions, just an empty file. – godzilla May 28 '14 at 15:46
  • Check `/var/log/syslog` on unix or the events tab in windows. You might be exceeding the memory on your machine, in which case the OS will nuke your proc out of hand. You should see some type of message like `OutOfMemory` or some such. – nsfyn55 May 28 '14 at 15:46
  • The process never dies; it carries on. I suspect it could be a memory issue, but what is the leanest workaround? – godzilla May 28 '14 at 15:47
  • Try using the context syntax: `with gzip.open(localFile, 'rb') as inF:` and `with open(localFile, 'wb') as outF:` and finally `outF.write(inF.read())`. That should be functionally identical, but I have heard vague things about Python behaving better when you use this syntax, so maybe it will help? – Engineero May 28 '14 at 15:47
  • What OS are you using? In 32-bit Windows XP you cannot create a file larger than 2 GB (I think), but it should write until it runs up against that limitation and then crash. – PyNEwbie May 28 '14 at 15:47
  • If it's still chugging along, perhaps you aren't giving it enough time. – nsfyn55 May 28 '14 at 15:48
  • Drop some prints in there and let it tie up the `TTY`. – nsfyn55 May 28 '14 at 15:48
  • Check out [this question](http://stackoverflow.com/questions/339053/how-do-you-unzip-very-large-files-in-python), which uses `zipfile` and `zlib` libraries to decompress large files. – Engineero May 28 '14 at 15:49
  • @Engineero I am using .gz files; I am not sure if those libraries support this file type... – godzilla May 28 '14 at 15:54
  • I am not positive, but I think gzip (gz) is supported by the other libraries. I am sure a little digging will tell you. – Engineero May 28 '14 at 16:10
  • Instead of `inF.read()`, loop over the input file, writing each piece to the output. This will reduce memory requirements. – johntellsall May 28 '14 at 19:09
  • @shavenwarthog do you have an example I can follow? – godzilla May 29 '14 at 09:22

3 Answers


In this case it looks like you don't need Python to do any processing on the file you read in, so you might be better off just using `subprocess.Popen`:

    from subprocess import Popen
    # 'gunzip -c' writes the decompressed data to stdout; redirect it to the output file
    with open(outfilename, 'wb') as outF:
        Popen(['gunzip', '-c', infilename], stdout=outF).wait()

Since the arguments are passed as a list here, you don't need `shell=True`; if you build the command as a single string instead, you will. Other than that it should be good.
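For completeness, a minimal sketch of the single-string form, reusing the same placeholder names `infilename` and `outfilename`; here the shell performs the `>` redirection, which is why `shell=True` is required:

    from subprocess import check_call
    # the shell handles the '>' redirection, hence shell=True
    check_call('gunzip -c %s > %s' % (infilename, outfilename), shell=True)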

acushner

Another solution for large .zip files (works on Ubuntu 16.04.4). First install 7z:

    sudo apt-get install p7zip-full

Then, in your Python code, call 7z with:

    import subprocess
    # 'x' extracts with full paths; '-o' (no space before the path) sets the output directory
    subprocess.call(['7z', 'x', src_file, '-o' + target_dir])
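Although this answer mentions .zip files, 7z autodetects the archive format, so the same call should also handle the question's .gz files. A quick sketch with hypothetical file names:

    import subprocess
    # extract data.gz into ./out (7z detects the gzip format automatically)
    subprocess.call(['7z', 'x', 'data.gz', '-oout'])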
RomaneG

This code loops over blocks of input data, writing each block to the output file. That way we never read the entire input into memory at once, which conserves memory and avoids mysterious crashes.

    import gzip, os

    localFile = 'cat.gz'
    outFile = os.path.splitext(localFile)[0]  # 'cat.gz' -> 'cat'

    print('Unzipping {} to {}'.format(localFile, outFile))

    with gzip.open(localFile, 'rb') as inF:
        with open(outFile, 'wb') as outF:
            # read and write one block at a time so the whole
            # decompressed file is never held in memory
            while True:
                block = inF.read(1024 * 1024)  # 1 MiB per block
                if not block:
                    break
                outF.write(block)
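Equivalently, the standard library's `shutil.copyfileobj` does the same chunked copy for you; a minimal sketch reusing the file names above:

    import gzip, shutil

    with gzip.open('cat.gz', 'rb') as inF, open('cat', 'wb') as outF:
        # copyfileobj reads and writes in fixed-size chunks internally
        shutil.copyfileobj(inF, outF)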
johntellsall