
I have a Django app that creates a .tar.gz file for download. My local dev machine runs Python 2.7, and my remote dev server runs Python 2.6.6. I can open the files produced by both via Mac Finder or the command line and view their contents. However, Python 2.7 rejects the .tar.gz file created on my remote dev server, and I need to upload these files to a site that uses Python to unpack and parse the archives. How can I debug what is wrong? In a Python shell:

>>> tarfile.is_tarfile('myTestFile_remote.tar.gz')
False

>>> tarfile.is_tarfile('myTestFile_local.tar.gz')
True

>>> f = tarfile.open('myTestFile_remote.tar.gz', 'r:gz')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1678, in open
    return func(name, filemode, fileobj, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1727, in gzopen
    **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1705, in taropen
    return cls(name, mode, fileobj, **kwargs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 1574, in __init__
    self.firstmember = self.next()
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/tarfile.py", line 2331, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header

From this SO question, I also tried running gzip -t against the remote file, but it produced no output (which I believe means the gzip stream is OK). From this other SO question, I ran file myTestFile_remote.tar.gz, and I believe the output shows a correct file format:

myTestFile_remote.tar.gz: gzip compressed data, from Unix

I'm not quite sure what else I can try. It seems like the exception is thrown because my tarfile has self.offset == 0, but I don't know what that means, and I don't understand how to create the tarfile so that this does not happen. Suggestions are welcome...

Not sure what code would be useful here. My code to create and return the tarfile:

zip_filename = '%s_%s.tar.gz' % (course.name, course.url)
s = cStringIO.StringIO()
zf = tarfile.open(zip_filename, mode='w:gz', fileobj=s)

<add a bunch of stuff>

zipped = zip_collection(zip_data)
zf.close()

if zipped:
    response = HttpResponse(content_type="application/tar")
    response['Content-Disposition'] = 'attachment; filename=%s' % zip_filename
    s.seek(0, os.SEEK_END)
    response.write(s.getvalue())

------ UPDATE ------ Per this SO post, I also verified that the remote file is a tar.gz file, using tar -zxvf myTestFile_remote.tar.gz from the command line. The file extracts just fine.

user
  • What do you add to the tar? – sax Dec 16 '14 at 19:51
  • Image files, XML docs, HTML files. Both local and remote add the same types of files... – user Dec 16 '14 at 19:52
  • Did you close the tarfile? – tdelaney Dec 16 '14 at 20:10
  • Yes, sorry, updating my example code block to show that (I was trying to simplify the example code). – user Dec 16 '14 at 20:11
  • Just as a test, can you try with compression `0` to see what happens? – sax Dec 16 '14 at 20:17
  • Sorry, just to clarify @sax, you mean with `zf = tarfile.open(zip_filename, mode='w', fileobj=s)`? Then the downloaded file is a valid tarfile. – user Dec 16 '14 at 20:28
  • Huh -- @sax, that worked. Any reason why specifying `:gz` would incorrectly encode the file, but without it, it is fine? If you write something up, I would be happy to accept! – user Dec 16 '14 at 20:31

1 Answer


Since the archive is valid when written without compression, I think the problem is in zlib and not in tarfile itself.
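One way to test that guess is to peel the layers apart: decompress the gzip stream on its own, then try to parse the result as a plain tar. The sketch below is only an illustration, and `diagnose_tar_gz` is a hypothetical helper name, not part of any library. It relies on the fact that ustar, GNU, and pax headers carry the bytes `ustar` at offset 257 of the first 512-byte block.

```python
import gzip
import io
import tarfile
import zlib


def diagnose_tar_gz(data):
    """Report which layer of a .tar.gz byte string fails to parse."""
    report = {"gzip_ok": False, "ustar_magic": False, "tar_ok": False}
    try:
        # Layer 1: the gzip/zlib stream only, without involving tarfile.
        raw = gzip.GzipFile(fileobj=io.BytesIO(data)).read()
        report["gzip_ok"] = True
    except (IOError, EOFError, zlib.error) as exc:
        report["error"] = "gzip layer: %s" % exc
        return report
    # Layer 2: look for the "ustar" magic at byte offset 257 of the first header.
    report["ustar_magic"] = raw[257:262] == b"ustar"
    try:
        # Layer 3: parse the decompressed bytes as an uncompressed tar archive.
        with tarfile.open(fileobj=io.BytesIO(raw), mode="r:") as tar:
            report["members"] = tar.getnames()
        report["tar_ok"] = True
    except tarfile.TarError as exc:
        report["error"] = "tar layer: %s" % exc
    return report
```

Calling it on the bytes of the downloaded file, for example `diagnose_tar_gz(open('myTestFile_remote.tar.gz', 'rb').read())`, shows whether the failure sits in the gzip layer or in the tar layer.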

Workarounds:

  • create the file using bz2

    tarfile.open(zip_filename, mode='w:bz2', fileobj=s)

  • force the level of compression (both write/read)

    zf = tarfile.open(zip_filename, mode='w:gz', fileobj=s, compresslevel=9)

    zf = tarfile.open(zip_filename, mode='r:gz', compresslevel=9)

  • lower the level of compression until the problem disappears (try each value from 9 down to 0; 8 is shown here)

    zf = tarfile.open(zip_filename, mode='w:gz', fileobj=s, compresslevel=8)

  • totally remove compression

    tarfile.open(zip_filename, mode='w', fileobj=s)

As a last resort, only if compression is absolutely needed and none of the above works, pipe an uncompressed tar stream through the external gzip command:

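import subprocess
import tarfile

f = open(zip_filename, "wb")  # binary mode: gzip writes raw bytes
proc = subprocess.Popen(["gzip", "-9"], stdin=subprocess.PIPE, stdout=f)
tar = tarfile.open(fileobj=proc.stdin, mode="w|")  # uncompressed tar stream into gzip
tar.add(...)
tar.close()
proc.stdin.close()  # signal EOF so gzip can finish
proc.wait()  # let gzip flush its output before the file is closed
f.close()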
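The "lower the level of compression until the problem disappears" workaround can also be automated instead of tried by hand. The following is a minimal sketch under a few assumptions: the archive is built in memory, `files` is a dict mapping member names to byte contents, and `build_verified_tar_gz` is a hypothetical helper name. Each candidate archive is read back with `tarfile` before it is accepted, so a broken one is never returned.

```python
import io
import tarfile


def build_verified_tar_gz(files, levels=(9, 8, 7, 6, 5, 4, 3, 2, 1, 0)):
    """Return (level, data) for the first compression level that reads back.

    `files` maps archive member names to their byte contents.
    """
    for level in levels:
        buf = io.BytesIO()
        tar = tarfile.open(mode="w:gz", fileobj=buf, compresslevel=level)
        for name, payload in sorted(files.items()):
            info = tarfile.TarInfo(name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
        # close() writes the tar end blocks and the gzip trailer;
        # the buffer is incomplete until this has run.
        tar.close()
        data = buf.getvalue()
        try:
            # Round trip: reopen the bytes exactly as a consumer would.
            check = tarfile.open(mode="r:gz", fileobj=io.BytesIO(data))
            names = sorted(check.getnames())
            check.close()
        except tarfile.TarError:
            continue  # unreadable at this level, try the next one
        if names == sorted(files):
            return level, data
    raise ValueError("no compression level produced a readable archive")
```

In a Django view, the returned `data` could be written to the response in place of `s.getvalue()`. Running the check on the server that builds the archive only proves the archive is readable by that server's Python, so it is worth repeating the read step with the Python version that will consume the file.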
sax