The Python tarfile library does not detect a broken (truncated) tar file.

user@host$ wc -c good.tar
143360 good.tar

user@host$ head -c 130000 good.tar > cut.tar

user@host$ tar -tf cut.tar 
...
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now

Very nice: the command-line tool recognizes the unexpected EOF.

user@host$ python
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
>>> import tarfile
>>> tar=tarfile.open('cut.tar')
>>> tar.extractall()

Not nice: the Python library decodes the file but raises no exception.

How can I detect an unexpected EOF with the Python library? I want to avoid the subprocess module.

The parameter errorlevel does not help. I tried errorlevel=1 and errorlevel=2.

guettli

2 Answers


I wrote a workaround. It works with my tar files, but I suspect it does not support every type of object that can be stored in a tar file.

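# -*- coding: utf-8 -*-
from __future__ import absolute_import, division, unicode_literals, print_function
import os
import tarfile


class TarfileWhichRaisesOnEOF(tarfile.TarFile):
    """TarFile whose extractall() raises if an extracted regular file has a
    different size on disk than the size recorded in its tar header."""

    def extractall(self, path=".", members=None, **kwargs):
        # Materialize the member list first: a generator passed as `members`
        # would otherwise be exhausted by the extraction pass below.
        if members is None:
            members = self.getmembers()
        else:
            members = list(members)
        # **kwargs forwards newer keyword arguments (e.g. filter) on Python 3.
        super(TarfileWhichRaisesOnEOF, self).extractall(path, members, **kwargs)

        for tarinfo in members:
            if not tarinfo.isfile():
                continue  # only regular files have data whose size can be compared
            file_path = os.path.join(path, tarinfo.name)
            size_real = os.path.getsize(file_path)
            if size_real != tarinfo.size:
                raise tarfile.ExtractError(
                    'Extracting %s: size does not match. According to tarinfo %s, on disk %s' % (
                        tarinfo.name, tarinfo.size, size_real))

# Usage: TarfileWhichRaisesOnEOF.open('cut.tar').extractall()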
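The same header-versus-bytes comparison can be sketched without extracting anything. This is a heuristic, not part of the tarfile API, and it rests on assumptions: the archive is an uncompressed tar (opened with mode `"r:"`, so header offsets are byte positions in the file on disk); it relies on `TarInfo.offset_data`, an attribute CPython's tarfile fills in while parsing headers but which, as far as I know, is not documented; and GNU sparse members are ignored. The function name `tar_looks_truncated` is hypothetical. Besides member data that runs past the end of the file, it also reports a missing end-of-archive marker (the two zero-filled 512-byte blocks that terminate a POSIX tar archive), which catches cuts that land inside a header.

```python
import os
import tarfile

BLOCKSIZE = 512  # tar works in 512-byte blocks


def tar_looks_truncated(archive_path):
    """Heuristic check for a cut-off *uncompressed* tar file (hypothetical helper).

    Returns True if a regular member's data runs past the end of the file,
    or if the two zero blocks that terminate a tar archive are missing
    after the last member that could be read.
    """
    file_size = os.path.getsize(archive_path)
    end_of_members = 0
    try:
        with tarfile.open(archive_path, "r:") as tar:
            for tarinfo in tar:
                # Only regular files store data right after their header.
                data_size = tarinfo.size if tarinfo.isfile() else 0
                # offset_data: CPython-internal position of the member's data.
                if tarinfo.offset_data + data_size > file_size:
                    return True  # member data was cut off
                # Member data is padded to a whole number of blocks.
                padded = -(-data_size // BLOCKSIZE) * BLOCKSIZE
                end_of_members = tarinfo.offset_data + padded
    except tarfile.ReadError:
        # Some Python versions notice the short file while iterating, and an
        # unreadable first header also ends up here: treat both as broken.
        return True

    # A complete archive has two zero blocks right after the last member.
    with open(archive_path, "rb") as f:
        f.seek(end_of_members)
        trailer = f.read(2 * BLOCKSIZE)
    return trailer != b"\0" * (2 * BLOCKSIZE)
```

The trailer check is there because the reader can simply stop at a half-written header instead of raising, so comparing member sizes alone can miss such a cut. Archives written by tools that omit the end-of-archive marker would be reported as truncated by this sketch.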
guettli

This has been fixed in Python 3: an OSError is raised regardless of the errorlevel setting.
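On Python 3 the check then reduces to catching the exception. A minimal sketch, assuming Python 3; the helper name `extract_or_report` is hypothetical. Depending on the exact version, the short read may surface as an `OSError` or as a `tarfile.TarError` subclass such as `tarfile.ReadError`, so the sketch catches both. Where `tarfile.data_filter` exists, `extractall()` also accepts a `filter` argument, and the sketch passes `filter="data"`.

```python
import tarfile


def extract_or_report(archive_path, dest):
    """Extract archive_path into dest (hypothetical helper).

    Returns None on success, or the exception that signalled a broken archive.
    """
    try:
        with tarfile.open(archive_path) as tar:
            if hasattr(tarfile, "data_filter"):
                # Versions with extraction filters: use the safer 'data' filter.
                tar.extractall(dest, filter="data")
            else:
                tar.extractall(dest)
    except (OSError, tarfile.TarError) as exc:
        # A truncated member shows up as OSError or a TarError subclass,
        # depending on the Python 3 version. Note that OSError also covers
        # ordinary I/O failures such as a full disk.
        return exc
    return None
```

Returning the exception instead of a bare boolean keeps the message available, which helps tell a cut archive apart from an unrelated disk error.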

Ethan Furman
  • Sorry, in my case setting the errorlevel does not work. This means the Python 3 changes won't help here. – guettli May 29 '15 at 06:14
  • @guettli: you tried with 3.4? Please add a note to http://bugs.python.org/issue24259 saying so. – Ethan Furman May 29 '15 at 21:15
  • I tried to extractall() the uploaded tar_which_is_cut.tar with Python 3.4.0. It raises an OSError - good. Only 2.7 affected? – guettli May 30 '15 at 11:01