Skip broken archives (.tar.gz) when using 'tarfile'

Question

I have 100+ .tar.gz files in a folder, and some of them are corrupted.

I am trying to extract all of them. If a file is corrupt, I want to skip that archive and move on to the next one. Additionally, if possible, I would like a list of the archives that failed to extract at the end.

import os
import tarfile
files = os.listdir('G:\\A')
for file in files:
    id = file.split('.')
    with tarfile.open('G:\\A\\' + file,'r:gz') as tar:
        tar.extractall(path='G:\\A\\Extracted\\' + id[0])

The loop works as expected, but when it reaches a broken archive it fails with the error: "Compressed file ended before the end-of-stream marker was reached"
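That message is the text of the EOFError that Python's gzip module raises when a compressed stream ends before its end-of-stream marker, as happens with a truncated download. EOFError is not a subclass of tarfile.ReadError, so an `except tarfile.ReadError` clause on its own will not catch it. A minimal sketch that reproduces the situation with an in-memory archive cut to half its length; make_truncated_targz() and error_while_reading() are hypothetical helpers written only for this illustration:

```python
import io
import os
import tarfile
from typing import Optional


def make_truncated_targz() -> bytes:
    """Build a one-member .tar.gz in memory, then drop its second half."""
    # Random bytes barely compress, so the cut lands inside the member data.
    payload = os.urandom(50_000)
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name="data.bin")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    data = buf.getvalue()
    return data[: len(data) // 2]


def error_while_reading(data: bytes) -> Optional[type]:
    """Read every member of an in-memory .tar.gz; return the exception type raised, if any."""
    try:
        with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
            for member in tar.getmembers():
                f = tar.extractfile(member)  # None for directories, devices, etc.
                if f is not None:
                    f.read()
    except Exception as exc:
        return type(exc)
    return None


print(error_while_reading(make_truncated_targz()))
```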

Edit: Following l'L'l's comment, I tried the following, but got the same error.

BLOCK_SIZE = 1024
for file in files:    
    with tarfile.open('G:\\Sat Img\\' + file) as tardude:
        for member in tardude.getmembers():
            with tardude.extractfile(member.name) as target:
                for chunk in iter(lambda: target.read(BLOCK_SIZE), b''):
                    pass
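Reading every member to the end is a reasonable integrity check, but the read itself raises the same exception on a damaged archive, so the check needs its own try/except. Note also that extractfile() returns None for directories and other non-regular members, so the `with` statement above would fail on those. A minimal sketch of such a checker, assuming gzip-compressed tarballs; archive_is_readable(), CHUNK_SIZE and READ_ERRORS are hypothetical names, and the exception tuple is an assumption about what a damaged archive may raise:

```python
import tarfile
import zlib

CHUNK_SIZE = 1024 * 1024  # read members in 1 MiB chunks (arbitrary choice)

# Errors a truncated or corrupted .tar.gz may raise while being read
# (an assumption; EOFError is what gzip raises for a truncated stream).
READ_ERRORS = (tarfile.TarError, EOFError, zlib.error, OSError)


def archive_is_readable(path: str) -> bool:
    """Return True if every regular member of the archive can be read to the end."""
    try:
        with tarfile.open(path, "r:gz") as tar:
            # Iterate lazily so headers and data are read in one forward pass.
            for member in tar:
                f = tar.extractfile(member)
                if f is None:  # directories and other non-regular members
                    continue
                with f:
                    while f.read(CHUNK_SIZE):
                        pass
    except READ_ERRORS:
        return False
    return True
```

Calling it on each path in `files` before extracting gives the list of damaged archives up front, at the cost of decompressing every archive twice.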

Have you tried: https://stackoverflow.com/a/32312857/499581 – l'L'l, Jan 12 '19 at 06:30

score 1 · Accepted Answer · answered Jan 12 '19 at 12:29

If I understand your question correctly, you might be looking for a modification like this one:

import os
import tarfile
files = os.listdir('G:\\A')
for file in files:
    id = file.split('.')
    try:
        with tarfile.open('G:\\A\\' + file,'r:gz') as tar:
            tar.extractall(path='G:\\A\\Extracted\\' + id[0])
    except (tarfile.ReadError, EOFError):  # unreadable archive, or gzip stream cut short
        continue                           # move on to the next one

I'm not sure how your files are corrupted or what error you will see, so you may need to catch a different exception.

