I have a nested tar archive structured like this:
tarfile.tar.gz
  --tar1.gz
      --tar1.txt
  --tar2.gz
  --tar3.gz
I want to write a small Python script that extracts all the archives breadth-first into a matching folder hierarchy, i.e. tar1.txt should end up in tarfile/tar1/.
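To pin down what "breadth-first" means for this layout, here is a minimal sketch over an in-memory model of the tree above (no real files are touched; `TREE` and `breadth_first` are hypothetical names). It assumes each archive's contents go into a folder named after the archive up to its first dot. Every entry at one nesting level is handled before anything nested inside it, which is what a FIFO queue gives you (the script uses a list with `pop(0)` for the same purpose).

```python
from collections import deque

# Hypothetical model of the layout above: each entry maps to what it contains
# (plain files and archives not expanded here map to an empty dict).
TREE = {
    'tarfile.tar.gz': {
        'tar1.gz': {'tar1.txt': {}},
        'tar2.gz': {},
        'tar3.gz': {},
    },
}


def breadth_first(tree):
    """Yield (target_folder, name) pairs level by level using a FIFO queue."""
    queue = deque(('', name, children) for name, children in tree.items())
    while queue:
        folder, name, children = queue.popleft()
        yield folder, name
        # Assumed rule: contents go into a folder named after the archive,
        # cut at its first dot ('tarfile.tar.gz' -> 'tarfile/').
        subfolder = folder + name.split('.')[0] + '/'
        for child, grandchildren in children.items():
            # Appending to the back means siblings are visited before children.
            queue.append((subfolder, child, grandchildren))


for folder, name in breadth_first(TREE):
    print(folder + name)
# prints, one per line: tarfile.tar.gz, tarfile/tar1.gz, tarfile/tar2.gz,
# tarfile/tar3.gz, tarfile/tar1/tar1.txt
```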
Here's the script:
#!/usr/bin/python
import os
import re
import tarfile

data = os.path.join(os.getcwd(), 'data')
dirs = [data]  # FIFO queue of directories still to be scanned

while dirs:
    dirpath = dirs.pop(0)
    # Extract every *.gz / *.tar.gz in this directory next to itself.
    for subpath in os.listdir(dirpath):
        if not re.search(r'(\.tar)?\.gz$', subpath):
            continue
        with tarfile.open(os.path.join(dirpath, subpath)) as tarf:
            tarf.extractall(path=dirpath)
    # Queue subdirectories for the next level; remove every other entry
    # except plain files in the top-level data directory.
    for subpath in os.listdir(dirpath):
        newpath = os.path.join(dirpath, subpath)
        if os.path.isdir(newpath):
            dirs.append(newpath)
        elif dirpath != data or os.path.islink(newpath):
            os.remove(newpath)
But when I run the script, I get the following error:
Traceback (most recent call last):
  File "./extract.py", line 16, in <module>
    with tarfile.open(os.path.join(dirpath, subpath)) as tarf:
  File "/usr/lib/python2.7/tarfile.py", line 1678, in open
    raise ReadError("file could not be opened successfully")
tarfile.ReadError: file could not be opened successfully
The .tar.gz file is extracted fine, but the nested .gz files are not. What's going on here? Does the tarfile module not handle .gz files?
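For reference, `tarfile.open()` only reads tar archives (optionally compressed), so a `.gz` that is just one gzip-compressed file rather than a tar raises this same `ReadError`. The sketch below assumes that is what the nested `.gz` files are; the question does not say how they were created. It checks each candidate with `tarfile.is_tarfile()` and, for plain gzip files, decompresses with the `gzip` module instead. It is written for Python 3 (the traceback shows 2.7), and `extract_nested`, `_stem`, and the suffix-stripping naming rule are hypothetical.

```python
import gzip
import os
import shutil
import tarfile
from collections import deque

SUFFIXES = ('.tar.gz', '.tgz', '.tar', '.gz')  # assumed archive extensions


def _stem(name):
    """Hypothetical naming rule: 'tarfile.tar.gz' -> 'tarfile', 'tar1.gz' -> 'tar1'."""
    for suffix in SUFFIXES:  # longest suffixes are checked first
        if name.endswith(suffix):
            return name[:-len(suffix)]
    return name


def extract_nested(root):
    """Breadth-first: unpack each archive into a sibling folder named after it,
    then queue that folder so archives nested inside it are handled later."""
    # Use the safer 'data' extraction filter where this Python supports it.
    extract_kwargs = {'filter': 'data'} if hasattr(tarfile, 'data_filter') else {}
    queue = deque([root])
    while queue:
        dirpath = queue.popleft()
        for name in sorted(os.listdir(dirpath)):
            path = os.path.join(dirpath, name)
            if os.path.isdir(path):
                queue.append(path)
            elif name.endswith(SUFFIXES):
                dest = os.path.join(dirpath, _stem(name))
                if tarfile.is_tarfile(path):
                    # A real tar archive, compressed or not: extract its members.
                    os.makedirs(dest, exist_ok=True)
                    with tarfile.open(path) as tarf:
                        tarf.extractall(path=dest, **extract_kwargs)
                    queue.append(dest)
                elif name.endswith('.gz'):
                    # A single gzip-compressed file: tarfile.open() raises
                    # ReadError here, so decompress it with gzip instead.
                    with gzip.open(path, 'rb') as src, open(dest, 'wb') as dst:
                        shutil.copyfileobj(src, dst)
```

Usage would be `extract_nested(os.path.join(os.getcwd(), 'data'))`. The sketch does not try to recover the original file name that gzip can store in its header; it simply names the decompressed output after the archive (e.g. tarfile/tar1), so putting tar1.txt under tarfile/tar1/ would need that name from somewhere else.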