
I have a tar file that contains a number of folders, and each folder contains a number of image files. I need to write a Python script that reads each image file, performs some operation on it (e.g., thresholding), and saves the result in a directory I specify. This needs to be done without un-tarring the tar file.

import tarfile

t = tarfile.open('example.tar', 'r')
for member in t.getmembers():
    f = t.extractfile(member)
    print(f)

When I print f, I get None. What am I doing wrong?

pushkin
Harathi
  • 1
    Directories are members and don't return file objects. If your tarfile has subdirectories, it's likely that one of those is the first item enumerated. Try adding `print(member)` before the extract to see what you get. – tdelaney Mar 08 '18 at 00:09
  • 1
    The docs make it pretty clear that, as tdelaney says, only regular files (and symlinks) will give you file objects, not directories, and you already know that you have directories. The question is: why is this a problem in the first place? You already need to do some kind of filtering on files that aren’t image files, so why is also filtering on None a problem? – abarnert Mar 08 '18 at 00:13
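
The point made in these comments can be checked with a small diagnostic. This is only a sketch: `describe_members` is a hypothetical helper name, and it assumes the archive holds just directories and regular files (a symlink whose target is missing from the archive could make `extractfile` raise `KeyError`).

```python
import tarfile


def describe_members(tar_path):
    """Return (name, is_regular_file, has_file_object) for every member."""
    rows = []
    with tarfile.open(tar_path, "r") as tar:
        for member in tar.getmembers():
            # extractfile() returns None for members that are not
            # regular files or links, e.g. directories.
            f = tar.extractfile(member)
            rows.append((member.name, member.isfile(), f is not None))
    return rows
```

Any row whose last value is False is a member for which `extractfile` returned None, which is what the loop in the question printed for the folder entries.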

1 Answer

Simply use this function.

import tarfile

# Simple function to extract the train data.
# tar_file: the path to the .tar file
# path: the directory where it will be extracted
def extract(tar_file, path):
    # Check before opening: tarfile.open raises ReadError on a non-tar file.
    if tarfile.is_tarfile(tar_file):
        with tarfile.open(tar_file) as opened_tar:
            opened_tar.extractall(path)
    else:
        print("The file you entered is not a tar file")
Khubaib Raza
  • This is not the correct answer to the question - The question mentions "This process needs to be done without un-tarring the tar file" – Swaroop Dec 08 '20 at 21:38
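
For the requirement raised in this comment, one possible approach is to read each member's bytes directly from the archive and write only the processed result to disk. The following is a rough sketch, not a definitive implementation: `process_tar_images`, the `transform` callback (bytes in, bytes out), the extension list, and the flattened output file names are all assumptions made for illustration.

```python
import os
import tarfile

# Assumed set of image extensions; adjust to the files actually in the archive.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff")


def process_tar_images(tar_path, out_dir, transform):
    """Apply transform(bytes) -> bytes to each image member and save the
    result under out_dir, without extracting the archive to disk."""
    os.makedirs(out_dir, exist_ok=True)
    written = []
    with tarfile.open(tar_path, "r") as tar:
        for member in tar:
            # Directories and other non-file members have no file object.
            if not member.isfile():
                continue
            if not member.name.lower().endswith(IMAGE_EXTENSIONS):
                continue
            data = tar.extractfile(member).read()
            # Flatten "folder/img.png" to "folder_img.png" so that files
            # from different folders do not overwrite each other.
            out_path = os.path.join(out_dir, member.name.replace("/", "_"))
            with open(out_path, "wb") as out:
                out.write(transform(data))
            written.append(out_path)
    return written
```

Here `transform` is where the actual image operation (thresholding, for example) would go. It would typically decode `data` with an imaging library, process it, and re-encode it to bytes. Which library to use is left open, since the question does not name one.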