4

I have created a .tar file on a Linux machine as follows:

tar cvf test.tar test_folder/

where the test_folder contains some files as shown below:

test_folder 
|___ file1.jpg
|___ file2.jpg
|___ ...

I am unable to programmatically extract the individual files within the tar archive using Python. More specifically, I have tried the following:

import tarfile
with tarfile.open('test.tar', 'r:') as tar:
    img_file = tar.extractfile('test_folder/file1.jpg')
    # img_file contains the object: <ExFileObject name='test_folder/test.tar'>

Here, img_file does not seem to contain the requested image; instead, it appears to contain the source .tar file. I am not sure where I am messing things up. Any suggestions would be really helpful. Thanks in advance.

Swaroop
  • Why do you think it contains the .tar file? I've just tried following the steps you describe (although I had to change the syntax of the tar command to `tar cvf test.tar ./test_folder`) and I was able to extract image files with your code with no issues, provided I use the same path i.e. `'./test_folder/filename'` – Grismar Dec 11 '20 at 00:23
  • Note that having to use a different path was due to testing on Windows, just had a look on Debian and both your tar statement and Python code work - please provide details on why you think the code doesn't work. Is there a reason you include `:` in the `open` parameters? – Grismar Dec 11 '20 at 00:29
  • [What should I do when someone answers my question?](https://stackoverflow.com/help/someone-answers) – MarianD Jan 04 '21 at 18:39

3 Answers

5

You probably wanted to use the .extract() method instead of your .extractfile() method (see my other answer):

import tarfile

with tarfile.open('test.tar', 'r:') as tar:
    tar.extract('test_folder/file1.jpg')         # .extract()  instead of .extractfile()

Notes:

  1. Your extracted file will be in the (maybe newly created) folder test_folder under your current directory.

  2. The .extract() method returns None, so there is no point in assigning its result (as in img_file = tar.extract(...)).
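If you would rather not extract into the current directory, .extract() also accepts a path argument. The following is only a minimal sketch: the helper name extract_member, the throwaway archive, and the out directory are made up for illustration and are not part of the question.

```python
import os
import tarfile
import tempfile


def extract_member(tar_path, member_name, dest_dir):
    """Extract one member of the archive below dest_dir and return its path."""
    with tarfile.open(tar_path, "r") as tar:
        # path= sets the destination root; the member's own folders
        # (here test_folder/) are recreated underneath it.
        tar.extract(member_name, path=dest_dir)
    return os.path.join(dest_dir, member_name)


# Build a small throwaway archive so the sketch runs on its own.
work_dir = tempfile.mkdtemp()
src_dir = os.path.join(work_dir, "test_folder")
os.makedirs(src_dir)
with open(os.path.join(src_dir, "file1.jpg"), "wb") as f:
    f.write(b"fake image bytes")  # placeholder content, not a real JPEG

tar_path = os.path.join(work_dir, "test.tar")
with tarfile.open(tar_path, "w") as tar:
    tar.add(src_dir, arcname="test_folder")

out_dir = os.path.join(work_dir, "out")
os.makedirs(out_dir)
extracted_path = extract_member(tar_path, "test_folder/file1.jpg", out_dir)
```

On recent Python versions, tarfile may warn that no extraction filter was specified; for archives you do not fully trust, it is worth reading the tarfile documentation on extraction filters before extracting.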

MarianD
2

Appending 2 lines to your code will solve your problem:

import tarfile

with tarfile.open('test.tar', 'r:') as tar:
    img_file = tar.extractfile('test_folder/file1.jpg')
    
    # --------------------- Add this ---------------------------
    with open("img_file.jpg", "wb") as outfile:
        outfile.write(img_file.read())

The explanation:

The .extractfile() method only gives you the content of the requested member (i.e. its data) as a file-like object.

It doesn't extract any file to the file system.

So you have to do it yourself, by reading the returned content (img_file.read()) and writing it into a file of your choice (outfile.write(...)).
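If you only need the bytes in memory, you do not have to write anything to disk at all. The following is a rough sketch under some assumptions: the archive is built in memory with placeholder bytes purely so the example is self-contained, and the variable names are made up. It also lists the member names with getnames(), which is a quick way to check the exact path stored in the archive.

```python
import io
import tarfile

# Build a tiny in-memory archive so the sketch runs on its own.
payload = b"fake image bytes"  # placeholder content, not a real JPEG
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w") as tar:
    info = tarfile.TarInfo(name="test_folder/file1.jpg")
    info.size = len(payload)
    tar.addfile(info, io.BytesIO(payload))
buffer.seek(0)

with tarfile.open(fileobj=buffer, mode="r") as tar:
    # The names stored in the archive; the argument to extractfile() must match one of them.
    member_names = tar.getnames()
    member = tar.extractfile("test_folder/file1.jpg")
    # extractfile() returns None for members that are not regular files or links
    # (for example directories), so check before reading.
    data = member.read() if member is not None else None
```

Note that the object returned by .extractfile() is only usable while the archive is open, so read from it inside the with block.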


Or, to simplify your life, use the .extract() method instead. See my other answer.

MarianD
-1

This is because extractfile() returns an io.BufferedReader object, so essentially you are extracting the file in your directory and storing the io.BufferedReader in your variable.

What you can do is extract the file, then open it in a separate context manager:

import tarfile
with tarfile.open('test.tar', 'r:') as tar:
    tar.extractfile('test_folder/file1.jpg')

with open('test_folder/file1.jpg', 'rb') as img:
    # do something with img. Here img is your img file
    data = img.read()  # e.g. read the raw bytes
  • 1
    This is not correct. `.extractfile()` does not extract the file to the file system, it provides an io.BufferedReader file-like object, so that it can be used in Python as if it were a file. The code you provided simply opens the previously tarred original file again. – Grismar Dec 11 '20 at 00:33