
I would like to read some files from a tarball and save them to a new tarball. This is the code I wrote:

import tarfile

archive = 'dum/2164/archive.tar'

# Read input data.
input_tar = tarfile.open(archive, 'r|')
tarinfo = input_tar.next()
input_tar.close()

# Write output file.
output_tar = tarfile.open('foo.tar', 'w|')
output_tar.addfile(tarinfo)
output_tar.close()

Unfortunately, the output tarball is no good:

$ tar tf foo.tar
./1QZP_A--2JED_A--not_reformatted.dat.bz2
tar: Truncated input file (needed 1548288 bytes, only 1545728 available)
tar: Error exit delayed from previous errors.

Any clue how to read and write tarballs on the fly with Python?

Remi Guan

1 Answer


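OK, so this is how I managed to do it. The problem with the code in the question was calling `addfile()` without a file object: `addfile(tarinfo)` writes only the member's header, but that header still declares `tarinfo.size` bytes of data, so `tar` finds the archive truncated. Passing the file object returned by `extractfile()` makes `addfile()` copy the member's data as well.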

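import tarfile

archive = 'dum/2164/archive.tar'

# Read input data: in 'r|' (stream) mode the archive is read sequentially.
input_tar = tarfile.open(archive, 'r|')
tarinfo = input_tar.next()  # header of the first member
fileobj = input_tar.extractfile(tarinfo)  # file object for that member's data

# Write output file: addfile() writes the header, then copies
# tarinfo.size bytes from fileobj.
output_tar = tarfile.open('foo.tar', 'w|')
output_tar.addfile(tarinfo, fileobj)

# Close the input only after its data has been copied.
input_tar.close()
output_tar.close()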
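The code above copies only the first member. Since the question asks about "some files", here is a minimal sketch, assuming you want to copy every member for which a predicate returns true; `copy_selected` and `keep` are hypothetical names, not part of `tarfile`. It uses the same two calls as above, `extractfile()` and `addfile(tarinfo, fileobj)`, and keeps both archives in stream mode. Because a stream cannot go back to an earlier member, each member's data is copied right after its header is read.

```python
import tarfile


def copy_selected(src_path, dst_path, keep):
    """Stream the members of src_path into dst_path, keeping those where keep(tarinfo) is true.

    copy_selected and keep are hypothetical names used for illustration.
    Both archives use stream modes ('r|' / 'w|'), so members are handled
    one at a time, in archive order, without seeking backwards.
    """
    copied = []
    with tarfile.open(src_path, 'r|') as src, tarfile.open(dst_path, 'w|') as dst:
        for tarinfo in src:
            if not keep(tarinfo):
                continue  # skipped: the stream moves forward past its data
            if tarinfo.isreg():
                # extractfile() returns a file object for this member's data;
                # addfile() reads exactly tarinfo.size bytes from it.
                dst.addfile(tarinfo, src.extractfile(tarinfo))
            else:
                # Directories, symlinks, etc. carry no data: header only.
                dst.addfile(tarinfo)
            copied.append(tarinfo.name)
    return copied
```

For example, `copy_selected('dum/2164/archive.tar', 'foo.tar', lambda ti: ti.name.endswith('.dat.bz2'))` would keep only the `.dat.bz2` members.
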
vog
Note that you can still use the `r|` and `w|` modes so that the data is streamed rather than the entire tarfile being read into memory/disk. – Yuval Apr 21 '17 at 12:16
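Building on that comment, here is a minimal sketch assuming the archives arrive and leave as binary file objects (for example `sys.stdin.buffer` and `sys.stdout.buffer` in a shell pipeline) rather than as paths. `tarfile.open()` accepts a `fileobj` argument, and the stream modes only read or write forward, so unseekable pipes work; `'r|*'` detects compressed input and `'w|gz'` compresses the output on the fly. `filter_stream` and its `suffix`/`compress` parameters are hypothetical.

```python
import tarfile


def filter_stream(src, dst, suffix, compress=False):
    """Copy regular-file members whose name ends with suffix from src to dst.

    filter_stream is a hypothetical helper. src and dst are binary file
    objects; neither needs seek(), because the stream modes ('r|*',
    'w|' / 'w|gz') only move forward. They are left open afterwards.
    """
    out_mode = 'w|gz' if compress else 'w|'
    # 'r|*' accepts an uncompressed, gzip, bz2 or xz input stream.
    with tarfile.open(fileobj=src, mode='r|*') as tin, \
         tarfile.open(fileobj=dst, mode=out_mode) as tout:
        for member in tin:
            if member.isreg() and member.name.endswith(suffix):
                tout.addfile(member, tin.extractfile(member))
```

In a pipeline such as `python filter_tar.py < archive.tar > foo.tar.gz` (`filter_tar.py` being a hypothetical script), the call would be `filter_stream(sys.stdin.buffer, sys.stdout.buffer, '.dat', compress=True)`.
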