In fact, it is possible, but with a huge restriction: you can only delete the end (tail) of the archive, not files at the beginning or in the middle of it.
I just had a similar need: extracting files from a huge tar archive (450 GB) without enough disk space for both the archive and the extracted files. I had to extract the files one at a time and remove each one from the .tar as soon as it was extracted.
The command `tar -vf x.tar --delete a.txt` does not solve that, because it does not actually delete `a.txt` from `x.tar` (`x.tar` remains the same size); it just removes it from the list of contained files (`a.txt` will not be extracted when untarring `x.tar` later).
The only thing you can do with .tar files, because they are sequential, is to truncate them. So the only solution is to extract files from the end.
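To see why truncation is the only cheap operation, it helps to look at the layout: each member is a 512-byte header followed by its data, padded up to a multiple of 512 bytes, and the next member's header starts right after that. The sketch below builds a small archive in memory (the file names and contents are made up for the demo) and lists each member's header offset, data offset and size, using the `offset` and `offset_data` attributes that `tarfile` sets on each `TarInfo` while reading.

```python
import io
import tarfile


def build_demo_tar(files):
    """Build a small uncompressed ustar archive in memory from {name: bytes}."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def describe_layout(tar_bytes):
    """Return (name, header offset, data offset, data size) for each member."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        return [(m.name, m.offset, m.offset_data, m.size) for m in tar.getmembers()]


if __name__ == "__main__":
    demo = build_demo_tar({"a.txt": b"hello", "b.txt": b"x" * 1000, "c.txt": b"bye"})
    for name, header, start, size in describe_layout(demo):
        print(f"{name}: header at {header}, data at {start}, {size} bytes")
```

With this demo data the headers land at 0, 1024 and 2560: `a.txt` (5 bytes) still occupies a full 512-byte data block, and `b.txt` (1000 bytes) occupies two. Removing anything but the tail would mean shifting everything that follows.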
First you get the list of all the members of the tar file:
```python
import tarfile
with tarfile.open(name=tar_file_path, mode="r") as tar_file:
    tar_members = tar_file.getmembers()
```
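If the tail you extract in one go has to fit in the free space, `first_of_files_to_extract` has to be chosen accordingly. The helper below is a hypothetical sketch, not part of `tarfile`: it walks `tar_members` backwards and returns the smallest index whose tail still fits in a given number of free bytes. It counts only the nominal sizes of regular files and ignores filesystem overhead.

```python
import tarfile


def first_index_for_budget(tar_members, free_bytes):
    """Hypothetical helper: smallest index i such that the regular files in
    tar_members[i:] add up to at most free_bytes (nominal sizes only)."""
    needed = 0
    first = len(tar_members)  # len(tar_members) means "nothing fits"
    for i in range(len(tar_members) - 1, -1, -1):
        member = tar_members[i]
        size = member.size if member.isfile() else 0  # dirs, links: no data
        if needed + size > free_bytes:
            break
        needed += size
        first = i
    return first


if __name__ == "__main__":
    demo = []
    for name, size in [("a.bin", 300), ("b.bin", 200), ("c.bin", 100)]:
        info = tarfile.TarInfo(name)  # demo entries, not read from a real archive
        info.size = size
        demo.append(info)
    print(first_index_for_budget(demo, 300))  # 1: b.bin and c.bin fit, a.bin does not
```

`shutil.disk_usage(extracting_dir).free` is one way to get the free-space figure; keeping a safety margin is wise, since files take slightly more than their nominal size on disk.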
Then you can extract the files you want from the end:
```python
with tarfile.open(name=tar_file_path, mode="r") as tar_file:
    tar_file.extractall(path=extracting_dir, members=tar_members[first_of_files_to_extract:])
```
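Then you compute where to truncate the file (in bytes). `offset` is the position of the member's header, so cutting there removes that member and everything after it: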
```python
truncate_size = tar_members[first_of_files_to_extract].offset
```
Then you add an end-of-archive marker, i.e. two consecutive blocks of null bytes. Each block is 512 bytes long in a .tar, so you need 1024 null bytes at the end. Do not count on the previous member to supply one of those blocks: a member's data is only padded with nulls up to the next 512-byte boundary (a member whose size is an exact multiple of 512 gets no padding at all), and there is no separate end-of-member marker.
```python
new_file_size = truncate_size + 1024  # 2 blocks of 512 null bytes
```
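The padding rule is easy to check: a member's data occupies its size rounded up to the next multiple of 512, so the 512 bytes just before a cut point are not necessarily null. The sketch below (demo archive with made-up contents) builds an archive whose first member is exactly 512 bytes long and returns the block that precedes the second member's header; it is pure file data, not a null block.

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def padded_size(size):
    """Space a member's data takes in the archive: size rounded up to 512."""
    return -(-size // BLOCK) * BLOCK


def block_before_member(tar_bytes, member_index):
    """Return the 512 bytes that precede the header of member_index."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        cut = tar.getmembers()[member_index].offset
    return tar_bytes[cut - BLOCK:cut]


def demo_archive():
    """Demo archive: a member of exactly 512 bytes followed by a small one."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in [("exact.bin", b"x" * BLOCK), ("small.txt", b"hi")]:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    print(padded_size(5), padded_size(512), padded_size(513))  # 512 512 1024
    print(block_before_member(demo_archive(), 1) == bytes(BLOCK))  # False
```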
And finally you do the truncations, the first to remove the last members and the second to add the null bytes (here we do not open the .tar with `tarfile.open()` anymore; truncation is just a regular file operation):
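```python
# "r+b": the file must be opened for writing, in binary mode, to be truncated
with open(tar_file_path, "r+b") as tar_file:
    tar_file.truncate(truncate_size)  # drop the extracted members
    tar_file.truncate(new_file_size)  # extend again; new bytes are zero-filled on most platforms
```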
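Python's documentation for `truncate()` notes that when a file is extended, the content of the new area depends on the platform (on most platforms it is zero-filled). If you prefer not to rely on that, a variant is to cut the file once and write the two null blocks explicitly. `truncate_tar_at` is a hypothetical helper name; `truncate_size` is the value computed above.

```python
import os

END_OF_ARCHIVE = bytes(1024)  # two 512-byte blocks of null bytes


def truncate_tar_at(tar_file_path, truncate_size):
    """Hypothetical helper: cut the archive at truncate_size, then write the
    end-of-archive marker explicitly instead of relying on zero-fill."""
    with open(tar_file_path, "r+b") as tar_file:
        tar_file.truncate(truncate_size)  # drop everything from the cut point on
        tar_file.seek(truncate_size)      # move to the new end of the file
        tar_file.write(END_OF_ARCHIVE)    # append the two null blocks
        tar_file.flush()
        os.fsync(tar_file.fileno())       # make sure the change reaches the disk
```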
Now you have extracted the files from the end of the .tar, and you have a new valid .tar file, smaller than the previous one by roughly the size of the extracted files plus their header and padding blocks. The extra disk space needed is limited to the size of the extracted files. I personally did it file by file (extract the last file, truncate, extract the last file, truncate, etc.).
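The file-by-file procedure can be wrapped in a loop. The sketch below is one way to do it, assuming a plain uncompressed archive that you no longer need (it is consumed as you go): it reads the member list once, then extracts members from last to first, cutting the archive at each member's header offset and writing a fresh end-of-archive marker after each step. `drain_tar_from_tail` is a hypothetical name, and `filter="data"` is passed only when the running Python supports extraction filters.

```python
import tarfile

END_OF_ARCHIVE = bytes(1024)  # two 512-byte blocks of null bytes

# Use the "data" extraction filter where this Python version supports it.
EXTRACT_KWARGS = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}


def drain_tar_from_tail(tar_file_path, extracting_dir):
    """Extract members last-to-first, shrinking the archive after each one.

    Returns the extracted member names in extraction order. When it returns,
    the archive holds only an end-of-archive marker.
    """
    extracted = []
    with tarfile.open(name=tar_file_path, mode="r") as tar_file, \
            open(tar_file_path, "r+b") as raw:
        tar_members = tar_file.getmembers()  # scan the archive only once
        for member in reversed(tar_members):
            tar_file.extract(member, path=extracting_dir, **EXTRACT_KWARGS)
            raw.truncate(member.offset)  # drop this member and everything after it
            raw.seek(member.offset)
            raw.write(END_OF_ARCHIVE)    # keep what remains a valid archive
            raw.flush()
            extracted.append(member.name)
    return extracted
```

Reading the member list once avoids rescanning the whole archive on every iteration, which matters for a 450 GB file; each truncation then only touches the end of the file, so the extra disk space in use at any moment is about the size of the largest single member.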