I'm trying to read and write tar.gz files from memory using Python. I have read over the relevant Python docs and have come up with the following minimal working example to demonstrate my issue.
import io
import tarfile

text = "This is a test."
file_name = "test.txt"

text_buffer = io.BytesIO()
text_buffer.write(text.encode(encoding="utf-8"))

tar_buffer = io.BytesIO()

# Start a tar file with the memory buffer as the "file".
with tarfile.open(fileobj=tar_buffer, mode="w:gz") as archive:
    # We must create a TarInfo object for each file we put into the tar file.
    info = tarfile.TarInfo(file_name)
    text_buffer.seek(0, io.SEEK_END)
    info.size = text_buffer.tell()
    # We have to rewind the text buffer as tarfile.addfile doesn't do this for us.
    text_buffer.seek(0, io.SEEK_SET)
    # Add the text to the tarfile.
    archive.addfile(info, text_buffer)

with open("test.tar.gz", "wb") as f:
    f.write(tar_buffer.getvalue())

# The following command works fine.
# tar -zxvf test.tar.gz

archive_contents = dict()

# Open the same memory buffer again, this time for reading.
with tarfile.open(fileobj=tar_buffer, mode="r:*") as archive:
    for entry in archive:
        entry_fd = archive.extractfile(entry.name)
        archive_contents[entry.name] = entry_fd.read().decode("utf-8")
The odd thing is that extracting the archive with the tar command works completely fine. I see a file test.txt containing the string "This is a test."
However, the loop for entry in archive finishes immediately, as if there were no files in the archive. archive.getmembers() returns an empty list.
One other odd issue is that when I set mode="r:gz" when opening the byte stream, I get the following exception:
Exception has occurred: ReadError
empty file
tarfile.EmptyHeaderError: empty header
During handling of the above exception, another exception occurred:
File ".../test.py", line 283, in <module>
with tarfile.open(fileobj=tar_buffer, mode="r:gz") as archive:
tarfile.ReadError: empty file
I have also tried creating a test.tar.gz file using the tar command (assuming that there may be some issue in the way I was writing the tar file), but I get the same exception.
I must be missing something basic, but I can't seem to find any examples of this online.
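For reference, here is a minimal sketch of the same round trip that does read the entry back. It assumes the cause is that tar_buffer is still positioned at the end of its data after the write, so tarfile starts reading there and finds nothing. The only substantive change is the tar_buffer.seek(0) call before reopening the buffer. The helper names write_tar_gz and read_tar_gz are hypothetical and exist only for illustration.

```python
import io
import tarfile


def write_tar_gz(files):
    """Pack a {name: text} mapping into an in-memory tar.gz buffer."""
    tar_buffer = io.BytesIO()
    with tarfile.open(fileobj=tar_buffer, mode="w:gz") as archive:
        for name, text in files.items():
            data = text.encode("utf-8")
            info = tarfile.TarInfo(name)
            info.size = len(data)  # addfile reads exactly info.size bytes
            archive.addfile(info, io.BytesIO(data))
    # The buffer position is now at the end of the compressed data.
    return tar_buffer


def read_tar_gz(tar_buffer):
    """Read every regular file in an in-memory tar archive into a dict."""
    # Assumed fix: rewind so tarfile starts at the first byte of the archive.
    tar_buffer.seek(0)
    contents = {}
    with tarfile.open(fileobj=tar_buffer, mode="r:*") as archive:
        for entry in archive:
            if not entry.isfile():
                continue  # extractfile returns None for non-file members
            contents[entry.name] = archive.extractfile(entry).read().decode("utf-8")
    return contents


archive_contents = read_tar_gz(write_tar_gz({"test.txt": "This is a test."}))
print(archive_contents)
```

If this assumption holds, it would also account for the tar command working on test.tar.gz: getvalue() returns the whole buffer regardless of the current position, whereas tarfile.open reads from wherever the position happens to be.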