
Is there a way to write a string directly to a tarfile? From http://docs.python.org/library/tarfile.html it looks like only files already written to the file system can be added.

gatoatigrado

7 Answers


I would say it's possible, by playing with TarInfo and TarFile.addfile, passing a StringIO as the file object.

Very rough, but it works:

import tarfile
import StringIO

tar = tarfile.TarFile("test.tar", "w")

string = StringIO.StringIO()
string.write("hello")
string.seek(0)  # rewind so addfile() reads the data from the start
info = tarfile.TarInfo(name="foo")
info.size = len(string.getvalue())  # number of bytes to copy into the archive
tar.addfile(tarinfo=info, fileobj=string)

tar.close()
Stefano Borini
  • You can just say StringIO.StringIO("hello") to replace the writing and seeking. – mpenkov Mar 02 '14 at 01:13
  • is the procedure similar to python3 and bytesIO objects? – proteneer May 18 '14 at 07:43
  • @proteneer: I believe in python 3 the seek method gives you a binary length, while it internally uses the string `len()` function, so that `tarfile.copyfileobj` function will fail with `raise OSError("end of file reached")` – luckydonald Dec 02 '15 at 16:24

As Stefano pointed out, you can use TarFile.addfile and StringIO.

import tarfile, StringIO

data = 'hello, world!'

tarinfo = tarfile.TarInfo('test.txt')
tarinfo.size = len(data)

tar = tarfile.open('test.tar', 'a')
tar.addfile(tarinfo, StringIO.StringIO(data))
tar.close()

You'll probably want to fill in other fields of tarinfo (e.g. mtime, uname, etc.) as well.
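A Python 3 sketch of filling in a few of those fields. The member name, owner names and mode are only illustrative, and the archive is built in an io.BytesIO so nothing touches the disk; it is then read back to check the metadata survived:

```python
import io
import tarfile
import time

data = 'hello, world!'.encode('utf-8')

tarinfo = tarfile.TarInfo('test.txt')
tarinfo.size = len(data)          # length in bytes
tarinfo.mtime = int(time.time())  # default is 0, i.e. 1970-01-01
tarinfo.mode = 0o644              # rw-r--r-- (also the TarInfo default)
tarinfo.uname = 'nobody'          # illustrative owner/group names
tarinfo.gname = 'nogroup'

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tar:
    tar.addfile(tarinfo, io.BytesIO(data))

# Read the archive back from memory and inspect the stored header
buf.seek(0)
with tarfile.open(fileobj=buf, mode='r') as tar:
    member = tar.getmember('test.txt')
    print(member.size, oct(member.mode), member.uname)  # 13 0o644 nobody
```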

avakar
  • is the "As Stefano pointed out" an edit? Otherwise, I don't see what you're doing differently. Thanks for the response all the same. – gatoatigrado Apr 27 '09 at 19:44
  • I think Stefano hadn't posted any code at the time I wrote my response; he had only noted that TarFile.addfile and StringIO can be used. My memory is a little blurred, though. – avakar Apr 27 '09 at 20:57
  • FWIW, yes, @Stefano's detailed information was added in [an edit](http://stackoverflow.com/posts/740839/revisions) after you wrote this. The other answer saying the same thing also came in almost simultaneously. – mattdm Feb 27 '12 at 16:24

I found this while looking for how to serve a freshly created, in-memory .tgz archive from Django; maybe somebody else will find my code useful:

import tarfile
from io import BytesIO

from django.http import HttpResponse


def serve_file(request):
    # Build the .tgz archive entirely in memory
    out = BytesIO()
    tar = tarfile.open(mode="w:gz", fileobj=out)
    data = 'lala'.encode('utf-8')
    file = BytesIO(data)
    info = tarfile.TarInfo(name="1.txt")
    info.size = len(data)  # size in bytes, not characters
    tar.addfile(tarinfo=info, fileobj=file)
    tar.close()  # flushes the gzip stream; `out` itself stays open

    response = HttpResponse(out.getvalue(), content_type='application/tgz')
    response['Content-Disposition'] = 'attachment; filename=myfile.tgz'
    return response
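The archive-building part does not depend on Django. A minimal sketch of just that part follows; the helper name make_tgz and its {name: text} argument are hypothetical. It also reads the bytes back with mode 'r:gz' to show the result is a valid .tgz:

```python
import io
import tarfile


def make_tgz(files):
    """Return the bytes of a gzip-compressed tar built from a {name: text} dict."""
    out = io.BytesIO()
    # Leaving the with-block closes the TarFile, which writes the end-of-archive
    # blocks and the gzip trailer; `out` itself is left open.
    with tarfile.open(mode='w:gz', fileobj=out) as tar:
        for name, text in files.items():
            data = text.encode('utf-8')
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return out.getvalue()


archive = make_tgz({'1.txt': 'lala'})

# Read the archive back from memory
with tarfile.open(mode='r:gz', fileobj=io.BytesIO(archive)) as tar:
    print(tar.getnames())                   # ['1.txt']
    print(tar.extractfile('1.txt').read())  # b'lala'
```

In the Django view, the returned bytes can be passed straight to HttpResponse as in the answer above.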
scythargon

The solution in Python 3 uses io.BytesIO. Be sure to set TarInfo.size to the length of the bytes, not the length of the string.

Given a single string, the simplest solution is to call .encode() on it to obtain bytes. In this day and age you probably want UTF-8, but if the recipient is expecting a specific encoding, such as ASCII (i.e. no multi-byte characters), then use that instead.

import io
import tarfile

data = 'hello\n'.encode('utf8')
info = tarfile.TarInfo(name='foo.txt')
info.size = len(data)

with tarfile.TarFile('test.tar', 'w') as tar:
    tar.addfile(info, io.BytesIO(data))
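To see why the byte length matters, compare the two lengths for a string with a non-ASCII character ('héllo' is just an example). The archive is built in memory and read back to confirm the round trip:

```python
import io
import tarfile

text = 'héllo\n'               # 6 characters
data = text.encode('utf8')     # 'é' takes 2 bytes in UTF-8

print(len(text), len(data))    # 6 7

info = tarfile.TarInfo(name='foo.txt')
info.size = len(data)          # bytes, not characters

buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode='w') as tar:
    tar.addfile(info, io.BytesIO(data))

buf.seek(0)
with tarfile.open(fileobj=buf) as tar:
    print(tar.extractfile('foo.txt').read().decode('utf8') == text)  # True
```

addfile() copies exactly info.size bytes from the file object, so setting the size to len(text) here would silently drop the last byte of the member.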

If you really need a writable string buffer, similar to the accepted answer by @Stefano Borini for Python 2, then the solution is to use io.TextIOWrapper over an underlying io.BytesIO buffer.

import io
import tarfile

textIO = io.TextIOWrapper(io.BytesIO(), encoding='utf8')
textIO.write('hello\n')
bytesIO = textIO.detach()
info = tarfile.TarInfo(name='foo.txt')
info.size = bytesIO.tell()

with tarfile.TarFile('test.tar', 'w') as tar:
    bytesIO.seek(0)
    tar.addfile(info, bytesIO)
Todd Owen

Just for the record: in Python 2, StringIO.StringIO objects have an (undocumented) .len attribute. So there is no need to seek(0) just to get len(foo.buf), no need to keep the entire string around to call len() on, and, God forbid, no need to do the accounting yourself.

(Maybe it did not at the time the question was written.)
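Python 3's io.StringIO has no such attribute. If the data is built up in an io.BytesIO instead, one way to get its size without keeping a separate copy is to ask the buffer itself via getbuffer(). A sketch, with illustrative member name and contents:

```python
import io
import tarfile

buf = io.BytesIO()
buf.write(b'hello, ')
buf.write(b'world\n')

# getbuffer() exposes the contents without copying; release the view
# (end of the with-block) before using buf again, since a live view
# prevents the buffer from being resized.
with buf.getbuffer() as view:
    size = view.nbytes          # 13

info = tarfile.TarInfo(name='foo.txt')
info.size = size

buf.seek(0)                     # addfile() reads from the current position
out = io.BytesIO()
with tarfile.open(fileobj=out, mode='w') as tar:
    tar.addfile(info, buf)
```

Right after the writes, buf.tell() gives the same number, which is what the TextIOWrapper answer relies on; getbuffer() works regardless of the current position.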

Alias_Knagg
  • `StringIO` objects do not have a `len` property. The code `StringIO('foo').len` raises an exception `AttributeError: '_io.StringIO' object has no attribute 'len'` in Python 3.8. (Maybe it did not at the time the answer was written.) – Jeyekomon Apr 19 '22 at 13:01
  • Apparently it's undocumented but present in StringIO in 2.7 (but not cStringIO) https://stackoverflow.com/questions/4677433/in-python-how-do-i-check-the-size-of-a-stringio-object – Alias_Knagg Apr 19 '22 at 13:38

In my case I wanted to read from an existing tar file, append some data to the contents, and write it to a new file. Something like:

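import io
import tarfile

other_data = b'extra bytes\n'  # placeholder: the data to append to each file

with tarfile.open('in.tar') as tar_in, tarfile.open('out.tar', 'w') as tar_out:
    for ti in tar_in:
        if not ti.isfile():
            continue  # directories and links are skipped here (see below)
        buf_in = tar_in.extractfile(ti)
        buf_out = io.BytesIO()
        size = buf_out.write(buf_in.read())
        size += buf_out.write(other_data)
        buf_out.seek(0)
        ti.size = size
        tar_out.addfile(ti, fileobj=buf_out)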

Extra code is needed for handling directories and links.
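One way to fill in that extra code, as a sketch: directories and symbolic links carry no file data, so their headers can be copied with addfile() and no file object. Everything below runs in memory; the member names and the appended bytes are illustrative:

```python
import io
import tarfile

# Build a small input archive in memory: a directory, a file and a symlink
src = io.BytesIO()
with tarfile.open(fileobj=src, mode='w') as tar:
    d = tarfile.TarInfo('docs')
    d.type = tarfile.DIRTYPE
    d.mode = 0o755
    tar.addfile(d)
    data = b'first line\n'
    f = tarfile.TarInfo('docs/a.txt')
    f.size = len(data)
    tar.addfile(f, io.BytesIO(data))
    link = tarfile.TarInfo('latest.txt')
    link.type = tarfile.SYMTYPE
    link.linkname = 'docs/a.txt'
    tar.addfile(link)

extra = b'appended line\n'  # illustrative data to append to each regular file
src.seek(0)
dst = io.BytesIO()
with tarfile.open(fileobj=src) as tar_in, \
        tarfile.open(fileobj=dst, mode='w') as tar_out:
    for ti in tar_in:
        if ti.isfile():
            buf_out = io.BytesIO()
            size = buf_out.write(tar_in.extractfile(ti).read())
            size += buf_out.write(extra)
            buf_out.seek(0)
            ti.size = size
            tar_out.addfile(ti, fileobj=buf_out)
        else:
            # directories, symlinks, hard links: copy the header only
            tar_out.addfile(ti)
```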

z0r

You have to use TarInfo objects and the addfile method instead of the usual add method:

from StringIO import StringIO
from tarfile import open, TarInfo

s = "Hello World!"
ti = TarInfo("test.txt")
ti.size = len(s)

tf = open("testtar.tar", "w")
tf.addfile(ti, StringIO(s))
tf.close()  # finalize the archive (writes the end-of-archive blocks)
Eli Courtwright