
I am looking for a way to pickle some Python objects into a single combined tar archive. I also need to use np.save(...) to save some numpy arrays into that same archive. Of course, I also need to read them back later.

So what I tried is:

import numpy as np
import tarfile

a = np.linspace(1,10,10000)
tar = tarfile.open(fileName, "w")
tarinfo = tarfile.TarInfo.frombuf(np.save(a, fileName))
tar.close()

and I get the error:

'numpy.ndarray' object has no attribute 'write'

I get similar problems when I pickle an object into the tar file. Any suggestions? If it is easier, json-pickle would also work.

EDIT: As mentioned in the comments, I had the arguments of np.save() the wrong way around. However, fixing that does not solve the issue, as now I get the error:

object of type 'NoneType' has no len()

EDIT 2: If there is no solution to the above problem, do you know of any other time-efficient way to bundle files?

  • For one thing, the arguments to [`np.save`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.save.html) are the wrong way around. The first one needs to be either an open file object or a string, but you're giving it `a`, which is an `np.ndarray`. – ali_m Aug 18 '15 at 13:52
  • Thanks! I edited the question –  Aug 18 '15 at 14:10
  • `np.save` returns `None`. – Warren Weckesser Aug 18 '15 at 14:16
  • Okay, so that basically means there is no way to get it done? –  Aug 18 '15 at 14:17
  • I'm not sure that there is a way to write "directly" to a `tar` file, although you could certainly save the array to an intermediate file, then add this to the archive (i.e. using `tar.add()`). – ali_m Aug 18 '15 at 14:27
  • Saving the files is the weak point, as it really consumes a lot of time... then I would even have to do it twice :( –  Aug 18 '15 at 14:28
  • [`np.savez`](http://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) can be used to save multiple numpy arrays within a single zipped archive. It will also accept arbitrary Python objects, which will be cast to arrays of dtype `np.object` and therefore pickled. – ali_m Aug 18 '15 at 14:32
  • That sounds very promising... I will have a look into that! Thanks! –  Aug 18 '15 at 14:34
  • If I do that with an object of a complex class and subsequently load it, it is no longer of the type of the former class. –  Aug 18 '15 at 14:49
  • Yes, it will be an array of dtype `np.object` that contains your class instance. Try accessing the first element of the array. – ali_m Aug 18 '15 at 14:52
  • This is the string "arr_0" ;) –  Aug 18 '15 at 14:52
  • You will have to be more specific. Could you edit your question to show the actual object you are trying to store? – ali_m Aug 18 '15 at 14:54
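The behaviour discussed in the last few comments can be sketched like this: np.savez wraps a non-array object in a 0-d array of dtype object (spelled plain `object` in current NumPy, where the `np.object` alias no longer exists) and pickles it, storing it under the automatic name arr_0 when it is passed positionally. This is a minimal sketch assuming a recent NumPy, where np.load needs allow_pickle=True to read object arrays; the dict is only a stand-in for the asker's class instance.

```python
import io
import numpy as np

# A plain dict stands in for "some Python object"; an instance of your own
# class behaves the same way, provided the class can be imported when loading.
settings = {"name": "run1", "steps": 100}

buf = io.BytesIO()
np.savez(buf, settings)          # positional args are stored as 'arr_0', 'arr_1', ...
buf.seek(0)

with np.load(buf, allow_pickle=True) as data:
    names = list(data)           # iterating an NpzFile yields member names, not values
    wrapped = data["arr_0"]      # a 0-d array of dtype object
    restored = wrapped.item()    # unwrap the original Python object

print(names, wrapped.dtype, wrapped.shape, restored)
```

Because iterating the loaded file gives names rather than contents, that would explain getting the string "arr_0" as the "first element"; indexing by the name and calling .item() returns the object itself.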



First, I'm not an expert tar user, but I can point out a couple of things:

 a = np.linspace(1,10,10000)    

 tar = tarfile.open(fileName, "w")

If you want to add a file to an existing archive, use the "a" (append) mode (or study the other available modes). "w" creates a new, empty archive, overwriting any existing file:

 tarinfo = tarfile.TarInfo.frombuf(np.save(a, fileName))

The correct use of np.save has already been mentioned.

A TarInfo object is not the file data, but rather information about the file. That information is placed in the tar file before the data, in a 512-byte header block. tobuf creates such a block from the attributes of the object; frombuf decodes one. It is used, for example, in the fromtarfile method:
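To make the header/data distinction concrete, here is a small sketch (the member name and the 480-byte size are made up) that encodes a TarInfo into its header block with tobuf() and decodes it again with frombuf(). In Python 3, frombuf() takes the buffer plus an encoding and an errors argument. The last part shows that handing it None, which is what np.save returns, fails with the len() TypeError reported in the question.

```python
import tarfile

# Describe a hypothetical member called 'anArray' holding 480 bytes of data.
info = tarfile.TarInfo(name="anArray")
info.size = 480

# tobuf() encodes only the metadata: for a short name in USTAR format this is
# exactly one header block of tarfile.BLOCKSIZE (512) bytes, with no file data.
header = info.tobuf(format=tarfile.USTAR_FORMAT)
print(len(header), len(header) == tarfile.BLOCKSIZE)

# frombuf() goes the other way: it parses a header block back into a TarInfo.
decoded = tarfile.TarInfo.frombuf(header, "utf-8", "surrogateescape")
print(decoded.name, decoded.size)

# np.save returns None, and frombuf() calls len() on its buffer first.
err = None
try:
    tarfile.TarInfo.frombuf(None, "utf-8", "surrogateescape")
except TypeError as exc:
    err = exc
print("frombuf(None) raised:", repr(err))
```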

def fromtarfile(cls, tarfile):
    """Return the next TarInfo object from TarFile object
       tarfile.
    """
    buf = tarfile.fileobj.read(BLOCKSIZE)
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
    obj.offset = tarfile.fileobj.tell() - BLOCKSIZE
    return obj._proc_member(tarfile)

So frombuf is clearly not what you want to use here: it parses an existing header, it does not add any data to the archive.

A 2009 SO question, "python write string directly to tarfile", shows that it is possible to write directly to a tarfile by using a string buffer. From the accepted answer:

# create a `StringIO` object, and fill it
string = StringIO.StringIO()
...
# create `TarInfo` object:
info = tarfile.TarInfo(name="foo")
info.size=len(string.buf)
# use both with `addfile`:
tar.addfile(tarinfo=info, fileobj=string)
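That recipe is Python 2 (StringIO.StringIO). A Python 3 adaptation, sketched under the assumption that io.BytesIO replaces StringIO and that the payload is a pickled object (which the question also asks for), could look like this. add_bytes is a hypothetical helper name, and the archive lives in memory only to keep the example self-contained; a filename works the same way.

```python
import io
import pickle
import tarfile


def add_bytes(tar, name, data):
    """Hypothetical helper: add an in-memory bytes payload as member `name`."""
    info = tarfile.TarInfo(name=name)
    info.size = len(data)                        # the header must state the exact size
    tar.addfile(tarinfo=info, fileobj=io.BytesIO(data))


# Any picklable object can be stored this way; the dict is just an example.
obj = {"label": "experiment", "values": [1, 2, 3]}

archive = io.BytesIO()                           # in-memory stand-in for a .tar file
with tarfile.open(fileobj=archive, mode="w") as tar:
    add_bytes(tar, "obj.pkl", pickle.dumps(obj))

# Read it back: extractfile() gives a file-like object for the member.
archive.seek(0)
with tarfile.open(fileobj=archive, mode="r") as tar:
    restored = pickle.loads(tar.extractfile("obj.pkl").read())

print(restored == obj)
```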

I think you can do an np.save to a StringIO buffer, but I'd have to test it to be sure. For ordinary arrays, save writes a small header (dtype, memory order, shape) and then the array's data buffer. For object arrays (and other Python objects, which it first wraps in an array) it resorts to pickle.
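A quick check of that idea in Python 3, where np.save needs a bytes buffer, so io.BytesIO rather than StringIO. This sketch reuses the array from the question, peeks at the start of the .npy header, and loads the data back from the same buffer.

```python
import io
import numpy as np

a = np.linspace(1, 10, 10000)

buf = io.BytesIO()
np.save(buf, a)              # file argument first, array second; returns None
payload = buf.getvalue()     # the complete .npy file as bytes

# A .npy file starts with a magic string, then a short text header recording
# dtype, memory order and shape; the raw array data follows it.
print(payload[:6])
print("header bytes:", len(payload) - a.nbytes)

buf.seek(0)                  # rewind before reading it back
b = np.load(buf)
print(np.array_equal(a, b))
```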

I'd suggest first getting a regular np.save to a file, followed by addfile, working. Then see whether writing to a string buffer works, and whether it saves any time.
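A sketch of that file-based baseline, which is also the intermediate-file route suggested in the comments: save each array to an ordinary .npy file, then copy it into the archive. It uses tar.add(), which builds the TarInfo from the file on disk and calls addfile for you. The temporary directory and member names are assumptions for illustration; timing this against the in-memory version would answer the speed question.

```python
import os
import tarfile
import tempfile

import numpy as np

arrays = {"anArray": np.arange(100), "anOther": np.ones((2, 3, 4))}

with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "bundle.tar")
    with tarfile.open(archive, "w") as tar:
        for name, arr in arrays.items():
            path = os.path.join(tmp, name + ".npy")
            np.save(path, arr)                    # step 1: ordinary .npy file on disk
            tar.add(path, arcname=name + ".npy")  # step 2: copy it into the tar
    with tarfile.open(archive, "r") as tar:
        names = tar.getnames()

print(names)
```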


Here's a test script. It writes one array to a tar file, closes the file, reopens it in append mode and writes another, and finally extracts the files and loads them. The returned shapes look fine. I haven't looked at whether it is possible to extract these files to memory buffers.

np.savez could do the same thing using zip archiving (rather than tar).
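For comparison, a sketch of the np.savez route with the same two arrays. Keyword arguments become the member names, and a filename such as "bundle.npz" could replace the in-memory buffer. np.savez writes an uncompressed zip; np.savez_compressed is the deflate variant if size matters more than speed.

```python
import io
import numpy as np

buf = io.BytesIO()  # a filename such as "bundle.npz" works here too
np.savez(buf, anArray=np.arange(100), anOther=np.ones((2, 3, 4)))

buf.seek(0)
with np.load(buf) as data:
    files = sorted(data.files)  # member names without the ".npy" suffix
    a = data["anArray"]
    b = data["anOther"]

print(files, a.shape, b.shape)
```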

import io   # Python 3 version: np.save needs a bytes buffer
import tarfile

import numpy as np

# Serialize the first array into an in-memory .npy buffer and rewind it
abuf = io.BytesIO()
np.save(abuf, np.arange(100))
abuf.seek(0)

# "w" creates a new, empty archive; the header must state the exact byte size
tar = tarfile.open('test.tar', 'w')
info = tarfile.TarInfo(name='anArray')
info.size = len(abuf.getbuffer())
tar.addfile(tarinfo=info, fileobj=abuf)
tar.close()

# Second array, appended to the existing archive with mode "a"
abuf = io.BytesIO()
np.save(abuf, np.ones((2,3,4)))
abuf.seek(0)

tar = tarfile.open('test.tar', 'a')
info = tarfile.TarInfo(name='anOther')
info.size = len(abuf.getbuffer())
tar.addfile(tarinfo=info, fileobj=abuf)
tar.close()

# Read back: list the members, extract them to disk, then load with np.load
tar = tarfile.open('test.tar', 'r')
print(tar.getnames())
tar.extractall()
# can I extract to buffers?
tar.close()
a = np.load('anArray')
b = np.load('anOther')
print(a.shape, b.shape)
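On the open question in the script (extracting to buffers): tar.extractfile() returns a file-like object for a regular member, so one option is to read its bytes and hand them to np.load through io.BytesIO, with no extractall() and no temporary files. This is a sketch; to stay self-contained it builds a small archive in memory, and npy_bytes is a hypothetical helper name. With the script's file you would open tarfile.open('test.tar', 'r') instead.

```python
import io
import tarfile

import numpy as np


def npy_bytes(arr):
    """Hypothetical helper: serialize an array to .npy bytes in memory."""
    buf = io.BytesIO()
    np.save(buf, arr)
    return buf.getvalue()


# Build a small archive in memory so the example runs on its own.
archive = io.BytesIO()
with tarfile.open(fileobj=archive, mode="w") as tar:
    for name, arr in [("anArray", np.arange(100)), ("anOther", np.ones((2, 3, 4)))]:
        data = npy_bytes(arr)
        info = tarfile.TarInfo(name=name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

# Load each member straight into an array.
archive.seek(0)
loaded = {}
with tarfile.open(fileobj=archive, mode="r") as tar:
    for member in tar.getmembers():
        f = tar.extractfile(member)   # None for non-regular members such as directories
        if f is not None:
            loaded[member.name] = np.load(io.BytesIO(f.read()))

print({name: arr.shape for name, arr in loaded.items()})
```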

Listing test.tar with the tar command-line tool also shows both members:

1415:~/mypy$ tar -tvf test.tar 
-rw-r--r-- 0/0             480 1969-12-31 16:00 anArray 
-rw-r--r-- 0/0             272 1969-12-31 16:00 anOther
Answer by hpaulj (edited by Community)