
I have a very large HDF5 data file (a 1000 × 1400000 array) that contains only the integers 0, 1, 2 and 4. Loading it into a NumPy array with h5py takes a very long time because my 4 GB of memory cannot hold the full-precision array, so the program spills into swap. Since there are only four distinct values, an 8-bit integer array would suffice. Currently I load the data at full precision and convert it to an 8-bit int array afterwards:

import h5py
import numpy as np

with h5py.File("largedata", 'r') as f:
    # extract all data
    for name, data in f.items():
        # if it is a Dataset, pull the associated data
        if isinstance(data, h5py.Dataset):
            if name == 'foo':
                # convert to 8-bit int (after the full-size load)
                nparray = np.array(data[...], dtype=np.int8)

Is it possible to load the data directly into an 8-bit int array, so that memory is saved during loading?

takasoft

1 Answer


From the h5py dataset docs page:

 astype(dtype)

 Return a context manager allowing you to read data as a particular type.
 Conversion is handled by HDF5 directly, on the fly:

>>> dset = f.create_dataset("bigint", (1000,), dtype='int64')
>>> with dset.astype('int16'):
...     out = dset[:]
>>> out.dtype
dtype('int16')
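Applied to the question's dataset, a sketch could look like the following. Note that in h5py 3.x `astype()` returns an indexable wrapper rather than a context manager, so you slice it directly. The file and dataset names here are synthetic stand-ins for the asker's `largedata`/`foo`, shrunk to a toy size:

```python
import h5py
import numpy as np

# Build a small synthetic file standing in for "largedata";
# the real array would be 1000 x 1400000 with values 0, 1, 2, 4.
with h5py.File("demo.h5", "w") as f:
    f.create_dataset("foo",
                     data=np.random.choice([0, 1, 2, 4], size=(100, 200)),
                     dtype="int64")

with h5py.File("demo.h5", "r") as f:
    dset = f["foo"]
    # HDF5 converts on the fly, so only the int8 result array is
    # allocated in memory -- the int64 data is never materialized
    # as a full NumPy array.
    nparray = dset.astype("int8")[:]

print(nparray.dtype, nparray.shape)
```

Because the conversion happens inside the HDF5 read, peak memory usage is roughly the size of the int8 result rather than the int64 intermediate.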
hpaulj