I am getting a wrong result when copying a slice of a dask array into a NumPy array: the number of rows doesn't match.
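import h5py
import numpy as np
import dask.array as da

# Open the HDF5 file read-only and wrap the dataset in a dask array
store = h5py.File(s_file_path + '.hdf5', 'r')
dset = store['data_matrix']
data_matrix = da.from_array(dset, chunks=dset.chunks)
# Take the rows from index 482 to the end, then copy them into a NumPy array
test_set = data_matrix[482:, :]
np_test_set = np.array(test_set, order='F')  # 'F' = Fortran (column-major) order
print "source_set shape: ", data_matrix.shape
print "test_set shape: ", test_set.shape
print "np_test_set shape: ", np_test_set.shape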
Results:
source_set shape: (656, 473034)
test_set shape: (174, 473034)
np_test_set shape: (195, 473034)
I am not very familiar with dask; I am using it because my data doesn't fit in RAM. The dask slice has the 174 rows I expect (656 - 482), but the NumPy copy has 195. Is the row difference related to caching or to the chunk size?
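For reference, here is a minimal sketch of the behaviour I expect. It assumes a small in-memory NumPy array as a stand-in for my HDF5 dataset; the 4-column width, the array contents and the chunk size of 100 rows are made up for illustration and are not my real data. It materializes the slice explicitly with `compute()` and checks the row count against 656 - 482 = 174:

```python
import numpy as np
import dask.array as da

# Hypothetical stand-in for the HDF5 dataset: 656 rows, only 4 columns
source = np.arange(656 * 4).reshape(656, 4)

# Chunk along the rows; the chunk size here is an arbitrary assumption
data_matrix = da.from_array(source, chunks=(100, 4))

# Same slice as in the question: rows 482 to the end
test_set = data_matrix[482:, :]

# Materialize the lazy dask slice, then convert to Fortran (column-major) order
np_test_set = np.asfortranarray(test_set.compute())

print("source_set shape: %s" % (data_matrix.shape,))
print("test_set shape: %s" % (test_set.shape,))
print("np_test_set shape: %s" % (np_test_set.shape,))

# The materialized copy should match a plain NumPy slice of the source
assert np_test_set.shape == (656 - 482, 4)
assert np.array_equal(np_test_set, source[482:, :])
```

With this stand-in I would expect both shapes to have 174 rows, so running the same check against the real file should show whether the HDF5-backed array behaves differently.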