
I need to save a really big array (it is a matrix of doubles, with size 5e5 x 3e4).

The context: I have a 1D simulation of a viscous disc, and each row is a snapshot of the simulation (the surface density).

All the data is relevant (more or less), so in principle I cannot reduce the size of the matrix. I tried using np.save and h5py; with those, a matrix of 5e4 x 1.5e3 takes 6 GB on disk. h5py is faster than np.save at writing, but I don't know whether that will still hold for the full simulation (which should be something like 110 GB). Is there a way to store the data in less space? Or is there another way to save and load the data that is faster than those two methods?
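For reference, the h5py writing pattern I have in mind looks roughly like this (a minimal sketch; `compute_snapshot` is a placeholder for my solver, and the chunk and compression settings are just one possible choice):

```python
import h5py

n_rows, n_cols = 500_000, 30_000  # the full 5e5 x 3e4 run

with h5py.File("simulation.h5", "w") as f:
    dset = f.create_dataset(
        "surface_density",
        shape=(n_rows, n_cols),
        dtype="float64",        # float32 would halve the footprint, if precision allows
        chunks=(1, n_cols),     # one chunk per snapshot -> cheap row-wise access later
        compression="gzip",     # trades some write speed for disk space
        compression_opts=4,     # gzip level 1-9
    )
    for i in range(n_rows):
        snapshot = compute_snapshot(i)  # hypothetical: one row of surface density
        dset[i, :] = snapshot           # written incrementally, never all in memory
```

Writing one snapshot per chunk means each row is compressed independently, so individual snapshots can be read back later without touching the rest of the file.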

Thanks!

Ardemion
  • Is your data dense or are zeros a possibility? – Grr May 01 '17 at 15:08
  • There are some zeros, but they are easily less than 20% of the data. – Ardemion May 01 '17 at 16:35
  • Well, I would say sparse matrices will not help you in that case. You've said no reduction, so SVD, NMF, PCA, etc. go right out the window. So I really don't know how else you are going to save space. One obvious and somewhat unrelated concern: to operate on something so large, your system should probably have at least that much memory. – Grr May 01 '17 at 17:45
  • Is float really not sufficient for your results? Is the data compressible? Do you want to save the snapshots directly as they are calculated, or all the data at the end of the simulation? And the most important question: how do you want to read your data afterwards (only subsets of a given shape)? – max9111 May 06 '17 at 18:59
  • Floats are enough; I don't know what you mean by compressible. I can save them directly as they are computed. I would like to read some rows individually, depending on which part of the simulation is relevant (see the row-wise read sketch after these comments). I don't need all the snapshots, but I cannot truly know a priori which ones to save and which not. Thanks – Ardemion May 07 '17 at 03:59
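Following up on the comment thread (floats suffice, snapshots are saved as they are computed, and reads are row-wise), here is a minimal sketch of reading individual rows back from a chunked file like the one written above; the dataset name and indices are illustrative:

```python
import h5py

# Individual snapshots can be read without loading the whole file:
# with chunks=(1, n_cols), each row read decompresses only one chunk.
with h5py.File("simulation.h5", "r") as f:
    dset = f["surface_density"]
    row = dset[12345, :]        # a single snapshot, returned as a numpy array
    block = dset[1000:1010, :]  # or a contiguous block of snapshots
```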

0 Answers