I want to create a NumPy kernel matrix of dimensions 25000×25000. What is the most efficient way to handle such a large matrix in terms of saving it to disk and loading it back? I tried dumping it with pickle, but it threw an error saying it cannot serialize objects larger than 4 GiB.
- Try `np.save` or `np.savez`. – Imtinan Azhar Mar 10 '19 at 09:20
- Not from a lot of experience, but you might want to look at [pyarrow](https://arrow.apache.org/docs/python/numpy.html) and also at [parquet](http://parquet.apache.org/). `pyarrow` is supposed to already contain parquet. – amitr Mar 10 '19 at 09:24
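The `np.save` route suggested in the first comment is probably the simplest fix; a minimal sketch (the filename `kernel.npy` is just an example, and `mmap_mode='r'` lets you read slices without pulling the whole matrix into RAM):

import numpy as np

K = np.random.rand(25000, 25000).astype('float32')  # example kernel matrix, ~2.3 GiB
np.save('kernel.npy', K)                            # fast binary write to .npy
K2 = np.load('kernel.npy', mmap_mode='r')           # memory-mapped read; slices load on demand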
2 Answers
You could try saving it to an HDF5 file with `pandas.HDFStore()`:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(25000, 25000).astype('float16'))
memory_use = round(df.memory_usage(deep=True).sum() / 1024**3, 2)  # bytes -> GiB
print('uses {} GiB'.format(memory_use))
store = pd.HDFStore('test.h5', 'w')
store['data'] = df
store.close()
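Loading it back is symmetric; a minimal sketch using `pd.read_hdf` with the `'data'` key stored above:

import pandas as pd

df = pd.read_hdf('test.h5', 'data')  # returns the DataFrame stored under 'data'
K = df.to_numpy()                    # back to a NumPy array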

– Zihan Yang
Why not save the array to a plain file instead of using pickle?
np.savetxt("filename", array)
It can then be read back with
np.genfromtxt("filename")
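Keep in mind that a text file for a 25000×25000 matrix will be far larger and slower to parse than a binary format. A minimal round trip with this approach (the filename and `fmt` string are illustrative, and a smaller array stands in for the full kernel):

import numpy as np

array = np.random.rand(500, 500)             # smaller stand-in for the 25000x25000 kernel
np.savetxt("kernel.txt", array, fmt="%.8e")  # one text row per matrix row
restored = np.genfromtxt("kernel.txt")       # parses back as float64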

– GILO