[Note: although there are already some posts about dealing with large matrices in numpy, they do not address my specific concerns.]
I am trying to load a 30820x12801 matrix, stored in a 1.02G .txt file, with numpy.loadtxt(), and I get a MemoryError.
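For reference, the call looks roughly like this (the prefix and N below are placeholders, not the real values used in jPCA_pipeline.py):

import numpy as np

# Placeholder values; the real prefix and N come from jPCA_pipeline.py.
new_file_prefix = 'my_matrix'
N = 12801

# loadtxt defaults: whitespace-delimited values, float64 dtype.
MATRIX = np.loadtxt('{}_N={}.txt'.format(new_file_prefix, N))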
This wouldn't be so surprising except that:
- I am using 64-bit Python.
- I am running the job on a supercomputer with 50G of virtual memory allocated for it.
From what I know, a 1G matrix shouldn't be a problem for 64-bit Python, and it certainly shouldn't be a problem with 50G of memory available.
(This is the first time I am dealing with large datasets, so I may be missing something basic.)
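Just to make the numbers concrete, here is a rough back-of-the-envelope check of the final array size (assuming loadtxt's default float64 dtype):

import numpy as np

rows, cols = 30820, 12801
bytes_needed = rows * cols * np.dtype(np.float64).itemsize  # 8 bytes per value
print(bytes_needed / 1e9)  # ~3.16 GB for the resulting array

So even the in-memory array should be only a few gigabytes, well below the 50G allocation.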
Extra information:
- When using open(), the file loads into Python without any problems (see the sketch after the traceback).
- Output of ulimit -a | grep "max memory size": '(kbytes, -m) unlimited'
- Full error message:
Traceback (most recent call last):
  File "jPCA/jPCA_pipeline.py", line 87, in <module>
    MATRIX = get_matrix(new_file_prefix, N)
  File "jPCA/jPCA_pipeline.py", line 70, in get_matrix
    MATRIX = np.loadtxt('{}_N={}.txt'.format(new_file_prefix, N))
  File "/home/hers_en/fsimoes/miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1159, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/home/hers_en/fsimoes/miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1087, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/home/hers_en/fsimoes/miniconda3/lib/python3.7/site-packages/numpy/lib/npyio.py", line 1087, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
MemoryError
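For completeness, this is the kind of check I mean by "loading with open()" above (a minimal sketch; the path is a placeholder for the real file):

# Placeholder path; the real file is the same ~1.02G .txt file passed to loadtxt.
path = 'my_matrix_N=12801.txt'

with open(path) as f:
    lines = f.readlines()  # reads the whole file into memory without errors

print(len(lines))  # expected: 30820 (one line per matrix row)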