As discussed in the comments, the resulting object is probably too large for your memory.
Numpy has the ability to store arrays on your disk (hopefully an SSD; if you use an HDD, this will probably be too slow).
This is called a memmap.
It is possible to store data of types such as strings in a memmap, but this can get tricky; see numpy.memmap for an array of strings?
Getting the data into the memmap in the first place might also be complicated. You might want to split the file and load it in multiple passes, then write the individual portions into the memmap one by one, as sketched below.
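A minimal sketch of that approach (the file name, shape, dtype and chunk size are assumptions; replace the dummy data with however you actually read one portion of your file):

import numpy as np

n_rows, n_cols = 1_000_000, 50     # assumed total size of your data
chunk_rows = 100_000               # rows loaded per pass

# disk-backed array; mode="w+" creates (or overwrites) the file on disk
mm = np.memmap("data.dat", dtype="float32", mode="w+", shape=(n_rows, n_cols))

for start in range(0, n_rows, chunk_rows):
    stop = min(start + chunk_rows, n_rows)
    # placeholder: read this portion of your real file here,
    # e.g. with np.loadtxt(..., skiprows=start, max_rows=stop - start)
    chunk = np.random.rand(stop - start, n_cols).astype("float32")
    mm[start:stop] = chunk

mm.flush()                         # make sure everything is written to disk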
Another important point might be the dtype. You specify None and use many columns. Do you have different datatypes in the different columns? If so, you might want to switch from numpy to pandas, which gives you a proper datatype per column for this spreadsheet-like data.
Be sure to use the appropriate datatype for every column. That can significantly reduce your memory footprint (and might already solve your problem): https://www.dataquest.io/blog/pandas-big-data/
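A minimal sketch of reading with explicit per-column dtypes and in chunks (the file name, column names and dtypes are assumptions; adapt them to your data):

import pandas as pd

# assumed columns; "category" is very compact for repeated string values
dtypes = {"id": "int32", "price": "float32", "label": "category"}

# read the file in chunks so no single read has to hold everything at once
chunks = pd.read_csv("data.csv", dtype=dtypes, chunksize=100_000)
df = pd.concat(chunks, ignore_index=True)

print(df.dtypes)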
To check the memory footprint of a numpy array, you can use nbytes:
import numpy as np

np.ones((10, 10), dtype="float64").nbytes  # 800
np.ones((10, 10), dtype="int32").nbytes    # 400
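If you do switch to pandas, the analogous check is DataFrame.memory_usage; with deep=True it also counts the actual memory used by object (string) columns. A small example (the column names and values are made up):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]}).astype({"a": "int32"})
print(df.memory_usage(deep=True))  # per-column footprint in bytes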