As discussed in the comments, the resulting object is probably too large for your memory.
Numpy has the ability to store arrays on your disk (hopefully an SSD; if you use an HDD, this will probably be too slow).
This is called a memmap.
It is possible to store data of types such as strings in a memmap, but this can get tricky; see numpy.memmap for an array of strings?
Getting the data into the memmap in the first place might also be complicated. You might want to split the file and load it in multiple passes, then write the individual portions into the memmap one by one, as sketched below.
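A minimal sketch of that approach (the file name, shape, dtype and chunk size are assumptions; replace the dummy data with however you actually read one portion of your file):

import numpy as np

n_rows, n_cols = 1_000_000, 50     # assumed total size of your data
chunk_rows = 100_000               # rows loaded per pass

# disk-backed array; mode="w+" creates (or overwrites) the file on disk
mm = np.memmap("data.dat", dtype="float32", mode="w+", shape=(n_rows, n_cols))

for start in range(0, n_rows, chunk_rows):
    stop = min(start + chunk_rows, n_rows)
    # placeholder: read this portion of your real file here,
    # e.g. with np.loadtxt(..., skiprows=start, max_rows=stop - start)
    chunk = np.random.rand(stop - start, n_cols).astype("float32")
    mm[start:stop] = chunk

mm.flush()                         # make sure everything is written to disk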
Another important point might be the dtype. You specify None and use many columns. Do you have different datatypes in the different columns? If so, you might want to switch from numpy to pandas, which gives you a proper datatype per column for this spreadsheet-like data.
Be sure to use the appropriate datatype for every column. That can significantly reduce your memory footprint (and might already solve your problem): https://www.dataquest.io/blog/pandas-big-data/
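A minimal sketch of reading with explicit per-column dtypes and in chunks (the file name, column names and dtypes are assumptions; adapt them to your data):

import pandas as pd

# assumed columns; "category" is very compact for repeated string values
dtypes = {"id": "int32", "price": "float32", "label": "category"}

# read the file in chunks so no single read has to hold everything at once
chunks = pd.read_csv("data.csv", dtype=dtypes, chunksize=100_000)
df = pd.concat(chunks, ignore_index=True)

print(df.dtypes)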
To check the memory footprint of a numpy array, you can use nbytes:
import numpy as np

np.ones((10, 10), dtype="float64").nbytes  # 800
np.ones((10, 10), dtype="int32").nbytes    # 400
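If you do switch to pandas, the analogous check is DataFrame.memory_usage; with deep=True it also counts the actual memory used by object (string) columns. A small example (the column names and values are made up):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]}).astype({"a": "int32"})
print(df.memory_usage(deep=True))  # per-column footprint in bytes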