
I read a file with genfromtxt. Some of the values are missing, so I generate a masked array. When I try to index individual records of the masked array, I get an error I cannot figure out. Any help would be highly appreciated. Thanks. --Alex

import csv
import datetime
import time
import numpy as np
import numpy.lib.recfunctions as rf
import pprint
import numpy.ma as ma

date_converter = lambda x: datetime.date(int(x[-4:]), int(x[3:5]), int(x[:2]))
input_file = np.genfromtxt("../data/test.csv", usemask=True,
                           converters={0: date_converter},
                           dtype="O4, i8, i8, i8, i8",
                           names="date, firm, val1, val2, val3",
                           delimiter=",", skip_header=1)
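The date converter above works purely by position: day = x[:2], month = x[3:5], year = x[-4:], so it expects DD?MM?YYYY text with any single-character separator. The sketch below spells that out as a named function. The bytes handling is an assumption added for Python 3, where genfromtxt may pass converters bytes rather than str depending on the NumPy version and the `encoding` argument; the `.strip()` is also an addition, not part of the original lambda.

```python
import datetime


def date_converter(x):
    # Accept bytes as well as str, since genfromtxt may hand converters
    # bytes under Python 3 (assumption; depends on version and `encoding`).
    if isinstance(x, bytes):
        x = x.decode("latin1")
    x = x.strip()
    # Positional slicing, exactly as in the original lambda:
    # day = x[:2], month = x[3:5], year = x[-4:].
    return datetime.date(int(x[-4:]), int(x[3:5]), int(x[:2]))


print(repr(date_converter("01-03-2001")))   # datetime.date(2001, 3, 1)
print(repr(date_converter(b"16/04/2001")))  # datetime.date(2001, 4, 16)
```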

This generates:

masked_array(data = [(datetime.date(2001, 3, 1), 1L, --, 14L, 15L)
 (datetime.date(2001, 2, 1), 1L, 10L, 11L, 12L)
 (datetime.date(2001, 5, 1), 1L, 19L, 20L, 21L)
 (datetime.date(2001, 4, 1), 1L, 16L, --, 18L)],
             mask = [(False, False, True, False, False) (False, False, False, False, False)
 (False, False, False, False, False) (False, False, False, True, False)],
       fill_value = ('?', 999999L, 999999L, 999999L, 999999L),
            dtype = [('date', '|O4'), ('firm', '<i8'), ('val1', '<i8'), ('val2', '<i8'), ('val3', '<i8')])
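Since test.csv itself was never posted, here is a minimal, self-contained way to reproduce the same layout, with the rows reconstructed from the printed output above (so the file contents are an assumption). Empty cells become masked entries, and the mask has one boolean per field. As a deliberate swap, the date column is read as a 10-character string ("U10") instead of an object field produced by a converter, so the structured array has no object members.

```python
import io

import numpy as np

# Hypothetical stand-in for ../data/test.csv, reconstructed from the
# printed output (dates written as DD-MM-YYYY, empty cells = missing).
CSV_TEXT = """date,firm,val1,val2,val3
01-03-2001,1,,14,15
01-02-2001,1,10,11,12
01-05-2001,1,19,20,21
01-04-2001,1,16,,18
"""

# Same arguments as in the question, except that the date column is a
# plain string field ("U10") rather than an object field via a converter.
data = np.genfromtxt(
    io.StringIO(CSV_TEXT),
    usemask=True,
    dtype="U10, i8, i8, i8, i8",
    names="date, firm, val1, val2, val3",
    delimiter=",",
    skip_header=1,
    encoding=None,
)

# One boolean per field and row; True marks a cell that was empty.
print(data.mask)
# Column access: the first entry of val1 is masked.
print(data["val1"])
```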

When I run input_file[0] I get the following error:

Traceback (most recent call last):
  File "<pyshell#278>", line 1, in <module>
    input_file[0]
  File "C:\Python27\lib\site-packages\numpy\ma\core.py", line 2956, in __getitem__
    dout = mvoid(dout, mask=mask)
  File "C:\Python27\lib\site-packages\numpy\ma\core.py", line 5529, in __new__
    _data[()] = data
ValueError: Setting void-array with object members using buffer.
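The traceback shows that the failure happens while NumPy wraps the selected record in a masked record (`mvoid`) whose data contains an object field. A possible workaround is to select the field first and the row second, or to fetch the raw record and its mask separately, so that no masked record has to be built. The sketch rebuilds the array by hand from the printed output; the values under the masked cells (-1) are placeholders, and "O" is used for the object field. It assumes a NumPy version where `np.ma.getdata` and `np.ma.getmaskarray` behave as currently documented and has not been checked against the 2011 release used in the question.

```python
import datetime

import numpy as np

dt = np.dtype([("date", "O"), ("firm", "i8"), ("val1", "i8"),
               ("val2", "i8"), ("val3", "i8")])
rows = [
    (datetime.date(2001, 3, 1), 1, -1, 14, 15),  # -1: placeholder, masked
    (datetime.date(2001, 2, 1), 1, 10, 11, 12),
    (datetime.date(2001, 5, 1), 1, 19, 20, 21),
    (datetime.date(2001, 4, 1), 1, 16, -1, 18),  # -1: placeholder, masked
]
# Structured boolean mask with the same field names as the data.
mask_dt = np.dtype([(name, bool) for name in dt.names])
mask = np.array([
    (False, False, True, False, False),
    (False, False, False, False, False),
    (False, False, False, False, False),
    (False, False, False, True, False),
], dtype=mask_dt)
input_file = np.ma.MaskedArray(np.array(rows, dtype=dt), mask=mask)

# Field first, then row: each step yields a non-structured masked array
# or a scalar, so no masked record (mvoid) is created.
print(input_file["date"][0])   # 2001-03-01
print(input_file["val1"][0])   # --
print(input_file["val2"][0])   # 14

# Raw record and its mask, fetched separately.
print(np.ma.getdata(input_file)[0])
print(np.ma.getmaskarray(input_file)[0])
```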
Jonathan Leffler
Alex
  • Did you do anything else with `input_file`? I just copied your output of `input_file` and put it in a masked array (with `ma.MaskedArray(..)`), and then I had no problem extracting the first row with `input_file[0]`. – joris Apr 20 '11 at 07:45
  • Nope, I did nothing else. Could you try doing it the way I did it? That is, just throw those data items into a text file and read them in using `genfromtxt`, please? It's the only thing I can think of, because if I create a masked array from scratch (although with different data), I also have no problem accessing it. – Alex Apr 20 '11 at 14:17
  • I did, but I also had no problem accessing the first row with `input_file[0]`. The only thing I removed from your code was `skip_header`, because I think I have an older version of numpy. – joris Apr 21 '11 at 07:49
  • Interesting. I am still having this problem running the code above literally, even after taking out skip_header (and removing the headers from the file). I am running Python 2.7 and the latest version of NumPy... Any ideas? This is very strange. Do you have a sense of what the error is, i.e. what the package is trying to do? – Alex Apr 22 '11 at 00:37
  • I have tried other masked arrays generated by genfromtxt. When selecting elements that have missing values, I get the same error. Either I am misusing this or numpy has yet another bug. – Alex Apr 22 '11 at 20:25
  • @Alex do you have an example of this `test.csv` file? – Saullo G. P. Castro Apr 26 '14 at 09:07
  • There have been numerous bugfixes to masked arrays in numpy in recent months. Do you still get an error with the latest version? – gerrit May 23 '16 at 14:09
  • @gerrit: No, I switched to R long ago, though I might give Python another shot, since it has become much more mature since this question was posted. – Alex May 23 '16 at 16:20

1 Answer


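input_file[0] is not the right way to access the data in a masked array. Read and write the underlying data through the .data attribute or np.ma.getdata(), and set the mask separately (see the numpy.ma documentation).

For example: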

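>>> import numpy as np
>>> # Structured masked array with two integer fields, all values 1
>>> arr = np.ma.ones(3, dtype=[('c1', int), ('c2', int)])
>>> arr.mask[0][1] = True           # mask field c2 of record 0
>>> arr.data[0][0] = 2              # write through the .data view
>>> np.ma.getdata(arr)[1][0] = 3    # same thing via np.ma.getdata()
>>> arr.data[2][0] = 4
>>> print(arr)
[(2, --) (3, 1) (4, 1)]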
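Building on the answer's idea of reading the underlying data and the mask separately, a small helper can turn one record into a dict, with None standing in for masked fields. `record_as_dict` is a hypothetical name used only for illustration, not a numpy API. The array is built as in the answer, but with the builtin `int`, because `np.int` was removed in NumPy 1.24.

```python
import numpy as np


def record_as_dict(marr, i):
    """Return record i of a structured masked array as a dict, using None
    for masked fields (hypothetical helper, not part of numpy)."""
    values = np.ma.getdata(marr)[i]     # plain record, no mask attached
    mask = np.ma.getmaskarray(marr)[i]  # matching record of booleans
    return {name: (None if mask[name] else values[name])
            for name in marr.dtype.names}


# Same array as in the answer.
arr = np.ma.ones(3, dtype=[('c1', int), ('c2', int)])
arr.mask[0][1] = True
arr.data[0][0] = 2
arr.data[1][0] = 3
arr.data[2][0] = 4

print(record_as_dict(arr, 0))
print(record_as_dict(arr, 1))
```

Because the helper never indexes the masked array by row, it avoids building a masked record at all.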
tetrarquis