numpy masked array behaves differently when one dtype is an object

Question

Say that I have the following two masked array declarations:

import numpy.ma as ma

arr1 = ma.array([(1,2,"hello"),(10,20,"world!")],dtype=[("p1",int),("p2",float),("p3",object)])
arr1.mask["p1"][0] = True
arr1.mask["p2"][1] = True

arr2 = ma.array([(1,2,3),(10,20,30)],dtype=[("p1",int),("p2",float),("p3",int)])
arr2.mask["p1"][0] = True
arr2.mask["p2"][1] = True

As you can see, the only (slight?) difference is that the "p3" field is an object for arr1 and an int for arr2.

Calling arr2[0] is OK and gives (--, 2.0, 3).

However, once some elements of arr1 are masked, calling arr1[0] raises the following error:

*** ValueError: Setting void-array with object members using buffer.

Clearly, declaring one field as an object causes trouble, but I have no idea why.

What do you think about this, and do you see a way to work around the problem, keeping in mind that I really need to access arr1[0] in that way?

thanks a lot

Eric

EDIT: this problem occurs with numpy versions earlier than 1.8. I tried with the latest version (1.8) and it works.

Saullo G. P. Castro · Answer 1 · 2013-10-31T14:02:54.893

2

When you create an array like this, giving a name to each dtype field, you are actually creating a structured array, whose fields are accessed by name (np.recarray is the ndarray subclass that additionally exposes the fields as attributes).

So, to access the first field of arr1 you should do:

arr1['p1']
#masked_array(data = [-- 10],
#             mask = [ True False],
#       fill_value = 999999)

instead of arr1[0].
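As a minimal sketch of the two kinds of indexing, here is an all-numeric structured masked array like arr2 from the question. Passing the mask through the mask= argument is an assumption made here to keep the snippet self-contained; it is meant to mark the same entries as the question's in-place assignments.

```python
import numpy.ma as ma

# Structured masked array: one record per row, one named field per column.
arr2 = ma.array(
    [(1, 2, 3), (10, 20, 30)],
    mask=[(True, False, False), (False, True, False)],
    dtype=[("p1", int), ("p2", float), ("p3", int)],
)

# Field access: a 1-D masked array holding "p1" for every record.
p1 = arr2["p1"]

# Row access: a single record, with its own per-field mask.
row0 = arr2[0]

print(p1)
print(row0)
```

Indexing with a field name selects a column across all records, while indexing with an integer selects one record. It is the second form that the question reports as failing when a field has object dtype.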

EDIT: The 2-D solution would be something like:

import numpy as np

b1m = np.array([[True, False, False],[False, True, False]])
b1 = np.ma.array([[1, 2, 'hello'],
                  [10, 20, 'world!']], mask=b1m, dtype=object)
b2m = np.array([[True, False, False],[False, True, False]])
b2 = np.ma.array([[1, 2, 3],
                  [10, 20, 30]], mask=b2m, dtype=object)

edited Oct 31 '13 at 14:02

answered Oct 31 '13 at 12:37


Thanks for the feedback. However, I still do not understand why my construction fails when one of the fields is an object. Moreover, in the context of my project I really need row access to my masked array via `arr1[0]` – Eurydice Oct 31 '13 at 13:27
@PellegriniEric the problem is that the `recarray` is not a 2-D array, so you cannot access the items like `a[i,j]`, it is a kind of group of 1-D arrays, each one called by its field name. You could redesign your code to work with 2-D arrays though... – Saullo G. P. Castro Oct 31 '13 at 13:30
What would be an alternative design for the example given above, keeping in mind that I would like to keep the masked-array feature, which handles missing or invalid values in my database in a very general way? – Eurydice Oct 31 '13 at 13:41
@PellegriniEric I've updated the answer with one possibility for the 2-D array solution... – Saullo G. P. Castro Oct 31 '13 at 14:03
Thanks. Unfortunately, this design does not fit my problem because it treats everything as an object, which makes it hard to track invalid entries. That is why I was looking for a way to declare an array with several dtypes without assigning a label to each of them – Eurydice Oct 31 '13 at 14:44

score 0 · Answer 2 · answered Nov 01 '13 at 13:33

I found that this problem occurs with numpy version < 1.8. I tried with the latest version (1.8) and it is OK. So I guess that I have to live with that ...
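For an installation that cannot be upgraded, one possible workaround is to build the row field by field instead of calling arr1[0]. This is only a sketch. The helper name masked_row is hypothetical, the mask= argument is used here instead of the in-place assignments from the question, and it assumes that per-field indexing such as arr1['p1'] (which the first answer shows working) is not affected by the error.

```python
import numpy.ma as ma


def masked_row(arr, i):
    """Hypothetical helper: return record i of a structured masked array
    as a plain tuple, reading each named field separately."""
    return tuple(arr[name][i] for name in arr.dtype.names)


# Same layout as arr1 in the question, including the object field "p3".
arr1 = ma.array(
    [(1, 2, "hello"), (10, 20, "world!")],
    mask=[(True, False, False), (False, True, False)],
    dtype=[("p1", int), ("p2", float), ("p3", object)],
)

row0 = masked_row(arr1, 0)
row1 = masked_row(arr1, 1)

# Masked entries come back as the ma.masked singleton.
print(row0[0] is ma.masked, row0[2])
print(row1[1] is ma.masked, row1[2])
```

The result is an ordinary tuple rather than a masked record, so masked entries have to be detected with an `is ma.masked` check. In exchange, each field keeps its own dtype instead of everything being stored as an object.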

Thanks for your help.

Eurydice

