
So I'm trying to add two numpy masked arrays together. The difficulty is that they have to be added as strings, because I'm trying to get a binary code in the resulting output array. The code below is a simplified version of what I'm trying to do. The mask for both arrays will be the same (in practice these would be much larger arrays, but the idea is the same):

import numpy as np
import numpy.ma as ma

a = np.zeros((3,3))
b = np.ones((3,3))
amask = [[False, True, True], [True, True, False], [False, False, True]]
bmask = [[False, True, True], [True, True, False], [False, False, True]]

a = a.astype('str')
b = b.astype('str')

am = ma.masked_array(a, mask=amask)
bm = ma.masked_array(b, mask=bmask)
x = np.add(am, bm)

I would like the output to be something like :

[['01' -- --], [-- -- '01'], ['01' '01' --]]

So it's very important that the elements are strings, so that adding them concatenates them.

Running this code however gives me the following error:

numpy.core._exceptions.UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U32'), dtype('<U32')) -> None

I don't understand this, since both arrays clearly have the same dtype. Adding them without the string conversion works fine but doesn't give me the required output. I have run into this error before and tried to look it up, but never really understood it. Thanks!

  • `numpy` doesn't implement operators for flexible types. `a = np.zeros((3,3), int).astype(str).astype(object)` for both arrays works. Notice the subtle difference between `np.ma.add(am, bm).data` vs `np.add(am, bm).data` – Michael Szczesny Apr 18 '22 at 12:39
  • Works, thanks! I don't know much about datatypes yet, but why exactly do you need to convert to str first and then to object? Why not directly to object? – JacksonFreeman Apr 18 '22 at 13:53
  • Try without `.astype(str)` to see for yourself. Or try `np.zeros((3,3), str).astype(object)`. – Michael Szczesny Apr 18 '22 at 14:09
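Applying the first comment's suggestion to the masked arrays from the question gives roughly the following sketch. The variable names are illustrative. It uses an int dtype so the strings are '0' and '1' (the float arrays in the question would give '0.0' and '1.0'). Going through str before object matters because np.zeros(..., int).astype(object) alone would store Python ints, and + would then add numerically (0 + 1 = 1) instead of concatenating. The data hidden under the mask may differ between ma.add and np.add, so both are printed rather than asserted.

```python
import numpy as np
import numpy.ma as ma

mask = [[False, True, True],
        [True, True, False],
        [False, False, True]]

# .astype(str) gives fixed-width '<U' strings; .astype(object) then stores
# ordinary Python str objects, so + falls back to str concatenation.
a = np.zeros((3, 3), int).astype(str).astype(object)
b = np.ones((3, 3), int).astype(str).astype(object)

am = ma.masked_array(a, mask=mask)
bm = ma.masked_array(b, mask=mask)

x = ma.add(am, bm)   # masked-aware version of the ufunc
y = np.add(am, bm)   # plain ufunc; the result is still a MaskedArray

print(x)
# The unmasked values agree; compare what sits under the mask in .data,
# which is the subtle difference mentioned in the comment.
print(x.data)
print(y.data)
```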

1 Answer


This isn't a masked array issue; it's a string dtype one.

In [254]: a = np.arange(4)
In [255]: a
Out[255]: array([0, 1, 2, 3])
In [256]: a+a
Out[256]: array([0, 2, 4, 6])
In [257]: a1 = a.astype(str)
In [258]: a1
Out[258]: array(['0', '1', '2', '3'], dtype='<U21')
In [259]: a1 + a1
Traceback (most recent call last):
  Input In [259] in <cell line: 1>
    a1 + a1
UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U21'), dtype('<U21')) -> None

astype(str) makes an array with a numpy string dtype (a fixed-width Unicode type, '<U21' here); this is optimized for array storage, but it is not the same as Python strings. np.char has some functions that apply string methods to these Un dtypes:

In [260]: np.char.add(a1,a1)
Out[260]: array(['00', '11', '22', '33'], dtype='<U42')
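If you want to keep the numpy string dtype with the masked arrays from the question, one possible sketch is to run np.char.add on the underlying .data and reattach the combined mask yourself. That is an assumption-driven choice: it avoids relying on whether np.char.add preserves masks in a given NumPy version. The names below are illustrative.

```python
import numpy as np
import numpy.ma as ma

mask = np.array([[False, True, True],
                 [True, True, False],
                 [False, False, True]])

# Keep the fixed-width numpy string dtype ('<Un') this time.
am = ma.masked_array(np.zeros((3, 3), int).astype(str), mask=mask)
bm = ma.masked_array(np.ones((3, 3), int).astype(str), mask=mask)

# Concatenate the raw string data element-wise, then put the mask back.
joined = np.char.add(am.data, bm.data)
combined_mask = ma.getmaskarray(am) | ma.getmaskarray(bm)
x = ma.masked_array(joined, mask=combined_mask)

print(x)
print(x.dtype)  # a '<U' dtype wide enough to hold both inputs
```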

Or, as commented, you can make a list-like array of string objects:

In [261]: a2 = a1.astype(object)
In [262]: a2
Out[262]: array(['0', '1', '2', '3'], dtype=object)
In [263]: a2 + a2
Out[263]: array(['00', '11', '22', '33'], dtype=object)

For object dtype arrays, operators like + delegate the action to the methods of the elements. Equivalently:

In [264]: [i+j for i,j in zip(a2,a2)]
Out[264]: ['00', '11', '22', '33']
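To see the delegation to element methods more directly, a small sketch: with Python str elements, + calls str.__add__ (concatenation) and * with an int calls str.__mul__ (repetition), exactly as they would on plain strings.

```python
import numpy as np

# Object array holding Python str objects '0', '1', '2', '3'.
a2 = np.arange(4).astype(str).astype(object)

doubled = a2 + a2   # each element: str.__add__ -> concatenation
tripled = a2 * 3    # each element: str.__mul__ -> repetition

print(doubled)
print(tripled)
```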

I expect [264] to be the fastest; numpy doesn't add much to string processing.
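A rough way to check that expectation on your own machine is a quick timeit comparison of the three approaches shown above. This is only a sketch; the array size and repeat count are arbitrary, and no particular timings are implied, since results depend on the NumPy version and hardware.

```python
import timeit

import numpy as np

n = 10_000
a1 = np.arange(n).astype(str)   # fixed-width '<U' dtype, as in [258]
a2 = a1.astype(object)          # Python str objects, as in [262]

candidates = {
    "np.char.add": lambda: np.char.add(a1, a1),
    "object +": lambda: a2 + a2,
    "list comprehension": lambda: [i + j for i, j in zip(a2, a2)],
}

for name, fn in candidates.items():
    seconds = timeit.timeit(fn, number=10)
    print(f"{name:20s} {seconds:.4f} s for 10 runs")
```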

hpaulj