
I have seen a phenomenon recently in working with structured numpy arrays that doesn't make sense. I am hoping someone can help me understand what is going on. I have provided a minimal working example to illustrate the problem. The problem is this:

When indexing a structured numpy array with a boolean mask, this works:

arr['fieldName'][boolMask] += val

but the following does not:

arr[boolMask]['fieldName'] += val

Here is a minimal working example:

import numpy as np

myDtype = np.dtype([('t','<f8'),('p','<f8',(3,)),('v','<f4',(3,))])

nominalArray = np.zeros((10,),dtype=myDtype)
nominalArray['t'] = np.arange(10.)
# In real life, the other fields would also be populated
print("original times: {0}".format(nominalArray['t']))

# Add 10 to all times greater than 5
timeGreaterThan5 = nominalArray['t'] > 5
nominalArray['t'][timeGreaterThan5] += 10.
print("times after first operation: {0}".format(nominalArray['t']))

# Return those times to their original values
nominalArray[timeGreaterThan5]['t'] -= 10.
print("times after second operation: {0}".format(nominalArray['t']))

Running this yields the following output:

original times: [ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
times after first operation: [  0.   1.   2.   3.   4.   5.  16.  17.  18.  19.]
times after second operation: [  0.   1.   2.   3.   4.   5.  16.  17.  18.  19.]

We clearly see here that the second operation had no effect. If somebody could explain why this occurs, it would be greatly appreciated.

lmiguelvargasf
tintedFrantic
  • wouldn't it be `nominalArray["t"][timeGreaterThan5] -= 10` – Padraic Cunningham Jul 03 '15 at 17:11
  • @PadraicCunningham That would fix the problem. I am just wondering why that is the solution. What is so special about the ordering here? – tintedFrantic Jul 03 '15 at 17:21
  • I get you now, they both return different objects, one is a view and the other is not, I imagine advanced vs normal indexing, `nominalArray[timeGreaterThan5]` returns `[(16.0, [0.0, 0.0, 0.0], [0.0, 0.0, 0.0])...` so the slice returned from `nominalArray[timeGreaterThan5]` is not a view object – Padraic Cunningham Jul 03 '15 at 17:24
  • Interesting. That makes sense. Thank you for your help. – tintedFrantic Jul 04 '15 at 13:17

1 Answer

This is indeed an issue of copy vs. view. But I'll go into more detail.

The key distinction between a view and a copy is whether the indexing pattern is regular or not. A regular one can be expressed in terms of the array shape, strides, and dtype. In general, a boolean index (and the related list of indexes) cannot be expressed in those terms, so numpy has to return a copy.

I like to look at the arr.__array_interface__ property. It shows the shape, strides, and a pointer to the data buffer. If the pointer is the same as the original's (or points somewhere inside the original's buffer), it is a view.
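As a small sketch of that check (the helper name data_pointer is my own, not a numpy function), the following compares data pointers for a slice and for a boolean mask, and cross-checks the result with np.shares_memory:

```python
import numpy as np

arr = np.arange(10.0)

def data_pointer(a):
    # Hypothetical helper: address of the first element of a's buffer.
    return a.__array_interface__['data'][0]

sliced = arr[2:8]        # regular pattern: expressible with shape and strides
masked = arr[arr > 5]    # boolean mask: irregular pattern

# The slice starts 2 elements (2 * 8 bytes) into the original buffer.
slice_offset = data_pointer(sliced) - data_pointer(arr)

# np.shares_memory reports whether two arrays overlap in memory.
slice_is_view = np.shares_memory(arr, sliced)
mask_is_view = np.shares_memory(arr, masked)

print(slice_offset, slice_is_view, mask_is_view)
```

Note that a view does not have to start at the same address as the original; a pointer inside the original buffer also indicates a view.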

With arr[idx] += 1, the indexing is handled by the array's __setitem__ method, which selects which data buffer items to modify with the addition. The distinction between view and copy doesn't apply.

But with arr[idx1][idx2] += 1, the first indexing is a __getitem__ call. For that, the distinction between view and copy matters. The second indexing modifies the array produced by the first. If it is a view, the modification affects the original data; if it is a copy, nothing permanent happens. The copy may be modified, but it disappears down the garbage collection chute.
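To make the two cases concrete, here is a rough sketch that spells out by hand the method calls Python makes for an augmented assignment on an indexed target. It is only meant to illustrate the order of calls; the variable names are mine.

```python
import numpy as np

arr = np.zeros(4)
idx = np.array([True, False, True, False])

# arr[idx] += 1 expands (roughly) to get, add in place, set:
tmp = arr.__getitem__(idx)      # a copy, because idx is a boolean mask
tmp = tmp.__iadd__(1)
arr.__setitem__(idx, tmp)       # the final step writes into arr itself
after_single = arr.copy()

# arr[idx][:] += 1 has an extra __getitem__ in front:
copy1 = arr.__getitem__(idx)                # a copy of the selected items
tmp2 = copy1.__getitem__(slice(None))       # a view of that copy
tmp2 = tmp2.__iadd__(1)
copy1.__setitem__(slice(None), tmp2)        # writes into the copy, not arr
after_chained = arr.copy()

print(after_single, after_chained, copy1)
```

In the chained form the final __setitem__ targets the temporary copy, so arr never sees the change.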

With 2d arrays, you can combine these two indexing steps as arr[idx1, idx2] += 1; in fact, that is the preferred syntax.
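A quick illustration with a made-up 3x4 array (the names rows, chained and combined are just for this sketch):

```python
import numpy as np

a = np.zeros((3, 4))
rows = np.array([True, False, True])

# Chained indexing: the row mask returns a copy, and the column
# update is applied to that copy only.
chained = a.copy()
chained[rows][:, 1] += 1

# Combined indexing: one __setitem__ call on the array itself.
combined = a.copy()
combined[rows, 1] += 1

print(chained.sum(), combined.sum())
```

Only the combined form changes the array: column 1 of rows 0 and 2 is incremented.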

With structured arrays, field indexing is similar to column indexing, but not quite the same. For one thing, it can't be combined with the element indexing.

A simple structured array:

In [234]: arr=np.ones((5,),dtype='i,f,i,f')
In [235]: arr.__array_interface__
{'strides': None,
 'shape': (5,),
 'data': (152524816, False),
 'descr': [('f0', '<i4'), ('f1', '<f4'), ('f2', '<i4'), ('f3', '<f4')],
 'typestr': '|V16',
 'version': 3}

Selecting one field produces a view - same data pointer

In [236]: arr['f0'].__array_interface__['data']
Out[236]: (152524816, False)

Selecting elements with a boolean mask produces a copy (different pointer)

In [242]: idx = np.array([1,0,0,1,1],bool)
In [243]: arr[idx].__array_interface__['data']
Out[243]: (152629520, False)

So arr['f0'][idx] += 1 modifies selected items from the f0 field.

arr[idx]['f0'] += 1 modifies the f0 field of a copy, with no effect on arr.
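Putting both orderings side by side on the same array, plus one possible workaround if you really want to index by elements first (modify the copy and assign it back). This is a sketch, not the only way to do it:

```python
import numpy as np

arr = np.ones((5,), dtype='i,f,i,f')
idx = np.array([1, 0, 0, 1, 1], dtype=bool)

# Elements first: the change lands on a temporary copy.
arr[idx]['f0'] += 1
unchanged = arr['f0'].tolist()

# Field first: arr['f0'] is a view, so the masked update reaches arr.
arr['f0'][idx] += 1
field_first = arr['f0'].tolist()

# Workaround for elements-first: keep the copy, modify it, write it back.
sub = arr[idx]
sub['f0'] += 1
arr[idx] = sub
written_back = arr['f0'].tolist()

print(unchanged, field_first, written_back)
```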

arr[idx]['f0'] + 1 and arr['f0'][idx] + 1 display the same thing, but they aren't trying to perform any in-place changes.

You can select multiple fields from a structured array with arr[['f0','f2']]. In the version I'm using this is a copy (and I get a warning suggesting I make an explicit copy). NumPy 1.16 and later return a view here instead.
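Since the copy-or-view behaviour of a multi-field selection depends on the numpy version, a sketch that behaves the same either way is to make the copy explicit, assuming you want an independent array:

```python
import numpy as np

arr = np.ones((5,), dtype='i,f,i,f')

# Explicit copy: independent of arr regardless of numpy version.
sub = arr[['f0', 'f2']].copy()
sub['f0'] += 10

print(sub.dtype.names, sub['f0'].tolist(), arr['f0'].tolist())
```

Changes to sub stay in sub; arr keeps its original values.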

hpaulj