Question

Why are NumPy's tuple indexing behaviors inconsistent? Please explain the rationale or design decision behind them. In my understanding, Z[(0,2)] and Z[(0, 2), (0)] are both tuple indexing, so I expected consistent copy/view behavior. If this is incorrect, please explain.

import numpy as np
Z = np.arange(36).reshape(3, 3, 4)
print("Z is \n{}\n".format(Z))

b =  Z[
    (0,2)      # Select Z[0][2]
]
print("Tuple indexing Z[(0,2)] is \n{}\nIs view? {}\n".format(
    b,
    b.base is not None
))

c = Z[         # Select Z[0][0][1] & Z[0][2][1]
    (0,2),
    (0)
]
print("Tuple indexing Z[(0, 2), (0)] is \n{}\nIs view? {}\n".format(
    c,
    c.base is not None
))

Output:

Z is 
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]

 [[24 25 26 27]
  [28 29 30 31]
  [32 33 34 35]]]

Tuple indexing Z[(0,2)] is 
[ 8  9 10 11]
Is view? True

Tuple indexing Z[(0, 2), (0)] is 
[[ 0  1  2  3]
 [24 25 26 27]]
Is view? False

NumPy indexing is confusing, and I wonder how people build an understanding of it. If there is a good way to learn it, or a cheat sheet, please advise.

mon

1 Answer

It's the comma that creates a tuple. The () just set boundaries where needed.

Thus

Z[(0,2)]
Z[0,2]

are the same: both select on the first two dimensions. Whether that returns a single element or an array depends on how many dimensions Z has.
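
A quick way to check this, as a minimal sketch using the question's array (the names `a` and `b` are only illustrative):

```python
import numpy as np

Z = np.arange(36).reshape(3, 3, 4)

# Parentheses alone do not make a tuple; the comma does.
print(type((0)))     # <class 'int'>
print(type((0, 2)))  # <class 'tuple'>

# Both spellings hand the same tuple (0, 2) to the indexing machinery.
a = Z[(0, 2)]
b = Z[0, 2]
print(np.array_equal(a, b))    # True
print(np.shares_memory(a, Z))  # True: basic indexing returns a view
```

Because Z has three dimensions and only two are indexed, the result is a 1-D array of length 4 rather than a single element.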

The same interpretation applies to the other case.

Z[(0, 2), (0)]
Z[( np.array([0,2]), 0)]
Z[ np.array([0,2]), 0]

are the same: the first dimension is indexed with a list/array, which makes this advanced indexing. The result is a copy.
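
The equivalence of the three spellings, and the fact that the result does not share memory with Z, can be checked with a short sketch (`c1`, `c2`, `c3` are illustrative names):

```python
import numpy as np

Z = np.arange(36).reshape(3, 3, 4)

c1 = Z[(0, 2), (0)]              # inner tuple is treated as an index sequence
c2 = Z[(np.array([0, 2]), 0)]
c3 = Z[np.array([0, 2]), 0]

print(np.array_equal(c1, c2) and np.array_equal(c2, c3))  # True
print(np.shares_memory(c1, Z))                            # False

# Writing to the copy leaves Z untouched.
c1[0, 0] = -1
print(Z[0, 0, 0])  # 0
```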

[ 8  9 10 11]

is a row of the 3d array; it's a contiguous block of Z.

[[ 0  1  2  3]
 [24 25 26 27]]

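is 2 rows from Z. They aren't contiguous, and rows picked by an index list can be spaced arbitrarily, so in general there's no way of identifying them with just shape and strides (and offset in the data buffer). Advanced indexing therefore always returns a copy.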
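
To see the difference in practice, the sketch below compares the index-list selection with a step slice that happens to reach the same two rows. The names `picked` and `sliced` are illustrative; the point is that NumPy decides copy versus view from the kind of index, not from the values in it.

```python
import numpy as np

Z = np.arange(36).reshape(3, 3, 4)

picked = Z[[0, 2], 0]  # index list: advanced indexing
sliced = Z[::2, 0]     # step slice: basic indexing

print(np.array_equal(picked, sliced))  # True: same values
print(np.shares_memory(picked, Z))     # False: copy
print(np.shares_memory(sliced, Z))     # True: view, rows are evenly strided
```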

details

__array_interface__ gives details about the underlying data of an array

In [146]: Z = np.arange(36).reshape(3,3,4)
In [147]: Z.__array_interface__
Out[147]: 
{'data': (38255712, False),
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (3, 3, 4),
 'version': 3}
In [148]: Z.strides
Out[148]: (96, 32, 8)

For the view:

In [149]: Z1 = Z[0,2]
In [150]: Z1
Out[150]: array([ 8,  9, 10, 11])
In [151]: Z1.__array_interface__
Out[151]: 
{'data': (38255776, False),    # 38255712+8*8
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (4,),
 'version': 3}

The data buffer pointer is 8 elements (64 bytes) further along in Z's buffer. The shape is much reduced.
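
The 64-byte offset follows from the strides. A small sketch of the arithmetic, assuming 8-byte integers (the dtype is set explicitly here so the numbers match the output above on any platform):

```python
import numpy as np

Z = np.arange(36, dtype=np.int64).reshape(3, 3, 4)
Z1 = Z[0, 2]

# Byte offset of Z[0, 2, 0] is the sum of index * stride per dimension.
offset = 0 * Z.strides[0] + 2 * Z.strides[1]
print(Z.strides)  # (96, 32, 8)
print(offset)     # 64, i.e. 8 elements of 8 bytes each

# The same number falls out of the two buffer addresses.
addr_Z = Z.__array_interface__['data'][0]
addr_Z1 = Z1.__array_interface__['data'][0]
print(addr_Z1 - addr_Z)  # 64
```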

In [152]: Z2 = Z[[0,2],0]
In [153]: Z2
Out[153]: 
array([[ 0,  1,  2,  3],
       [24, 25, 26, 27]])
In [154]: Z2.__array_interface__
Out[154]: 
{'data': (31443104, False),     # an entirely different location
 'strides': None,
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (2, 4),
 'version': 3}

Z2 is the same as two selections:

In [158]: Z[0,0]
Out[158]: array([0, 1, 2, 3])
In [159]: Z[2,0]
Out[159]: array([24, 25, 26, 27])

It is not

Z[0][0][1] & Z[0][2][1]
Z[0,0,1] & Z[0,2,1]
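
For comparison, here is a sketch of what those two scalar selections return, and one way to pick both at once (`pair` is an illustrative name):

```python
import numpy as np

Z = np.arange(36).reshape(3, 3, 4)

# The two individual elements named in the question's comment.
print(Z[0, 0, 1], Z[0, 2, 1])  # 1 9

# Selecting both in one expression needs the list on the second dimension.
pair = Z[0, [0, 2], 1]
print(pair)                       # [1 9]
print(np.shares_memory(pair, Z))  # False: still advanced indexing, so a copy
```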

Compare that with a 2 row slice:

In [156]: Z3 = Z[0:2,0]
In [157]: Z3.__array_interface__
Out[157]: 
{'data': (38255712, False),   # same as Z's
 'strides': (96, 8),
 'descr': [('', '<i8')],
 'typestr': '<i8',
 'shape': (2, 4),
 'version': 3}

A view is returned if the new array can be described with shape, strides and all or part of the original data buffer.
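
One way to make that statement concrete is to rebuild the slice by hand from Z's buffer. This is only a sketch: it uses `numpy.lib.stride_tricks.as_strided`, assumes 8-byte integers, and hard-codes the shape and strides reported for Z3 above.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

Z = np.arange(36, dtype=np.int64).reshape(3, 3, 4)
Z3 = Z[0:2, 0]

# Describe Z3 with nothing but a shape and strides over Z's buffer (offset 0).
rebuilt = as_strided(Z, shape=(2, 4), strides=(96, 8))

print(Z3.strides)                    # (96, 8)
print(np.array_equal(rebuilt, Z3))   # True
print(np.shares_memory(rebuilt, Z))  # True: no data was copied
```

`as_strided` does no bounds checking, so it is better suited to illustration than to everyday indexing.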

hpaulj

Thanks a lot for the answer. However, I am afraid I still don't get the point of the "A view is returned if the new array can be described with shape, strides and all or part of the original data buffer" part. I opened https://stackoverflow.com/questions/65501307, if you could kindly have a look. – mon Dec 30 '20 at 02:33