
When I use ":n" or "m:" as arguments to np.r_, I get unexpected results that I don't understand.

Here's my code:

import numpy as np  
B = np.arange(180).reshape(6,30)
C = B[:, np.r_[10:15, 20:26]]
D = C[:, np.r_[0:3,8:11]]

Now all of that worked as expected. C prints as:

array([[ 10,  11,  12,  13,  14,  20,  21,  22,  23,  24,  25],
       [ 40,  41,  42,  43,  44,  50,  51,  52,  53,  54,  55],
       [ 70,  71,  72,  73,  74,  80,  81,  82,  83,  84,  85],
       [100, 101, 102, 103, 104, 110, 111, 112, 113, 114, 115],
       [130, 131, 132, 133, 134, 140, 141, 142, 143, 144, 145],
       [160, 161, 162, 163, 164, 170, 171, 172, 173, 174, 175]])

and D is:

array([[ 10,  11,  12,  23,  24,  25],
       [ 40,  41,  42,  53,  54,  55],
       [ 70,  71,  72,  83,  84,  85],
       [100, 101, 102, 113, 114, 115],
       [130, 131, 132, 143, 144, 145],
       [160, 161, 162, 173, 174, 175]])

However, when I remove the "0" and the "11", I don't understand what happens, and I haven't been able to find an explanation in the NumPy indexing or r_ documentation. Here's the new line of code:

E = C[:, np.r_[:3, 8:]]

It's just the same expression that defined the D array with "unnecessary" indices removed. However, the results are mystifying:

array([[ 10,  11,  12,  10,  11,  12,  13,  14,  20,  21,  22],
       [ 40,  41,  42,  40,  41,  42,  43,  44,  50,  51,  52],
       [ 70,  71,  72,  70,  71,  72,  73,  74,  80,  81,  82],
       [100, 101, 102, 100, 101, 102, 103, 104, 110, 111, 112],
       [130, 131, 132, 130, 131, 132, 133, 134, 140, 141, 142],
       [160, 161, 162, 160, 161, 162, 163, 164, 170, 171, 172]])

I expected E to be identical to D, with just six columns. What's going on? Is this behavior documented somewhere, or is this a bug?

  • Instead of seeing the results of the different uses of `r_` indirectly (through their use as indices), take a look at exactly what `r_` generates. In particular, compare `np.r_[:3]` with `np.r_[0:3]`, and compare `np.r_[8:11]` with `np.r_[8:]`. The latter is the one that probably needs a closer look. – Warren Weckesser Oct 26 '22 at 22:31
  • The use of slices in `np.r_` is analogous to their use in indexing, but not identical. Each slice is actually converted to an `np.arange` call; `np.r_[8:]` has no way of deducing that you want to generate `np.arange(8,11)`. – hpaulj Oct 26 '22 at 22:40
  • Why do you think those indices are unnecessary? – hpaulj Oct 27 '22 at 04:46
  • Because they are unnecessary in standard Python indexing. No one would want to change the behavior of indexing in a package because it would lead to lots of confusion. – user2983936 Oct 28 '22 at 21:02
  • @WarrenWeckesser , wow! That's messed up! np.r_[8:] is the same as np.r_[:8]. How's that for breaking the way Python indexing works? Anyway, thanks for the pointer. Of course, I wish someone had pointed me to r_ documentation that explained this instead of having to experiment with it to see how broken it is. It's as though they didn't want to document how they screwed it up because they might have gotten pressured to fix it. – user2983936 Oct 28 '22 at 21:08
  • @hpaulj, yeah, it's too much to expect Python indexing to be consistent! – user2983936 Oct 28 '22 at 21:09
  • `np.r_` is an instance of a class defined in the `index_tricks.py` file. It is `row concatenate`, like `hstack`, with the added trick of converting `slice` notation into `arange` or `linspace` calls. It is useful for generating `advanced` indexing arrays, but is not an actual indexing operation. – hpaulj Oct 29 '22 at 01:01
  • I tried to explain `r_` some years ago, https://stackoverflow.com/questions/37743843/python-why-use-numpy-r-instead-of-concatenate#37751518 – hpaulj Oct 29 '22 at 14:59

2 Answers

To understand the difference between D and E, we have to look at what np.r_ produces. As with function calls, the contents of an index expression, if complex, are evaluated first.

In [112]: D = C[:, np.r_[0:3,8:11]]; D.shape
Out[112]: (6, 6)
In [113]: E = C[:, np.r_[:3,8:]]; E.shape
Out[113]: (6, 11)

The two r_:

In [115]: np.r_[0:3,8:11]
Out[115]: array([ 0,  1,  2,  8,  9, 10])    
In [116]: np.r_[:3,8:]
Out[116]: array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7])

r_ is an instance of a class defined in np.lib.index_tricks. That class has its own __getitem__ method, which lets us use indexing notation, but the work is actually done by a call to np.concatenate.

We can see what r_ receives by using another index_tricks object, np.s_:

In [117]: np.s_[0:3, 8:11]
Out[117]: (slice(0, 3, None), slice(8, 11, None))    
In [118]: np.s_[:3, 8:]
Out[118]: (slice(None, 3, None), slice(8, None, None))

If we define a simple function:

def foo(aslice):
    return np.arange(aslice.start, aslice.stop, aslice.step)

we can test the different slices:

In [124]: foo(np.s_[8:11])            # np.arange(8,11)
Out[124]: array([ 8,  9, 10])

In [125]: foo(np.s_[8:])              # np.arange(8)
Out[125]: array([0, 1, 2, 3, 4, 5, 6, 7])

Remember that when we give arange just one number, it's understood to be the 'stop', with an implicit start of 0. That's the same as Python's built-in range.
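As a quick check of that single-argument rule, here is a minimal sketch comparing np.arange with the built-in range. The variable names are only illustrative.

```python
import numpy as np

# A single positional argument is taken as the stop; the start defaults to 0.
one_arg = np.arange(8)
two_args = np.arange(0, 8)

print(one_arg)          # [0 1 2 3 4 5 6 7]
print(list(range(8)))   # [0, 1, 2, 3, 4, 5, 6, 7]
```

Nothing in a lone 8 tells arange that it was meant as a start rather than a stop.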

np.r_ actually uses:

In [105]: def foo1(item):
     ...:     step = item.step
     ...:     start = item.start
     ...:     stop = item.stop
     ...:     if start is None:
     ...:         start = 0
     ...:     if step is None:
     ...:         step = 1
     ...:     return np.arange(start, stop, step)

but this just lets us use np.r_[:3] instead of np.r_[0:3]. It doesn't change the [8:] case.

In case it isn't clear: A[i,j] is translated by the interpreter into A.__getitem__((i,j)), a function call. The interpreter also converts any ':' or '::' notation into a slice(...) object, as illustrated by s_.
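To make that translation visible, here is a small sketch with a hypothetical class, ShowKey (not part of NumPy), whose __getitem__ simply returns the key it was given. It behaves much like np.s_.

```python
class ShowKey:
    """Hypothetical helper: hand back whatever key the interpreter passes in."""

    def __getitem__(self, key):
        return key


show = ShowKey()

single = show[8:]      # one slice object
pair = show[:3, 8:]    # a tuple of two slice objects

print(single)   # slice(8, None, None)
print(pair)     # (slice(None, 3, None), slice(8, None, None))
```

The object being indexed only ever sees slice(8, None, None). What the missing stop means is entirely up to its __getitem__.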

After converting the slices into arrays with np.arange (or np.linspace, for 'complex' steps), r_ concatenates them.
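A short sketch of both conversions, using only fully specified slices so that nothing depends on default handling. The variable names are illustrative.

```python
import numpy as np

# Integer step: behaves like np.arange(start, stop, step).
int_step = np.r_[0:10:2]

# Imaginary step: behaves like np.linspace(start, stop, num), endpoint included.
complex_step = np.r_[0:1:5j]

# Several slices: each one is converted, then the pieces are concatenated.
combined = np.r_[0:3, 8:11]

print(int_step)       # [0 2 4 6 8]
print(complex_step)   # [0.   0.25 0.5  0.75 1.  ]
print(combined)       # [ 0  1  2  8  9 10]
```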

So your two r_ expressions are really:

In [128]: np.concatenate([np.arange(0,3), np.arange(8,11)])    # [115]
Out[128]: array([ 0,  1,  2,  8,  9, 10])

In [129]: np.concatenate([np.arange(0,3), np.arange(8,None)])   # [116]
Out[129]: array([0, 1, 2, 0, 1, 2, 3, 4, 5, 6, 7])

I suppose one could argue that np.r_[8:] should raise an error, since it provides a start without stop, and thus can't be evaluated as it would in a real indexing case. As coded it works because of the default behavior of np.arange.
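For illustration only, here is a sketch of what such stricter behavior might look like. StrictR and strict_r are hypothetical names, not part of NumPy. The wrapper rejects any slice without a stop and otherwise delegates to np.r_.

```python
import numpy as np


class StrictR:
    """Hypothetical wrapper (not a NumPy API): refuse slices that have no stop."""

    def __getitem__(self, key):
        items = key if isinstance(key, tuple) else (key,)
        for item in items:
            if isinstance(item, slice) and item.stop is None:
                raise ValueError(f"open-ended slice {item!r} has no stop")
        # All slices are fully specified, so hand the key to np.r_ unchanged.
        return np.r_[key]


strict_r = StrictR()

ok = strict_r[:3, 8:11]
print(ok)   # [ 0  1  2  8  9 10]

try:
    strict_r[:3, 8:]
    raised = False
except ValueError as exc:
    raised = True
    print("rejected:", exc)
```

A wrapper like this turns the silent surprise into an explicit error. It still cannot guess the intended stop.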

edit

When I use '8:' directly, C can deduce the correct stop from its own shape:

In [140]: C.shape
Out[140]: (6, 11)

In [141]: C[:,8:].shape
Out[141]: (6, 3)

But an np.r_ object does not have a shape, nor can it deduce the shape from C:

In [142]: np.r_.shape
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [142], in <cell line: 1>()
----> 1 np.r_.shape

AttributeError: 'RClass' object has no attribute 'shape'

If you want to avoid the explicit 11, you have to use:

In [143]: C[:, np.r_[8:C.shape[1]]].shape
Out[143]: (6, 3)
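Two other ways to get the six columns without hard-coding 11 are sketched below. The helper name resolve is hypothetical. It relies on the standard slice.indices(length) method, which fills in missing starts and stops for a given axis length, just as direct indexing would. The second approach slices C directly and stacks the pieces.

```python
import numpy as np


def resolve(length, *slices):
    """Hypothetical helper: expand slices into explicit indices for an axis of `length`."""
    return np.concatenate([np.arange(*s.indices(length)) for s in slices])


B = np.arange(180).reshape(6, 30)
C = B[:, np.r_[10:15, 20:26]]
D = C[:, np.r_[0:3, 8:11]]

# Approach 1: resolve the open-ended slices against C's column count first.
cols = resolve(C.shape[1], *np.s_[:3, 8:])
E = C[:, cols]

# Approach 2: slice C directly, so each slice sees C's shape, then join the pieces.
E2 = np.hstack([C[:, :3], C[:, 8:]])

print(cols)               # [ 0  1  2  8  9 10]
print(E.shape, E2.shape)  # (6, 6) (6, 6)
```

Because slice.indices also handles negative bounds, resolve(C.shape[1], np.s_[-3:]) should yield the last three column indices under the same assumptions.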
hpaulj

The answer is that np.r_ does not work like Python indexing. For some reason it is different: to get the items from n to the last one, you have to know the last index and write np.r_[n:last] instead of np.r_[n:]. IMHO, this defeats one of the better features of Python: not having to call some sort of shape or size function to get your indices correct.

  • NumPy indexing isn't "broken". If you write, say, `C[:, 8:]`, it will do what you expect. The problem with your code is that you pass the result of, for example, `r_[:3, 8:]` as the index, and `r_[]` doesn't act like you expected it to. So the problem is really one of understanding what `r_` does. In particular, as @hpaulj noted in a comment, `r_[start:stop:step]` acts like `np.arange(start, stop, step)`, so `r_[8:]` acts like `np.arange(8)`, which generates `array([0, 1, 2, 3, 4, 5, 6, 7])`. – Warren Weckesser Oct 28 '22 at 22:29
  • I have changed "broken" to "different." Thanks for your patience. – user2983936 Oct 29 '22 at 12:29
  • Sorry to be so picky, but the statement in your answer is still not really correct. The problem you had was with the use of `r_[]`. If you index a NumPy array *directly* with a slice such as `n:`, it works as you think it should: you do not have to know the last element. – Warren Weckesser Oct 29 '22 at 14:18