Using python range objects to index into numpy arrays

Question

I've seen it once or twice before, but I can't seem to find any official docs on it: Using python range objects as indices in numpy.

import numpy as np
a = np.arange(9).reshape(3,3)
a[range(3), range(2,-1,-1)]
# array([2, 4, 6])

Let's trigger an index error just to confirm that ranges are not in the official range (pun intended) of legal indexing methods:

a['x']

# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Now, a slight divergence between numpy and its docs is not entirely unheard of and does not necessarily indicate that a feature is not intended (see for example here).

So, does anybody know why this works at all? And if it is an intended feature what are the exact semantics / what is it good for? And are there any ND generalizations?

I've never seen this; is it used in any reputable libraries? — roganjosh, Nov 02 '18 at 17:34
numpy predates Python 3. In Python 2, `range(3)` is a list of integers, which numpy treats as "array-like". It would have been a mess if numpy didn't also handle that in a backwards compatible way in Python 3. — Warren Weckesser, Nov 02 '18 at 18:13
*"So, does anybody know why this works at all?"* It is a nice feature, informally called "fancy" indexing, and in the docs it is called [advanced indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing). — Warren Weckesser, Nov 02 '18 at 18:19
@WarrenWeckesser That's right, it says there `(...) a non-tuple sequence object` (although a `non-tuple sequence (such as a list) containing slice objects` will trigger basic indexing, it seems). Not sure why it should hang if `IndexError` is not raised though, but whatever. I think you could make this an answer. — jdehesa, Nov 02 '18 at 18:30
It could be that indexing tries `np.asarray(x)`, which works with both `range(3)` and `[0,1,2]`. Other things produce errors or object dtype arrays. @WarrenWeckesser makes a good point about compatibility with Py2's version of `range`. — hpaulj, Nov 02 '18 at 18:36
@WarrenWeckesser Hm, good point. I always forget that `range`s are sequences. — Paul Panzer, Nov 02 '18 at 18:37
@jdehesa, I think this question is primarily about the use of `range` in the indexing tuple. Ideally a multidimensional index is a tuple, but for backward compatibility some lists are interpreted as tuples rather than as an advanced indexing array. — hpaulj, Nov 02 '18 at 18:39
Other Py3 sequence producers like generators, dictionary `keys`, `items`, map, don't work in either `np.array` or indexing. — hpaulj, Nov 02 '18 at 18:41
You are just triggering fancy indexing. This is the same as writing `a[[0,1,2],[2,1,0]]` What is there to surprise? — anishtain4, Nov 02 '18 at 18:42
@hpaulj strictly speaking those don't produce *sequences* as understood in python. Sequences are collections that implement `__len__` and take `int` objects for their `__getitem__`, (among a couple other methods like `__reversed__` and `.index`) but essentially things you can do `x[1]` and `len(x)` on. See the abstract base class here: https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes — juanpa.arrivillaga, Nov 02 '18 at 18:53
We've had some past discussions of what it takes to be accepted as a sequence - that is, what methods a custom class has to implement to work in `np.array`. One could construct such a minimal class, and see if it works as an advanced index. — hpaulj, Nov 02 '18 at 19:31
In https://stackoverflow.com/a/48350240/901925, the `Convertible` class works as an advanced index provided its `__array__` uses `np.zeros(7, int)` (as opposed to the default floats). — hpaulj, Nov 02 '18 at 19:54
`A()` in https://stackoverflow.com/questions/30037104/make-class-convertable-to-ndarray also works as an index. — hpaulj, Nov 02 '18 at 20:14

score 2 · Accepted Answer · answered Nov 08 '18 at 23:33

Just to wrap this up (thanks to @WarrenWeckesser in the comments): This behavior is actually documented. One only has to realize that range objects are python sequences in the strict sense.

So this is just a case of fancy indexing. Be warned, though, that it is very slow:

>>> import numpy as np
>>> from timeit import timeit
>>> a = np.arange(100000)
>>> timeit(lambda: a[range(100000)], number=1000)
12.969507368048653
>>> timeit(lambda: a[list(range(100000))], number=1000)
7.990526253008284
>>> timeit(lambda: a[np.arange(100000)], number=1000)
0.22483703796751797
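
As a quick check of the two claims above (that a range is a sequence, and that indexing with it is ordinary fancy indexing), here is a minimal sketch. It reuses the 3x3 array from the question; the variable names are only illustrative.

```python
from collections.abc import Sequence

import numpy as np

a = np.arange(9).reshape(3, 3)
rows, cols = range(3), range(2, -1, -1)

# range satisfies the Sequence ABC: it has __len__ and an integer __getitem__.
is_sequence = isinstance(rows, Sequence)

# Indexing with ranges, with plain lists, and with integer arrays should agree.
via_range = a[rows, cols]
via_list = a[list(rows), list(cols)]
via_array = a[np.arange(3), np.arange(2, -1, -1)]

print(is_sequence)
print(via_range, via_list, via_array)
```

Given the timings above, building the index with `np.arange` once up front looks like the safer choice whenever the same index is reused or the array is large.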

score 1 · Answer 2 · answered Nov 02 '18 at 17:52

Not a proper answer, but too long for comment.

In fact, it seems to work with about any indexable object:

import numpy as np

class MyIndex:
    def __init__(self, n):
        self.n = n
    def __getitem__(self, i):
        if i < 0 or i >= self.n:
            raise IndexError
        return i
    def __len__(self):
        return self.n

a = np.array([1, 2, 3])
print(a[MyIndex(2)])
# [1 2]

I think the relevant lines in NumPy's code are below this comment in core/src/multiarray/mapping.c:

/*
 * Some other type of short sequence - assume we should unpack it like a
 * tuple, and then decide whether that was actually necessary.
 */

But I'm not entirely sure. For some reason, this hangs if you remove the `if i < 0 or i >= self.n: raise IndexError` check, even though there is a `__len__`, so at some point it seems to be iterating through the given object until `IndexError` is raised.
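
One thing that would fit the hang is Python's legacy iteration protocol: an object with `__getitem__` but no `__iter__` is iterated by calling `__getitem__(0)`, `__getitem__(1)`, and so on until `IndexError` is raised, and `__len__` is not used to stop the loop. Whether NumPy's C code goes through this path is an assumption, not something confirmed here. The sketch below only shows the pure-Python behavior, using a hypothetical `Bounded` class modelled on `MyIndex` that records which indices are requested.

```python
class Bounded:
    """Hypothetical stand-in for MyIndex that logs every requested index."""

    def __init__(self, n):
        self.n = n
        self.calls = []  # indices passed to __getitem__, in order

    def __getitem__(self, i):
        self.calls.append(i)
        if i < 0 or i >= self.n:
            raise IndexError
        return i

    def __len__(self):
        return self.n


obj = Bounded(2)

# No __iter__ is defined, so list() falls back to __getitem__(0), (1), (2), ...
items = list(obj)

print(items)      # the values returned before IndexError was raised
print(obj.calls)  # includes index 2, the call that raised IndexError
```

Without the bounds check, `__getitem__` would never raise, so this kind of iteration would never terminate.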

Iterating would be consistent with the fact that it is actually quite slow, for example compared to indexing with `arange`s. — Paul Panzer, Nov 02 '18 at 18:14
