
I've seen it once or twice before, but I can't seem to find any official docs on it: Using python range objects as indices in numpy.

import numpy as np
a = np.arange(9).reshape(3,3)
a[range(3), range(2,-1,-1)]
# array([2, 4, 6])

Let's trigger an index error just to confirm that ranges are not in the official range (pun intended) of legal indexing methods:

a['x']

# Traceback (most recent call last):
#   File "<stdin>", line 1, in <module>
# IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

Now, a slight divergence between numpy and its docs is not entirely unheard of and does not necessarily indicate that a feature is not intended (see for example here).

So, does anybody know why this works at all? And if it is an intended feature what are the exact semantics / what is it good for? And are there any ND generalizations?

Paul Panzer
  • I've never seen this; is it used in any reputable libraries? – roganjosh Nov 02 '18 at 17:34
  • numpy predates Python 3. In Python 2, `range(3)` is a list of integers, which numpy treats as "array-like". It would have been a mess if numpy didn't also handle that in a backwards compatible way in Python 3. – Warren Weckesser Nov 02 '18 at 18:13
  • *"So, does anybody know why this works at all?"* It is a nice feature, informally called "fancy" indexing, and in the docs it is called [advanced indexing](https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#advanced-indexing). – Warren Weckesser Nov 02 '18 at 18:19
  • @WarrenWeckesser That's right, it says there `(...) a non-tuple sequence object` (although a `non-tuple sequence (such as a list) containing slice objects` will trigger basic indexing, it seems). Not sure why it should hang if `IndexError` is not raised though, but whatever. I think you could make this an answer. – jdehesa Nov 02 '18 at 18:30
  • It could be that indexing tries `np.asarray(x)`, which works with both `range(3)` and `[0,1,2]`. Other things produce errors or object dtype arrays. @WarrenWeckesser makes a good point about compatibility with Py2's version of `range`. – hpaulj Nov 02 '18 at 18:36
  • @WarrenWeckesser Hm, good point. I always forget that `range`s are sequences. – Paul Panzer Nov 02 '18 at 18:37
  • @jdehesa, I think this question is primarily about the use of `range` in the indexing tuple. Ideally a multidimensional index is a tuple, but for backward compatibility some lists are interpreted as tuples rather than as an advanced indexing array. – hpaulj Nov 02 '18 at 18:39
  • Other Py3 sequence producers like generators, dictionary `keys`, `items`, map, don't work in either `np.array` or indexing. – hpaulj Nov 02 '18 at 18:41
  • You are just triggering fancy indexing. This is the same as writing `a[[0,1,2],[2,1,0]]` What is there to surprise? – anishtain4 Nov 02 '18 at 18:42
  • @hpaulj strictly speaking those don't produce *sequences* as understood in python. Sequences are collections that implement `__len__` and take `int` objects for their `__getitem__` (among a couple of other methods like `__reversed__` and `.index`), so essentially things you can do `x[1]` and `len(x)` on. See the abstract base class here: https://docs.python.org/3/library/collections.abc.html#collections-abstract-base-classes – juanpa.arrivillaga Nov 02 '18 at 18:53
  • We've had some past discussions of what it takes to be accepted as a sequence - that is, what methods a custom class has to implement to work in `np.array`. One could construct such a minimal class, and see if it works as an advanced index. – hpaulj Nov 02 '18 at 19:31
  • In https://stackoverflow.com/a/48350240/901925, the `Convertible` class works as an advanced index provided its `__array__` uses `np.zeros(7, int)` (as opposed to the default floats). – hpaulj Nov 02 '18 at 19:54
  • `A()` in https://stackoverflow.com/questions/30037104/make-class-convertable-to-ndarray also works as an index. – hpaulj Nov 02 '18 at 20:14
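
The comments above distinguish true sequences (`range`, lists) from other Python 3 iterables (generators, dictionary `keys`, `map`). A rough sketch of how one might check this, assuming a recent NumPy; the helper name `works_as_index` is made up for illustration and is not part of NumPy:

```python
import numpy as np

a = np.arange(5)

def works_as_index(idx):
    """Return True if a[idx] succeeds, False if NumPy rejects the index.

    Hypothetical helper: IndexError and TypeError are both treated as
    "rejected" here.
    """
    try:
        a[idx]
        return True
    except (IndexError, TypeError):
        return False

# A range converts cleanly to an integer array ...
range_as_array = np.asarray(range(3))

# ... whereas a generator is wrapped as a single 0-d object array.
gen_as_array = np.asarray(i for i in range(3))

results = {
    "range": works_as_index(range(3)),
    "list": works_as_index([0, 1, 2]),
    "generator": works_as_index(i for i in range(3)),
    "dict keys": works_as_index({0: "a", 1: "b"}.keys()),
    "map": works_as_index(map(int, "012")),
}
print(range_as_array.dtype.kind, gen_as_array.dtype, gen_as_array.shape)
print(results)
```

Only the two real sequences should come back as usable indices; the others end up as object-dtype scalars, which are not valid indices.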

2 Answers


Just to wrap this up (thanks to @WarrenWeckesser in the comments): This behavior is actually documented. One only has to realize that range objects are python sequences in the strict sense.
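
A small check of that claim, as a sketch rather than anything taken from the NumPy docs: `range` is registered as a `collections.abc.Sequence`, and indexing with ranges gives the same result as indexing with the equivalent lists or integer arrays.

```python
from collections.abc import Sequence

import numpy as np

a = np.arange(9).reshape(3, 3)

# range supports len() and integer __getitem__, and is a registered Sequence.
is_sequence = isinstance(range(3), Sequence)

# NumPy converts a range to an integer array just as it would a list.
rows = np.asarray(range(3))
cols = np.asarray(range(2, -1, -1))

# Three spellings of the same advanced ("fancy") index: the anti-diagonal.
via_ranges = a[range(3), range(2, -1, -1)]
via_lists = a[[0, 1, 2], [2, 1, 0]]
via_arrays = a[rows, cols]

print(is_sequence)
print(via_ranges, via_lists, via_arrays)
```

So the ND generalization is simply the usual advanced-indexing rule: one index sequence per axis, broadcast against each other.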

So this is just a case of fancy indexing. Be warned, though, that it is very slow:

>>> import numpy as np
>>> from timeit import timeit
>>> a = np.arange(100000)
>>> timeit(lambda: a[range(100000)], number=1000)
12.969507368048653
>>> timeit(lambda: a[list(range(100000))], number=1000)
7.990526253008284
>>> timeit(lambda: a[np.arange(100000)], number=1000)
0.22483703796751797
Paul Panzer

Not a proper answer, but too long for comment.

In fact, it seems to work with just about any indexable object:

import numpy as np

class MyIndex:
    def __init__(self, n):
        self.n = n
    def __getitem__(self, i):
        if i < 0 or i >= self.n:
            raise IndexError
        return i
    def __len__(self):
        return self.n

a = np.array([1, 2, 3])
print(a[MyIndex(2)])
# [1 2]

I think the relevant lines in NumPy's code are below this comment in core/src/multiarray/mapping.c:

/*
 * Some other type of short sequence - assume we should unpack it like a
 * tuple, and then decide whether that was actually necessary.
 */

But I'm not entirely sure. For some reason, this hangs if you remove the `if i < 0 or i >= self.n: raise IndexError` check, even though there is a `__len__`, so at some point it seems to be iterating through the given object until `IndexError` is raised.
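
One plausible explanation for the hang, sketched in plain Python: an object with `__getitem__` but no `__iter__` is iterated through the legacy sequence protocol, which calls `obj[0]`, `obj[1]`, ... and stops only on `IndexError`, ignoring `__len__`. Whether NumPy takes exactly this path internally is an assumption on my part; the class names below are made up for the demonstration.

```python
from itertools import islice

class Bounded:
    """Raises IndexError past the end, like MyIndex above."""
    def __init__(self, n):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, i):
        if i < 0 or i >= self.n:
            raise IndexError
        return i

class Unbounded:
    """Same idea, but never raises IndexError."""
    def __len__(self):
        return 2
    def __getitem__(self, i):
        return i

# Iteration falls back to __getitem__ and ends at the first IndexError.
bounded_items = list(Bounded(2))

# Without IndexError the iteration never ends, despite __len__ == 2,
# so only a slice of it is taken here to avoid hanging.
unbounded_head = list(islice(Unbounded(), 5))

print(bounded_items)
print(unbounded_head)
```

Calling `list(Unbounded())` directly would loop forever, which matches the observed hang.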

jdehesa
  • That it iterates would be consistent with how slow it actually is, for example compared to indexing with `arange`s. – Paul Panzer Nov 02 '18 at 18:14