
Let's create a large np array 'a' with 10,000 entries

import numpy as np
a = np.arange(0, 10000)

Let's slice the array with 'n' indices 0->9, 1->10, 2->11, etc.

n = 32
b = list(map(lambda x:np.arange(x, x+10), np.arange(0, n)))
c = a[b]

The weird thing is that if n is smaller than 32, I get the error "IndexError: too many indices for array". If n is greater than or equal to 32, the code works perfectly. The error occurs regardless of the size of the initial array or of the individual slices, but the threshold is always 32. Note that if n == 1, the code works.

Any idea what is causing this? Thank you.

  • What are you trying to do with your `map`? It will give [x...x+10) for x in [0...n), i.e. [0,1,2,3,4,5,6,7,8,9] then [1,2,3,4,5,6,7,8,9,10], then [2,3,4,5,6,7,8,9,10,11] ... which probably isn't what you meant. – doctorlove Feb 28 '19 at 17:00
  • Hi doctorlove, it really does not matter what I am trying to do with the map. I have changed the description of the code above. The real issue is with the error I get when n < 32. – Maurice Abou Jaoude Feb 28 '19 at 17:10

2 Answers


First of all, you're not slicing 0->9, 10->19, 20->29; your slices advance by only 1: 0->9, 1->10, 2->11. Instead, try this:

n = 32
size = 10
b = list(map(lambda x:np.arange(x, x+size), np.arange(0, n*size, size)))
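
As a possible alternative to building a list with `map`, the same non-overlapping index windows can be sketched with broadcasting, which gives a single 2-D index array straight away. The names `starts` and `idx` are mine, not from the question:

```python
import numpy as np

n = 32
size = 10
a = np.arange(0, 10000)

# Start of each window: 0, 10, 20, ...
starts = np.arange(0, n * size, size)

# Broadcasting a column of starts against a row of offsets
# yields an (n, size) array of indices, one window per row.
idx = starts[:, None] + np.arange(size)

# Indexing with one integer array returns an array of the same shape.
c = a[idx]
print(idx.shape, c.shape)
```

Because `idx` is already an array rather than a list of arrays, there is no ambiguity about how it should be interpreted as an index.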

Next, you've misused the indexing notation. b is a list of arrays, and you've used this entire list to index a. When the list has 32 or more elements, it cannot be a multi-dimensional index, so numpy takes it as a sequence of individual index arrays and returns one a element per leaf element in b.

However, once the list has fewer than 32 elements, numpy assumes that you're trying to give a multi-dimensional index into a: each element of b is taken as the index for the corresponding dimension of a. Since a is only 1-dimensional, you get the error message. Your code will run in this mode with n=1, but fails with n from 2 through 31.
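
To see the two interpretations side by side, here is a minimal sketch that forces each one explicitly instead of relying on what a bare list does, which depends on the NumPy version. The variable names `tuple_error` and `c` are mine:

```python
import numpy as np

a = np.arange(0, 10000)
n = 5
b = [np.arange(x, x + 10) for x in range(n)]

# Multi-dimensional reading: a tuple supplies one index array per
# dimension. a has only one dimension, so five are too many.
try:
    a[tuple(b)]
    tuple_error = None
except IndexError as exc:
    tuple_error = str(exc)

# Single-index reading: one 2-D integer array, which picks
# one element of a for every entry in the index array.
c = a[np.array(b)]

print(tuple_error)
print(c.shape)
```

The first form fails with the "too many indices" error from the question; the second returns an array with one row per element of `b`.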

Although your question isn't a duplicate, also please see this one.

Prune
  • Hi Prune, thanks for your answer. You are right. The slicing 0->9, 10->19 was just something I chose randomly. The real problem is the weird error I get when n < 32. – Maurice Abou Jaoude Feb 28 '19 at 17:09
  • Do you understand the error message? I don't: with a smaller n, there are fewer indices in `b`. What does the error mean? – doctorlove Feb 28 '19 at 17:18
  • The code fails with `n` in the range 2-31 because `b` is then small enough to be interpreted as a multi-dimensional slice; that interpretation takes precedence. See the linked question for details. When `b` is 32 or larger, the only legal interpretation is as a sequence of individual requests. – Prune Feb 28 '19 at 17:22
  • Got your point. But then why does it succeed with n = 32 and above? – Maurice Abou Jaoude Feb 28 '19 at 17:22
  • See the comment just above yours. – Prune Feb 28 '19 at 17:23
  • That's interesting. I would not have expected something like that to happen. I guess I will read more about this. Any idea how to make it work for n in the range 2-31, though? – Maurice Abou Jaoude Feb 28 '19 at 17:24

Your b is a list of arrays:

In [84]: b = list(map(lambda x:np.arange(x, x+10), np.arange(0, 5)))            
In [85]: b                                                                      
Out[85]: 
[array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
 array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]),
 array([ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11]),
 array([ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12]),
 array([ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13])]

When used as an index:

In [86]: np.arange(1000)[b]                                                     
/usr/local/bin/ipython3:1: FutureWarning: Using a non-tuple sequence for multidimensional 
indexing is deprecated; use `arr[tuple(seq)]` instead of `arr[seq]`. 
In the future this will be interpreted as an array index, `arr[np.array(seq)]`, 
which will result either in an error or a different result.
  #!/usr/bin/python3
---------------------------------------------------------------
IndexError: too many indices for array

A[1,2,3] is the same as A[(1,2,3)] - that is, the comma separated indices are a tuple, which is then passed on to the indexing function. Or to put it another way, a multidimensional index should be a tuple (that includes ones with slices).
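
A small sketch of that equivalence, using a made-up 3d array `A` (the array and the name `idx` are illustrative assumptions only):

```python
import numpy as np

A = np.arange(24).reshape(2, 3, 4)

# Comma separated indices and an explicit tuple are the same index.
x = A[1, 2, 3]
y = A[(1, 2, 3)]

# A tuple can be built ahead of time, slices included;
# this one is equivalent to A[0, :, 2].
idx = (0, slice(None), 2)
col = A[idx]

print(x, y)
print(col)
```

Building the tuple separately, as with `idx`, is the form the warning above recommends when the index is assembled programmatically.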

Up to now numpy has been a bit sloppy, and allowed us to use a list of indices in the same way. The warning tells us that the developers are in the process of tightening up those restrictions.

The error means it is trying to interpret each array in your list as the index for a separate dimension. An array can have at most 32 dimensions. Evidently for the longer list it doesn't try to treat it as a tuple, and instead creates a 2d array for indexing.

There are various ways we can use your b to index a 1d array:

In [87]: np.arange(1000)[np.hstack(b)]                                          
Out[87]: 
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9,  1,  2,  3,  4,  5,  6,  7,
        8,  9, 10,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,  3,  4,  5,  6,
        7,  8,  9, 10, 11, 12,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13])

In [89]: np.arange(1000)[np.array(b)]    # or np.vstack(b)                                       
Out[89]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13]])

In [90]: np.arange(1000)[b,]             # 1d tuple containing b                                       
Out[90]: 
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10],
       [ 2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
       [ 3,  4,  5,  6,  7,  8,  9, 10, 11, 12],
       [ 4,  5,  6,  7,  8,  9, 10, 11, 12, 13]])

Note that if b is a ragged list (one or more of the arrays is shorter than the others), only the hstack version works.
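
For the ragged case, something like the following should work. The list `ragged` is a made-up example, and splitting the flat result back into pieces with `np.split` is my own addition, not part of the question:

```python
import numpy as np

a = np.arange(1000)

# Index arrays of different lengths: these cannot be stacked into a 2d array.
ragged = [np.arange(0, 10), np.arange(1, 4), np.arange(2, 7)]

# hstack joins them into one flat index array, which indexes a directly.
flat = a[np.hstack(ragged)]

# Optionally cut the flat result back into one piece per index array.
lengths = [len(r) for r in ragged]
pieces = np.split(flat, np.cumsum(lengths)[:-1])

print(flat.shape)
print([p.tolist() for p in pieces])
```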

hpaulj