Consider the following matrix:

import numpy as np

X = np.arange(9).reshape(3, 3)
# array([[0, 1, 2],
#        [3, 4, 5],
#        [6, 7, 8]])

Let's say I want to extract the following array:

array([[0, 4, 2],
       [3, 7, 5]])

This is possible by indexing rows and columns, for instance:

col = [0, 1, 2]
row = [[0, 1], [1, 2], [0, 1]]

Then if I store the result in a variable array I can do it with the following code:

array=np.zeros([2,3],dtype='int64')
for i in range(3):
    array[:,i]=X[row[i],col[i]]

Is there a way to broadcast this kind of operation? I have to do this as a data-cleaning step for a large file (~5 GB), and I would like to use dask to parallelize it. As a first step, though, I would be happy just to avoid the for loop.

jmamath

1 Answer

With NumPy's advanced indexing, it would be:

X[row, np.asarray(col)[:,None]].T

Sample run -

In [9]: X
Out[9]: 
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [10]: col=[0,1,2] 
    ...: row = [[0,1],[1,2],[0,1]]

In [11]: X[row, np.asarray(col)[:,None]].T
Out[11]: 
array([[0, 4, 2],
       [3, 7, 5]])
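
To see why this works, here is a small sketch that checks the one-liner against the loop from the question. The names `expected`, `out` and `out_alt` are just illustrative and not part of the original post. `col[:, None]` has shape (3, 1) and broadcasts against `row` of shape (3, 2), so each selected element is `X[row[i, j], col[i]]`, and the final transpose gives the (2, 3) layout. Transposing the row index instead of the result should give the same array.

```python
import numpy as np

X = np.arange(9).reshape(3, 3)
col = np.asarray([0, 1, 2])                 # shape (3,)
row = np.asarray([[0, 1], [1, 2], [0, 1]])  # shape (3, 2)

# Reference result, built with the explicit loop from the question.
expected = np.zeros((2, 3), dtype=X.dtype)
for i in range(3):
    expected[:, i] = X[row[i], col[i]]

# Index arrays of shape (3, 2) and (3, 1) broadcast to (3, 2):
# element [i, j] is X[row[i, j], col[i]]; .T turns it into shape (2, 3).
out = X[row, col[:, None]].T

# Alternative: transpose the row index so no transpose of the result is needed.
# row.T has shape (2, 3) and broadcasts against col of shape (3,).
out_alt = X[row.T, col]

assert np.array_equal(out, expected)
assert np.array_equal(out_alt, expected)
```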
Divakar
  • Thank you, this is exactly what I was looking for. – jmamath Apr 01 '18 at 17:18
  • Is it possible to do the same operation using numpy.take? I ask because dask arrays don't support this kind of fancy indexing, but dask implements a function similar to numpy.take. – jmamath Apr 16 '18 at 18:06
  • @jmamath You can do : `np.take(X, row*X.shape[1] + col[:,None])` for `row` and `col` as arrays. – Divakar Apr 16 '18 at 18:15
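
The `np.take` suggestion in the last comment can be sketched as follows, using plain NumPy only (whether the same expression carries over to dask arrays is not checked here). It assumes `X` is a 2-D, row-major array, so the element at `(r, c)` sits at flat position `r * X.shape[1] + c`. The names `flat_idx`, `taken` and `result` are illustrative. Note that the expression from the comment produces the (3, 2) layout, so a final transpose is needed to match the array asked for in the question.

```python
import numpy as np

X = np.arange(9).reshape(3, 3)
row = np.asarray([[0, 1], [1, 2], [0, 1]])  # shape (3, 2)
col = np.asarray([0, 1, 2])                 # shape (3,)

# Flat index into the flattened (C-ordered) X: r * n_cols + c.
flat_idx = row * X.shape[1] + col[:, None]  # shape (3, 2)

# Without an axis argument, np.take indexes the flattened array.
taken = np.take(X, flat_idx)                # shape (3, 2)

# Transpose to get the (2, 3) layout from the question.
result = taken.T
```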