Consider the following matrix:
import numpy as np
X = np.arange(9).reshape(3, 3)
array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
Let's say I want to extract the following array from it:
array([[0, 4, 2],
       [3, 7, 5]])
This is possible by giving, for each column, the rows to take from it, for instance:
col = [0, 1, 2]
row = [[0, 1], [1, 2], [0, 1]]
Then, storing the result in a variable named array, I can build it with the following loop:
array = np.zeros([2, 3], dtype='int64')
for i in range(3):
    # column i of the result takes rows row[i] from column col[i] of X
    array[:, i] = X[row[i], col[i]]
Is there a way to do this kind of operation with broadcasting? I have to do this as a data-cleaning step for a large file (~5 GB), and I would like to use dask to parallelize it. As a first step, though, I would be happy just to avoid the for loop.
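For reference, one loop-free form that appears to reproduce the loop above uses NumPy integer-array indexing. This is only a minimal sketch: it assumes row and col are converted to NumPy arrays, the name result is hypothetical, and it does not address the dask part of the question.

```python
import numpy as np

X = np.arange(9).reshape(3, 3)
col = np.array([0, 1, 2])                  # column picked for each output column
row = np.array([[0, 1], [1, 2], [0, 1]])   # rows picked for each output column

# row.T has shape (2, 3) and col has shape (3,), so the two index arrays
# broadcast together: result[j, i] == X[row[i, j], col[i]]
result = X[row.T, col]
print(result)
```

The transpose is there because the loop fills the result column by column, so the row indices for output column i need to run down axis 0 of the index array.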