I am working on a large array (3000 x 3000) on which I run scipy.ndimage.label. It returns the labelled array and the number of labels, 3403. I would like to know the indices of each label; e.g., for label 1, I want the rows and columns where it occurs in the labelled array. Basically, like this:

a[0] = array([[1, 1, 0, 0],
              [1, 1, 0, 2],
              [0, 0, 0, 2],
              [3, 3, 0, 0]])


indices = [np.where(a[0] == t + 1) for t in range(a[1])]  # a[1] = 3 is the number of labels

print(indices)
[(array([0, 0, 1, 1]), array([0, 1, 0, 1])), (array([1, 2]), array([3, 3])), (array([3, 3]), array([0, 1]))]

I would like to create a list of indices for all 3403 labels like the one above. The above method seems slow. I tried using generators, but there doesn't seem to be any improvement.

Are there any efficient ways?

Gargantua89

1 Answer

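The way to gain efficiency is to minimize the work done inside the loop. A fully vectorized method isn't possible, because each label has a different number of elements, so the result cannot be stored as one regular array. With those factors in mind, here's one solution, which sorts the flattened labels once and then slices the sorted positions wherever the label value changes -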

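import numpy as np

a_flattened = a[0].ravel()
# A stable sort keeps positions within each label in row-major order
sidx = np.argsort(a_flattened, kind='stable')
afs = a_flattened[sidx]
# Offsets where the sorted label value changes, plus both ends
cut_idx = np.r_[0, np.flatnonzero(afs[1:] != afs[:-1]) + 1, a_flattened.size]
# Convert the sorted flat positions back to (row, col) pairs
row, col = np.unravel_index(sidx, a[0].shape)
row_indices = [row[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])]
col_indices = [col[i:j] for i, j in zip(cut_idx[:-1], cut_idx[1:])]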
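
To make the sort-and-cut step easier to follow, here is a small sketch of the intermediate values for the 4 x 4 sample. The names labels, order, sorted_labels and boundaries are only illustrative, and kind="stable" is passed to argsort so that positions within a label stay in row-major order.

```python
import numpy as np

# The 4 x 4 labelled array from the question (a[0])
labels = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 2],
                   [0, 0, 0, 2],
                   [3, 3, 0, 0]])

flat = labels.ravel()

# Flat positions ordered by label value; ties keep their original order
order = np.argsort(flat, kind="stable")

# The label values in sorted order: all 0s, then all 1s, and so on
sorted_labels = flat[order]
print(sorted_labels)   # [0 0 0 0 0 0 0 0 1 1 1 1 2 2 3 3]

# Offsets in the sorted sequence where the label value changes
boundaries = np.flatnonzero(sorted_labels[1:] != sorted_labels[:-1]) + 1
print(boundaries)      # [ 8 12 14]

# Adding the two ends gives one [start, stop) slice per label
cut_idx = np.r_[0, boundaries, flat.size]
print(cut_idx)         # [ 0  8 12 14 16]
```

Each consecutive pair in cut_idx delimits the sorted positions belonging to one label, which is why a single sort replaces one full-array comparison per label.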

Sample input, output -

In [59]: a[0]
Out[59]: 
array([[1, 1, 0, 0],
       [1, 1, 0, 2],
       [0, 0, 0, 2],
       [3, 3, 0, 0]])

In [60]: a[1]
Out[60]: 3

In [62]: row_indices # row indices
Out[62]: 
[array([0, 0, 1, 2, 2, 2, 3, 3]), # for label-0
 array([0, 0, 1, 1]),             # for label-1
 array([1, 2]),                   # for label-2    
 array([3, 3])]                   # for label-3

In [63]: col_indices  # column indices
Out[63]: 
[array([2, 3, 2, 0, 1, 2, 2, 3]), # for label-0
 array([0, 1, 0, 1]),             # for label-1
 array([3, 3]),                   # for label-2
 array([0, 1])]                   # for label-3

Apart from their first elements, row_indices and col_indices hold the expected output. The first group in each corresponds to label 0 (the unlabelled background), so you might want to skip it, e.g. with row_indices[1:] and col_indices[1:].
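
If it helps, the same idea can be wrapped in a function that returns the (rows, cols) pairs in the format the question asked for. This is only a sketch: the name label_indices and the skip_background flag are hypothetical, and it returns a dict keyed by label value so that nothing depends on label 0 being present.

```python
import numpy as np

def label_indices(labelled, skip_background=True):
    """Map each label value to a (rows, cols) pair of index arrays."""
    flat = labelled.ravel()
    if flat.size == 0:
        return {}
    # Stable sort so positions within a label stay in row-major order
    order = np.argsort(flat, kind="stable")
    sorted_labels = flat[order]
    # Slice boundaries: wherever the sorted label value changes, plus both ends
    change = np.flatnonzero(sorted_labels[1:] != sorted_labels[:-1]) + 1
    cuts = np.r_[0, change, flat.size]
    rows, cols = np.unravel_index(order, labelled.shape)
    groups = {int(sorted_labels[i]): (rows[i:j], cols[i:j])
              for i, j in zip(cuts[:-1], cuts[1:])}
    if skip_background:
        groups.pop(0, None)   # drop label 0 if it is present
    return groups

labelled = np.array([[1, 1, 0, 0],
                     [1, 1, 0, 2],
                     [0, 0, 0, 2],
                     [3, 3, 0, 0]])

groups = label_indices(labelled)

# Cross-check against the per-label np.where approach from the question
for lab, (r, c) in groups.items():
    expected_r, expected_c = np.where(labelled == lab)
    assert np.array_equal(r, expected_r) and np.array_equal(c, expected_c)
```

For the sample array, groups[2] holds the rows [1, 2] and columns [3, 3], matching the second entry of the list in the question.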

Divakar