
I'm having a bit of trouble understanding the output of unravel_index in the context of the following bit of code.

Using meshgrid I create two arrays representative of some coordinates:

import numpy as np

x_in=np.arange(-800, 0, 70)
y_in=np.arange(-3500, -2000, 70)
y, x =np.meshgrid(y_in,x_in,indexing='ij')

I then run through one of the grids to identify values within certain limits:

limit=100
x_gd=x[np.logical_and(x>=-600-limit,x<=-600+limit)]

This returns an array with the values I'm interested in - to get the indices of these values I use the following function (which I developed after reading this):

def get_index(array, select_array):
    ''' 
    Find the index positions of values from select_array in array
    ''' 
    rows,cols=array.shape
    flt = array.flatten()
    sorted = np.argsort(flt)
    pos = np.searchsorted(flt[sorted], select_array)
    indices = sorted[pos] 
    y_indx, x_indx = np.unravel_index(indices, [rows, cols])

    return y_indx, x_indx

xx_y_indx, xx_x_indx = get_index(x, x_gd)

xx_x_indx returns what I expect - the col reference for the values from x:

array([2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3,
   4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2,
   3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4, 2, 3, 4], dtype=int64)

xx_y_indx however returns:

array([15,  2, 19, 15,  2, 19, 15,  2, 19, 15,  2, 19, 15,  2, 19, 15,  2,
   19, 15,  2, 19, 15,  2, 19, 15,  2, 19, 15,  2, 19, 15,  2, 19, 15,
    2, 19, 15,  2, 19, 15,  2, 19, 15,  2, 19, 15,  2, 19, 15,  2, 19,
   15,  2, 19, 15,  2, 19, 15,  2, 19, 15,  2, 19, 15,  2, 19], dtype=int64)

whereas I would expect it to include every row, since the coordinates in array x are identical on every line, not just rows 15, 2 and 19.

For what I'm interested in, I can just use the result of xx_x_indx - the column indices. However, I can't explain why the y (row) indices report as they do.

ChrisWills

1 Answer


This call to searchsorted does not find the location of every occurrence of the select_array values in flt[sorted]; for each value it returns only the index of the first occurrence (the leftmost insertion point).

pos = np.searchsorted(flt[sorted], select_array)

In [273]: pos
Out[273]: 
array([44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66,
       88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44,
       66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88,
       44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88, 44, 66, 88])

Notice all the repeated values in pos.


Everything past this point is perhaps not what you intended, since you are not really working with all the locations of the select_array values in flt[sorted] or array.
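
To see the first-occurrence behaviour in isolation, here is a minimal sketch. The small sorted array is made up for illustration (it stands in for flt[sorted]) and the variable names are not from the question:

```python
import numpy as np

# A sorted array with repeated values, standing in for flt[sorted]
sorted_vals = np.array([-660, -660, -660, -590, -590, -520])
queries = np.array([-660, -590, -520, -660])

# side='left' (the default) returns the leftmost insertion point,
# i.e. the index of the first occurrence of each query value
first = np.searchsorted(sorted_vals, queries)

# side='right' returns one past the last occurrence
past_last = np.searchsorted(sorted_vals, queries, side='right')

print(first)      # [0 3 5 0]
print(past_last)  # [3 5 6 3]
```

However many times a value is repeated, each query maps to a single position, which is why only three distinct rows show up in the question's output.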


You could fix the problem by using:

def get_index(array, select_array):
    ''' 
    Find the index positions of values from select_array in array
    '''
    mask = np.logical_or.reduce([array==val for val in np.unique(select_array)])
    y_indx, x_indx = np.where(mask)
    return y_indx, x_indx

or

def get_index2(array, select_array):
    idx = np.in1d(array.ravel(), select_array.ravel())
    y_indx, x_indx = np.where(idx.reshape(array.shape))
    return y_indx, x_indx

Which is faster depends on the number of elements in np.unique(select_array). get_index makes one full comparison pass over array per unique value, so when there are many unique values the Python-level loop is slow and get_index2 is faster. But if select_array contains a lot of repeats and np.unique(select_array) is small, then get_index can be the faster option.
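
As a sanity check, the sketch below rebuilds the grid from the question and confirms that the two approaches give the same indices. The function names index_by_loop and index_by_isin are hypothetical, and np.isin is swapped in for np.in1d on the assumption that your NumPy is 1.13 or newer:

```python
import numpy as np

# Rebuild the grid and the selection from the question
x_in = np.arange(-800, 0, 70)
y_in = np.arange(-3500, -2000, 70)
y, x = np.meshgrid(y_in, x_in, indexing='ij')
limit = 100
x_gd = x[np.logical_and(x >= -600 - limit, x <= -600 + limit)]

def index_by_loop(array, select_array):
    # One equality comparison per unique selected value, OR-ed together
    mask = np.logical_or.reduce(
        [array == val for val in np.unique(select_array)])
    return np.where(mask)

def index_by_isin(array, select_array):
    # Single vectorised membership test; np.isin keeps the shape of array
    return np.where(np.isin(array, select_array))

rows_a, cols_a = index_by_loop(x, x_gd)
rows_b, cols_b = index_by_isin(x, x_gd)

same = np.array_equal(rows_a, rows_b) and np.array_equal(cols_a, cols_b)
print(same)                                       # True
print(np.unique(rows_a).size, np.unique(cols_a))  # 22 [2 3 4]
```

All 22 rows are reported this time. To compare speed on your own data, you could time both functions with timeit.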


To demonstrate a use of np.unravel_index, you could even use

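def get_index3(array, select_array):
    idx = np.in1d(array.ravel(), select_array.ravel())
    # np.flatnonzero gives a 1-D array of flat positions
    # (np.where(idx) would return a 1-tuple rather than a plain index array)
    y_indx, x_indx = np.unravel_index(np.flatnonzero(idx), array.shape)
    return y_indx, x_indx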

but I think this is slower than get_index2 in all cases: reshape is very fast, so calling np.where on the reshaped mask should beat np.where followed by np.unravel_index.
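
For reference, here is a small sketch of what np.unravel_index does on its own: it converts positions in the flattened (row-major) array back into (row, col) pairs. The 3x4 mask is made up for illustration:

```python
import numpy as np

shape = (3, 4)
mask = np.zeros(shape, dtype=bool)
mask[0, 2] = mask[1, 2] = mask[2, 3] = True

# Positions of the True entries in the flattened (row-major) array
flat = np.flatnonzero(mask)
print(flat)  # [ 2  6 11]

# Each flat position equals row * 4 + col; unravel_index inverts that
rows, cols = np.unravel_index(flat, shape)
print(rows, cols)  # [0 1 2] [2 2 3]

# np.where on the 2-D mask gives the same pairs directly
rows_w, cols_w = np.where(mask)
```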

unutbu