
I have a large sparse matrix in which each row contains multiple non-zero elements, for example:

a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])

I want to randomly select one non-zero element per row without a for loop. Any good suggestions? As output, I am more interested in the chosen element's index than in its value.

amina mollaysa

1 Answer


With a numpy array such as:

arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])

you can do arr != 0, which gives a True / False (boolean) array marking which values pass the condition; in our case, the condition is that the value is not equal (!=) to 0. So:

array([ True,  True,  True, False,  True, False, False,  True], dtype=bool)
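As a quick check, the comparison can be run on its own. This sketch only assumes NumPy is imported as np, as in the rest of the thread; the result is an ordinary array with a boolean dtype.

```python
import numpy as np

arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])

# Element-wise comparison: True wherever the value is not 0
mask = arr != 0
print(mask)
print(mask.dtype)
```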

From here, we can 'index' arr with this boolean array by doing arr[arr != 0], which gives us:

array([5, 2, 6, 2, 6])
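A minimal runnable version of the boolean indexing step; note that the surviving values keep their original order.

```python
import numpy as np

arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])

# Boolean indexing keeps only the positions where the mask is True,
# i.e. it drops the zeros and preserves the original order
nonzero_values = arr[arr != 0]
print(nonzero_values)
```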

So now that we have a way of removing the zero values from a numpy array, we can use a simple list comprehension over the rows of your a array: for each row, we remove the zeros and then call np.random.choice on what remains, like so:

np.array([np.random.choice(r[r!=0]) for r in a])

which gives you back an array of length 3 containing one random non-zero item from each row in a. :)
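A reproducible variant of the same list comprehension, assuming NumPy 1.17 or later for np.random.default_rng; the seeded Generator's choice method stands in for np.random.choice here. A row made up entirely of zeros would make choice fail, since there is nothing left to choose from.

```python
import numpy as np

a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])

# Seeded generator so the result is reproducible between runs
rng = np.random.default_rng(0)

# One random non-zero value per row (still a Python-level loop over rows)
values = np.array([rng.choice(row[row != 0]) for row in a])
print(values)
```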

Hope this helps!

Update

If you want the indexes of the random non-zero numbers in the array, you can use .nonzero().

So if we have this array:

arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])

we can do:

arr.nonzero()

which gives a tuple of the indexes of non-zero elements:

(array([0, 1, 2, 4, 7]),)
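The tuple comes from .nonzero() returning one index array per dimension. A short sketch showing both the 1-D case above and what the same call gives on the 2-D example from the question:

```python
import numpy as np

arr = np.array([5, 2, 6, 0, 2, 0, 0, 6])
# For a 1-D array, .nonzero() returns a 1-tuple; [0] unpacks the index array
idx = arr.nonzero()[0]

a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])
# For a 2-D array it returns (row_indices, column_indices) in row-major order
rows, cols = a.nonzero()
print(idx)
print(rows, cols)
```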

So, as before, we can combine this with np.random.choice() in a list comprehension to produce random indexes:

a = np.array([[1, 1, 0, 0, 0, 0], [2, 0, 1, 0, 2, 0], [3, 0, 4, 0, 0, 3]])

np.array([np.random.choice(r.nonzero()[0]) for r in a])

which returns an array of the form [x, y, z] where x, y and z are random indexes of non-zero elements from their corresponding rows.

E.g. one result could be:

array([1, 4, 2])
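The same comprehension can be wrapped in a small reusable function. The name random_nonzero_columns and the optional rng parameter are hypothetical, chosen only for this sketch; it returns the column index only, which is all that is needed when exactly one element is taken per row.

```python
import numpy as np

def random_nonzero_columns(a, rng=None):
    """Return one random non-zero column index per row of a 2-D array.

    Hypothetical helper wrapping the list comprehension above; it fails
    on a row that has no non-zero element.
    """
    rng = np.random.default_rng() if rng is None else rng
    # row.nonzero()[0] lists the non-zero columns; choice picks one of them
    return np.array([rng.choice(row.nonzero()[0]) for row in a])

a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])
cols = random_nonzero_columns(a, np.random.default_rng(42))
print(cols)
```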

And if you also want the row numbers, you can just add a numpy.arange() call on the length of a to get an array of row numbers:

(np.arange(len(a)), np.array([np.random.choice(r.nonzero()[0]) for r in a]))

so an example random output could be:

(array([0, 1, 2]), array([1, 2, 5]))

for a as:

array([[1, 1, 0, 0, 0, 0],
       [2, 0, 1, 0, 2, 0],
       [3, 0, 4, 0, 0, 3]])
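The row and column arrays can also be used together to read the chosen values back out of a: integer-array indexing pairs rows[i] with cols[i]. A sketch, again using a seeded Generator only for reproducibility:

```python
import numpy as np

a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])
rng = np.random.default_rng(1)

rows = np.arange(len(a))
cols = np.array([rng.choice(r.nonzero()[0]) for r in a])

# Integer-array (fancy) indexing: element i is a[rows[i], cols[i]]
chosen_values = a[rows, cols]
print(rows, cols, chosen_values)
```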

Hope this does what you want now :)
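The list comprehensions above still loop over the rows in Python. Since the question asked for no explicit loop, here is one common vectorized trick (a different technique from the answer above, and the helper name is hypothetical): give every non-zero entry an independent uniform random key, give zeros a key of -1, and take argmax along each row. Because the keys are i.i.d., the largest one is equally likely to sit on any of the row's non-zero entries. It builds dense temporaries the size of a, so it suits dense arrays.

```python
import numpy as np

def random_nonzero_columns_vectorized(a, rng=None):
    """Pick one non-zero column index per row with no Python-level loop.

    Hypothetical helper: non-zero entries get uniform keys in [0, 1),
    zeros get -1, and argmax per row selects a uniformly random non-zero.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = a != 0
    # argmax of an all-(-1) row would silently return column 0, so reject it
    if not mask.any(axis=1).all():
        raise ValueError("every row needs at least one non-zero element")
    keys = np.where(mask, rng.random(a.shape), -1.0)
    return keys.argmax(axis=1)

a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])
cols = random_nonzero_columns_vectorized(a, np.random.default_rng(7))
print(cols)
```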

Joe Iddon
  • Thank you, Joe, it is very helpful. I am more interested in the index of the non-zero elements, so the output of interest would be the indices of the non-zero elements (one non-zero element per row) – amina mollaysa Oct 03 '17 at 21:05
  • @aminamollaysa as you have a `2 dimensional` `array` here, how would you like the `indices`? As `tuples`? The values themselves? Oh, and I know you're new to the site, but if this answer was helpful, you can show it by upvoting (the little arrow) **:)** – Joe Iddon Oct 03 '17 at 21:09
  • yes, if we randomly choose one element per row, I want the output to be the indices of the chosen element, for example: the output can be [c, d] where c = [0, 1, 2] (row index, which in fact is obvious) and d = [0, 4, 2] (column index). Simply put, I just need the column index, since we choose one element for each row. – amina mollaysa Oct 03 '17 at 21:33
  • @aminamollaysa I have updated the answer now using `.nonzero()`. It should do what you want now – Joe Iddon Oct 04 '17 at 09:33
  • Joe, thank you very much, it is super clear and very helpful. Thank you for your time. – amina mollaysa Oct 04 '17 at 11:07
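Since the question mentions a large sparse matrix, converting it to a dense array may not be practical. In compressed sparse row (CSR) form, row i's stored column indices are indices[indptr[i]:indptr[i + 1]], so one random column per row only needs a random offset inside each row's slice, with no loop. The helper name below is hypothetical. With SciPy, a csr_matrix m exposes this layout as m.indptr and m.indices, which could be passed in directly (after m.sum_duplicates() and m.eliminate_zeros() if duplicates or explicit zeros might be stored); the demo builds the arrays with NumPy alone so it runs without SciPy.

```python
import numpy as np

def random_nonzero_columns_csr(indptr, indices, rng=None):
    """Pick one stored column index per row from CSR-style arrays.

    Hypothetical helper: row i's columns are indices[indptr[i]:indptr[i + 1]].
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.diff(indptr)
    if np.any(counts == 0):
        raise ValueError("every row needs at least one stored element")
    # Uniform offset in [0, counts[i]) for each row, all drawn in one call
    offsets = rng.integers(0, counts)
    # Shift each offset to the start of its row's slice and look up the column
    return indices[indptr[:-1] + offsets]

# Build CSR-style arrays from the dense example using only NumPy
a = np.array([[1, 1, 0, 0, 0, 0],
              [2, 0, 1, 0, 2, 0],
              [3, 0, 4, 0, 0, 3]])
rows, cols = a.nonzero()  # row-major order, so cols is grouped by row
indptr = np.concatenate(([0], np.cumsum(np.bincount(rows, minlength=a.shape[0]))))
picked = random_nonzero_columns_csr(indptr, cols, np.random.default_rng(3))
print(indptr, picked)
```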