I have a TensorFlow tensor named tf_array and a NumPy array named np_array. I want to find specific rows in tf_array based on np_array.

    tf_array = tf.constant(
                [[9.968594,  8.655439,  0.,        0.       ],
                 [0.,        8.3356,    0.,        8.8974   ],
                 [0.,        0.,        6.103182,  7.330564 ],
                 [6.609862,  0.,        3.0614321, 0.       ],
                 [9.497023,  0.,        3.8914037, 0.       ],
                 [0.,        8.457685,  8.602337,  0.       ],
                 [0.,        0.,        5.826657,  8.283971 ]])

I also have np_array:

np_array = np.matrix(
 [[2, 5, 1],
  [1, 6, 4],
  [0, 0, 0],
  [2, 3, 6],
  [4, 2, 4]])

Now I want to keep the elements of tf_array for which a combination of n of their row indices (here n is 2) appears in a row of np_array. What does that mean?

For example, in the first column of tf_array, the indices that hold a nonzero value are (0, 3, 4). Is there any row in np_array that contains any pair of these indices: (0, 3), (0, 4) or (3, 4)? There is no such row, so all the elements in that column become zero.

For the second column of tf_array, the nonzero indices are (0, 1, 5), which gives the pairs (0, 1), (0, 5) and (1, 5). The pair (1, 5) appears in the first row of np_array, so we keep those two elements in tf_array. The element at index 0 is not part of any matching pair, so it becomes zero.

So the final result should be like this:

[[0.        0.        0.        0.       ]
 [0.        8.3356    0.        8.8974   ]
 [0.        0.        6.103182  7.330564 ]
 [0.        0.        3.0614321 0.       ]
 [0.        0.        3.8914037 0.       ]
 [0.        8.457685  8.602337  0.       ]
 [0.        0.        5.826657  8.283971 ]]
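
The rule described above can be written down as a plain loop, which may help pin down the intended behaviour before optimising it. This is only a reference sketch in NumPy, not an efficient solution; the function name `filter_reference` and the variable names are made up for illustration, and it assumes the tensor has already been converted to a NumPy array.

```python
import numpy as np

def filter_reference(a1, a2, n=2):
    """Keep a1[i, c] only if i belongs to some row of a2 that holds
    at least n distinct indices with nonzero values in column c."""
    a1 = np.asarray(a1)
    a2 = np.asarray(a2)
    out = np.zeros_like(a1)
    for col in range(a1.shape[1]):
        for row in a2:
            idx = np.unique(row)              # distinct indices in this row
            hits = idx[a1[idx, col] != 0]     # those that are nonzero in this column
            if hits.size >= n:
                out[hits, col] = a1[hits, col]
    return out

a1 = np.array(
    [[9.968594, 8.655439, 0.,        0.      ],
     [0.,       8.3356,   0.,        8.8974  ],
     [0.,       0.,       6.103182,  7.330564],
     [6.609862, 0.,       3.0614321, 0.      ],
     [9.497023, 0.,       3.8914037, 0.      ],
     [0.,       8.457685, 8.602337,  0.      ],
     [0.,       0.,       5.826657,  8.283971]])

a2 = np.array(
    [[2, 5, 1],
     [1, 6, 4],
     [0, 0, 0],
     [2, 3, 6],
     [4, 2, 4]])

result = filter_reference(a1, a2, n=2)
print(result)
```

On the example data this reproduces the expected matrix shown above, so it can serve as a check for faster variants.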

I am looking for a very efficient approach, as I have a large amount of data.

Update1

With the code below I could get a mask that is True where there is a value and False where the entry is zero:

[[ True  True False False]
 [False  True False  True]
 [False False  True  True]
 [ True False  True False]
 [ True False  True False]
 [False  True  True False]
 [False False  True  True]]

with tf.Session() as sess:
    where = tf.not_equal(tf_array, 0.0)
    print(sess.run(where))

But how can I compare this matrix with np_array?
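
One possible way to relate such a mask to np_array is to turn np_array into a second boolean matrix ("which indices does each row contain?") and combine the two with matrix products. The sketch below does this in NumPy on the evaluated arrays; the function name `filter_with_membership` is made up for illustration. The operations involved (equality comparison, matrix multiplication, `where`) also exist in TensorFlow, but that translation is not attempted here.

```python
import numpy as np

def filter_with_membership(a1, a2, n=2):
    a1 = np.asarray(a1)
    a2 = np.asarray(a2)
    N = a1.shape[0]
    mask = a1 != 0                                          # (N, M) nonzero mask
    # member[k, i] is True if index i occurs in row k of a2.
    member = (a2[:, :, None] == np.arange(N)).any(axis=1)   # (K, N)
    # counts[k, c]: how many indices of a2 row k are nonzero in column c.
    counts = member.astype(np.int64) @ mask.astype(np.int64)  # (K, M)
    ok = counts >= n                                        # (K, M)
    # covered[i, c]: index i belongs to at least one qualifying row for column c.
    covered = (member.T.astype(np.int64) @ ok.astype(np.int64)) > 0  # (N, M)
    return np.where(mask & covered, a1, 0)

a1 = np.array(
    [[9.968594, 8.655439, 0.,        0.      ],
     [0.,       8.3356,   0.,        8.8974  ],
     [0.,       0.,       6.103182,  7.330564],
     [6.609862, 0.,       3.0614321, 0.      ],
     [9.497023, 0.,       3.8914037, 0.      ],
     [0.,       8.457685, 8.602337,  0.      ],
     [0.,       0.,       5.826657,  8.283971]])

a2 = np.array([[2, 5, 1], [1, 6, 4], [0, 0, 0], [2, 3, 6], [4, 2, 4]])

result = filter_with_membership(a1, a2, n=2)
print(result)
```

Duplicate indices within a row of a2 (such as `[0, 0, 0]`) are counted once, because the membership matrix only records whether an index is present.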

Thank you in advance!

sariii

2 Answers

One efficient way you can try is to build bit flags for each row that record which values are present; for example, (0,3,4) becomes 1<<0 | 1<<3 | 1<<4. This gives you an array of flag values. NumPy applies the << and | operators elementwise to integer arrays. Do the same for the other array (I guess tf arrays can be handled like NumPy arrays). Once you have two arrays of flags, take the bitwise "and" of them. Where your condition holds for a row, the result will have at least two nonzero bits. Counting the set bits can also be done efficiently; search for "bit count" or "popcount".

This however won't work with floats: you'll need to convert those to fairly small ints.

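import numpy as np

# Each row holds small non-negative integers that will be used as bit positions.
arr_one = np.array(
 [[2, 5, 1],
  [1, 6, 4],
  [0, 0, 0],
  [2, 3, 6],
  [4, 2, 4]])

arr_two = np.array(
 [[2, 0, 7],
  [1, 3, 4],
  [5, 5, 6],
  [1, 3, 6],
  [4, 2, 4]])

# Shifting 1 by each value of the first column gives one flag per row.
print('1 << arr_one.T[0] ', 1 << arr_one.T[0])

# Combine the three columns: bit k of a row's flag is set if k occurs in that row.
arr_one_flags = 1 << arr_one.T[0] | 1 << arr_one.T[1] | 1 << arr_one.T[2]
print('arr_one_flags ', arr_one_flags)

arr_two_flags = 1 << arr_two.T[0] | 1 << arr_two.T[1] | 1 << arr_two.T[2]

# Bits set in both flags are the values shared by the corresponding rows.
arr_and = arr_one_flags & arr_two_flags
print('arr_and ', arr_and)

def get_bit_count(value):
    # Count set bits by clearing the lowest set bit until none remain.
    n = 0
    while value:
        n += 1
        value &= value-1
    return n

arr_matches = np.array([get_bit_count(x) for x in arr_and])
print('arr_matches ', arr_matches)

# Keep the rows of arr_two that share at least two values with the same row of arr_one.
arr_two_filtered = arr_two[arr_matches > 1]
print('arr_two_filtered ', arr_two_filtered)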
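
The float weights themselves do not have to be converted for a flag-based approach: the flags can be built from the positions of the nonzero entries rather than from the values. The following is a hedged sketch of that idea applied to the data from the question, not the code of this answer; the function name `filter_by_flags` is made up, and it assumes the matrix has at most 62 rows so that every row index fits into one bit of an int64.

```python
import numpy as np

def filter_by_flags(a1, a2, n=2):
    a1 = np.asarray(a1)
    a2 = np.asarray(a2).astype(np.int64)
    N = a1.shape[0]
    assert N <= 62, "sketch assumes every row index fits in an int64 bit"
    shifts = np.arange(N, dtype=np.int64)
    # One flag per column of a1: bit i is set if a1[i, c] is nonzero.
    col_flags = ((a1 != 0).astype(np.int64) << shifts[:, None]).sum(axis=0)  # (M,)
    # One flag per row of a2: bit i is set if index i occurs in that row.
    row_flags = np.bitwise_or.reduce(np.int64(1) << a2, axis=1)              # (K,)
    # Shared bits for every (a2 row, a1 column) pair.
    common = row_flags[:, None] & col_flags[None, :]                         # (K, M)
    # Number of shared bits per pair.
    counts = ((common[:, :, None] >> shifts) & 1).sum(axis=-1)               # (K, M)
    # Union of the shared bits over all pairs with at least n matches.
    keep_flags = np.bitwise_or.reduce(np.where(counts >= n, common, 0), axis=0)  # (M,)
    keep_mask = ((keep_flags[None, :] >> shifts[:, None]) & 1).astype(bool)  # (N, M)
    return np.where(keep_mask, a1, 0)

a1 = np.array(
    [[9.968594, 8.655439, 0.,        0.      ],
     [0.,       8.3356,   0.,        8.8974  ],
     [0.,       0.,       6.103182,  7.330564],
     [6.609862, 0.,       3.0614321, 0.      ],
     [9.497023, 0.,       3.8914037, 0.      ],
     [0.,       8.457685, 8.602337,  0.      ],
     [0.,       0.,       5.826657,  8.283971]])

a2 = np.array([[2, 5, 1], [1, 6, 4], [0, 0, 0], [2, 3, 6], [4, 2, 4]])

result = filter_by_flags(a1, a2, n=2)
print(result)
```

The values of a1 are only read through the `a1 != 0` mask and copied unchanged into the result, so the weights are never rounded or rescaled.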
user8426627
  • Thank you for following up on my question, though your suggestion is kind of vague. Can you give me minimal working code for your idea, please? Also, unfortunately I cannot change float to int – sariii Jun 10 '19 at 20:36
  • Enjoy. Convert the floats with something like int(f*1000), or whatever; otherwise you can't set flags, and it will be much less efficient – user8426627 Jun 10 '19 at 22:31
  • Thank you so much for taking the time. I cannot change float to int, as they are the weights of a neural network, which cannot be changed at all. Can you say more about the less efficient approach ("set flags and this will be mutch less efficient")? – sariii Jun 10 '19 at 22:37
  • Actually it is the matrix in the neural network in one of the layers which I want to change, that's why I can not change the weight values in the tf-array. – sariii Jun 10 '19 at 23:49
1

Here is the solution from https://stackoverflow.com/a/56510832/7207392 with the necessary modifications. For the sake of simplicity I use np.array for all data. I'm no TensorFlow expert, so if translating it is not entirely straightforward, you'll have to ask somebody else how to do it.

import numpy as np

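def f(a1, a2, n):
    N,M = a1.shape
    # Append an all-zero row at index N; it serves as a dummy target.
    a1p = np.concatenate([a1,np.zeros((1,a1.shape[1]),a1.dtype)], axis=0)
    # Sort each row so duplicates are adjacent, then point the repeats
    # at the dummy row so each index is counted only once.
    a2 = np.sort(a2, axis=1)
    a2[:,1:][a2[:,1:]==a2[:,:-1]] = N
    # y: rows of a2, x: columns of a1 with at least n nonzero entries
    # among the rows of a1 that this a2 row refers to.
    y,x = np.where(np.count_nonzero(a1p[a2], axis=1) >= n)
    out = np.zeros_like(a1p)
    # Copy the referenced entries for every qualifying (row, column) pair.
    out[a2[y],x[:,None]] = a1p[a2[y],x[:,None]]
    # Drop the dummy row again.
    return out[:-1]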

a1 = np.array(
    [[9.968594,  8.655439,  0.,        0.       ],
     [0.,        8.3356,    0.,        8.8974   ],
     [0.,        0.,        6.103182,  7.330564 ],
     [6.609862,  0.,        3.0614321, 0.       ],
     [9.497023,  0.,        3.8914037, 0.       ],
     [0.,        8.457685,  8.602337,  0.       ],
     [0.,        0.,        5.826657,  8.283971 ]])

a2 = np.array(
 [[2, 5, 1],
  [1, 6, 4],
  [0, 0, 0],
  [2, 3, 6],
  [4, 2, 4]])

print(f(a1,a2,2))

Output:

[[0.        0.        0.        0.       ]
 [0.        8.3356    0.        8.8974   ]
 [0.        0.        6.103182  7.330564 ]
 [0.        0.        3.0614321 0.       ]
 [0.        0.        3.8914037 0.       ]
 [0.        8.457685  8.602337  0.       ]
 [0.        0.        5.826657  8.283971 ]]
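
The duplicate handling in `f` is probably its least obvious step, so here it is in isolation. This small sketch uses made-up example rows and the value `N = 7` (the number of rows of `a1` in the example above) as the index of the appended zero row.

```python
import numpy as np

N = 7  # index of the extra all-zero row appended to a1 in the example

rows = np.array(
    [[2, 5, 1],
     [0, 0, 0],
     [4, 2, 4]])

s = np.sort(rows, axis=1)              # duplicates become neighbours
# Wherever an entry equals its left neighbour, redirect it to the zero row.
s[:, 1:][s[:, 1:] == s[:, :-1]] = N

print(s)
# [[1 2 5]
#  [0 7 7]
#  [2 4 7]]
```

Because row `N` of the padded array contains only zeros, the redirected entries never contribute to the nonzero count, so a row such as `[0, 0, 0]` counts index 0 once rather than three times.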
Paul Panzer