I have two 2D NumPy arrays. What I want to do is find specific row indexes of np_weight in np_sentence.
For example:
import numpy as np

#rows are features, columns are clusters or whatever
np_weight = np.random.uniform(1.0,10.0,size=(7,4))
print(np_weight)
[[9.96859395 8.65543961 6.07429382 4.58735497]
[3.21776471 8.33560037 2.11424961 8.89739975]
[9.74560314 5.94640798 6.10318198 7.33056421]
[6.60986206 2.36877835 3.06143215 7.82384351]
[9.49702267 9.98664568 3.89140374 5.42108704]
[1.93551346 8.45768507 8.60233715 8.09610975]
[5.21892795 4.18786508 5.82665674 8.28397111]]
#rows are sentence indexes, columns are the words in that sentence
np_sentence = np.random.randint(0,7,size=(5,3))
print(np_sentence)
[[2 5 1]
[1 6 4]
[0 0 0]
[2 3 6]
[4 2 4]]
If I sort each column of np_weight in descending order and take the top 5, I get the following (only the first column is shown here):
temp_sorted_result=
[9.96859395] ---> index=0
[9.74560314] ---> index=2
[9.49702267] ---> index=4
[6.60986206] ---> index=3
[5.21892795] ---> index=6
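For reference, here is a minimal sketch of how these top-5 indexes could be obtained for every column at once with np.argsort. The helper name top_k_indices and the parameter k are my own placeholders, not part of any library.

```python
import numpy as np

# The np_weight example printed above
np_weight = np.array([
    [9.96859395, 8.65543961, 6.07429382, 4.58735497],
    [3.21776471, 8.33560037, 2.11424961, 8.89739975],
    [9.74560314, 5.94640798, 6.10318198, 7.33056421],
    [6.60986206, 2.36877835, 3.06143215, 7.82384351],
    [9.49702267, 9.98664568, 3.89140374, 5.42108704],
    [1.93551346, 8.45768507, 8.60233715, 8.09610975],
    [5.21892795, 4.18786508, 5.82665674, 8.28397111],
])

def top_k_indices(weights, k=5):
    """Hypothetical helper: row indexes of the k largest values in each column.

    Returns an array of shape (k, n_columns), largest value first.
    """
    # Negating the weights makes argsort order each column descending
    order = np.argsort(-weights, axis=0)
    return order[:k]

top5 = top_k_indices(np_weight, k=5)
print(top5[:, 0])  # first column -> [0 2 4 3 6]
```

If the order within the top 5 does not matter, np.argpartition could replace np.argsort to avoid a full sort of each column.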
Now I want to search for these indexes, two at a time, in the second array np_sentence, to see whether any of its rows contains two of them.
For example, for this column the output should be 1,3,4. These are the indexes of the rows of np_sentence that contain a combination of two of the indexes in temp_sorted_result.
For instance, 4 and 6, which both appear in temp_sorted_result, are in the same row of np_sentence (row 1), and so on.
I need to do this for each column of np_weight. It is very important that the code is efficient, as the number of rows is very large.
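To make the requirement concrete, this is a rough vectorized sketch of what I am after, handling all columns at once. It assumes that a row needs two different top-5 indexes to count (so [0 0 0] does not match, which agrees with the expected output 1,3,4). The function name rows_with_two_top_indices and the parameter k are placeholders of mine.

```python
import numpy as np

np_weight = np.array([
    [9.96859395, 8.65543961, 6.07429382, 4.58735497],
    [3.21776471, 8.33560037, 2.11424961, 8.89739975],
    [9.74560314, 5.94640798, 6.10318198, 7.33056421],
    [6.60986206, 2.36877835, 3.06143215, 7.82384351],
    [9.49702267, 9.98664568, 3.89140374, 5.42108704],
    [1.93551346, 8.45768507, 8.60233715, 8.09610975],
    [5.21892795, 4.18786508, 5.82665674, 8.28397111],
])
np_sentence = np.array([[2, 5, 1], [1, 6, 4], [0, 0, 0], [2, 3, 6], [4, 2, 4]])

def rows_with_two_top_indices(weights, sentences, k=5):
    """Hypothetical helper: for each column of `weights`, return the indexes of
    the rows of `sentences` holding at least two distinct top-k indexes."""
    n_features, n_clusters = weights.shape
    # Top-k row indexes per column; their order is irrelevant here
    top_idx = np.argpartition(-weights, k - 1, axis=0)[:k]
    # is_top[f, c] is True when feature f is among the top-k of column c
    is_top = np.zeros((n_features, n_clusters), dtype=bool)
    is_top[top_idx, np.arange(n_clusters)] = True
    # Sort each sentence and flag first occurrences so repeats count once
    s = np.sort(sentences, axis=1)
    first = np.ones(s.shape, dtype=bool)
    first[:, 1:] = s[:, 1:] != s[:, :-1]
    # hits[i, j, c]: word j of sentence i is a distinct top-k index of column c
    hits = is_top[s] & first[:, :, None]
    counts = hits.sum(axis=1)  # shape (n_sentences, n_clusters)
    return [np.flatnonzero(counts[:, c] >= 2) for c in range(n_clusters)]

result = rows_with_two_top_indices(np_weight, np_sentence, k=5)
print(result[0])  # first column -> [1 3 4]
```

The intermediate array has shape (n_sentences, n_words, n_clusters), so with very many sentences it may be necessary to process the columns (or the sentences) in chunks to keep memory under control.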
What I have done so far only searches for a single item in the second array, which is not what I ultimately want.
One approach would be to form all the pairs for each column. For example, for the first column shown above (temp_sorted_result), I would form
(0,2) (0,4) (0,3) (0,6)
(2,4) (2,3) (2,6)
(4,3) (4,6)
(3,6)
and then check which of them appear in the rows of np_sentence. Based on my np_sentence, the rows with index 1,3,4 contain some of these pairs.
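The pair-based idea can be written down directly with itertools.combinations. This is only a slow reference sketch for the first column, useful for checking a faster version against; the variable names are mine.

```python
from itertools import combinations

import numpy as np

np_sentence = np.array([[2, 5, 1], [1, 6, 4], [0, 0, 0], [2, 3, 6], [4, 2, 4]])
top5 = [0, 2, 4, 3, 6]  # indexes from temp_sorted_result (first column)

# All 10 unordered pairs: (0,2) (0,4) (0,3) (0,6) (2,4) ... (3,6)
pairs = list(combinations(top5, 2))

matching_rows = []
for i, row in enumerate(np_sentence):
    words = set(row.tolist())  # distinct words of this sentence
    # The row matches if both members of at least one pair occur in it
    if any(a in words and b in words for a, b in pairs):
        matching_rows.append(i)

print(matching_rows)  # [1, 3, 4]
```

Checking every pair is equivalent to asking whether the row shares at least two distinct values with top5, so the explicit pairs are not strictly needed; they just mirror the description above.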
My question is: how can I implement this in the most efficient way?
Please let me know if anything is unclear.
I appreciate your help :)