I am working with multiple large matrices of class IDs, each paired with a matrix of predicted probabilities. I want to average the probabilities per class across the matrices, and then return the 3 classes in each row with the highest averaged probabilities.
The problem is that the classes in a given row differ from matrix to matrix. What is the most efficient way to implement this?
Here is a toy example using just one row:
a = [11, 12, 13]
a_probs = [0.2, 0.1, 0.02]
b = [8, 11, 15]
b_probs = [0.05, 0.4, 0.12]
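In this example, only class 11 occurs in both matrices; a class that is missing from a matrix counts as 0 for that matrix. The averaged probability for each class is:
8: (0.05 + 0)/2 = 0.025
11: (0.2 + 0.4)/2 = 0.3
12: (0.1 + 0)/2 = 0.05
13: (0.02 + 0)/2 = 0.01
15: (0.12 + 0)/2 = 0.06
So the top 3 classes for this row would be 11, 15 and 12.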
My current method is very slow. For each row, I concatenate the classes across all matrices, take the unique values, locate and sum the probabilities for each class, and then divide by the number of matrices.
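For reference, here is a minimal sketch of that per-row procedure applied to the toy example. It assumes NumPy, and the helper name `top_k_averaged` is made up for illustration, so it may not match the actual code in use. Calling something like this once per row is the part that becomes slow on large matrices.

```python
import numpy as np

def top_k_averaged(class_rows, prob_rows, k=3):
    """Average per-class probabilities for one row and return the top k.

    class_rows / prob_rows hold one sequence per matrix: that matrix's
    class IDs and probabilities for the same row. A class missing from a
    matrix contributes 0 for that matrix.
    """
    classes = np.concatenate([np.asarray(c) for c in class_rows])
    probs = np.concatenate([np.asarray(p, dtype=float) for p in prob_rows])
    # Unique class IDs, plus each entry's index into `uniq`.
    uniq, inverse = np.unique(classes, return_inverse=True)
    # Sum probabilities per class, then divide by the number of matrices.
    summed = np.bincount(inverse, weights=probs, minlength=uniq.size)
    mean = summed / len(class_rows)
    # Indices of the k largest averages, highest first.
    order = np.argsort(mean)[::-1][:k]
    return uniq[order], mean[order]

a = [11, 12, 13]
a_probs = [0.2, 0.1, 0.02]
b = [8, 11, 15]
b_probs = [0.05, 0.4, 0.12]

top_classes, top_probs = top_k_averaged([a, b], [a_probs, b_probs])
print(top_classes)  # classes 11, 15, 12 for this row
print(top_probs)
```

The sketch divides by the number of matrices rather than by the number of times a class occurs, which matches the averaging shown in the toy example.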