I'm trying to train a classifier for the classes "Hit" and "Miss" based on the variables User, Planning Horizon, Material, and some more. Most of them are categorical variables, except for Planning Horizon (integer).
I have unbalanced data, so I'm trying to use thresholding to select the final output of the model (rather than just using the default 0.5 probability).
The variable User has the most impact on the class outcome, so I'm trying to use a different threshold for every user. I'm thinking about using the naive Bayes posterior probability P(Class|User) as that threshold.
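For concreteness, here is a minimal sketch of how P("Hit"|User) could be estimated, assuming it is simply the fraction of "Hit" labels per user in the training data. The `training_labels` list is made-up toy data for illustration, not my real data.

```python
from collections import Counter

# Hypothetical toy training labels as (user, class) pairs; not real data.
training_labels = [
    ("A", "Hit"), ("A", "Hit"), ("A", "Hit"), ("A", "Hit"), ("A", "Miss"),
    ("B", "Hit"), ("B", "Hit"), ("B", "Miss"), ("B", "Miss"), ("B", "Miss"),
]

# Count all observations and all "Hit" observations per user.
totals = Counter(user for user, _ in training_labels)
hits = Counter(user for user, label in training_labels if label == "Hit")

# P("Hit" | User) estimated as hits / total observations for that user.
hit_rate = {user: hits[user] / totals[user] for user in totals}
print(hit_rate)
```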
The question is: how can I apply those different rules to the output matrix of the model?
The "Thresholds matrix", a different threshold for every user:
User P("Hit"|User)
A 0.80
B 0.40
C 0.61
Below are the outputs of the classifier, P("Miss") and P("Hit"). The last column (Final Prediction) is what I need to construct:
User P("Miss") P("Hit") Final Prediction
B 0.79 0.21 Miss
B 0.20 0.80 Hit
A 0.15 0.85 Hit
C 0.22 0.78 Hit
A 0.90 0.10 Miss
B 0.80 0.20 Miss
Notice that the first row gets a Miss because P("Hit") = 0.21 is lower than the threshold for user B, P("Hit"|User=B) = 0.40.
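To make the intended rule explicit, here is a minimal plain-Python sketch of the lookup I have in mind, using the numbers from the two tables above. The function name `final_prediction`, the use of `>=` rather than `>` at the threshold, and the 0.5 fallback for users without a threshold are my own assumptions, not something the classifier provides.

```python
# Per-user thresholds on P("Hit"), copied from the thresholds table.
thresholds = {"A": 0.80, "B": 0.40, "C": 0.61}

# Classifier outputs as (user, P("Miss"), P("Hit")) rows, copied from the table.
outputs = [
    ("B", 0.79, 0.21),
    ("B", 0.20, 0.80),
    ("A", 0.15, 0.85),
    ("C", 0.22, 0.78),
    ("A", 0.90, 0.10),
    ("B", 0.80, 0.20),
]

def final_prediction(user, p_hit, thresholds, default=0.5):
    """Return "Hit" if p_hit reaches the user's threshold, else "Miss".

    Assumption: users missing from `thresholds` fall back to `default`.
    """
    return "Hit" if p_hit >= thresholds.get(user, default) else "Miss"

# Build the Final Prediction column, one entry per output row.
predictions = [
    final_prediction(user, p_hit, thresholds)
    for user, _p_miss, p_hit in outputs
]
print(predictions)
```

With the rows above, this reproduces the Final Prediction column shown in the table.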