Find column number that satisfies condition based on another vector

Question

I'm trying to train a classifier for the classes "Hit" and "Miss" based on the variables User, Planning Horizon, Material, and some more. Most of them are categorical, except for Planning Horizon (integer).

I have unbalanced data, so I'm trying to use thresholding to select the final output of the model (rather than just using the default 0.5 probability).

The variable User has the most impact on the class outcome, so I'm trying to use a different threshold for every user. I'm thinking about using the naive Bayes posterior probability P(Class|User).

The question is: how can I apply those per-user rules to the output matrix of the model?

The "Thresholds matrix", a different threshold for every user:

User    P("Hit"|User)
A           0.80
B           0.40
C           0.61

These are the outputs of the classifier (P("Miss") and P("Hit")). The last column (Final Prediction) is what I need to construct.

User    P("Miss")   P("Hit")    Final Prediction
B           0.79    0.21        Miss
B           0.20    0.80        Hit
A           0.15    0.85        Hit
C           0.22    0.78        Hit
A           0.90    0.10        Miss
B           0.80    0.20        Miss

Notice that the first row gets a Miss because P("Hit") = 0.21 is lower than P("Hit"|User=B) = 0.40.

What about merging your threshold matrix into the results and then creating the Final Prediction by comparing `P("Hit")` with your `P("Hit"|User)`? — drmariod, Feb 15 '18 at 14:31
**1.** Provide data to work with, e.g. `dput(head(yourData,20))`. **2.** Show and mark your desired output. — Andre Elrico, Feb 15 '18 at 14:37

score 0 · Answer 1 · answered Feb 15 '18 at 15:09

I would merge the threshold matrix into the results and then create the Final Prediction column by hand, like this:

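```r
# Classifier output from the question. The quotes inside the header fields are
# stripped on reading, so the columns end up named `P(Miss)` and `P(Hit)`;
# check.names=FALSE keeps these non-syntactic names as they are.
df <- read.table(text='User P("Miss") P("Hit") "Final Prediction"
B 0.79 0.21 Miss
B 0.20 0.80 Hit
A 0.15 0.85 Hit
C 0.22 0.78 Hit
A 0.90 0.10 Miss
B 0.80 0.20 Miss', 
                 header=TRUE, sep=' ', check.names=FALSE)

# Per-user thresholds; the second column ends up named `P(Hit|User)`.
thm <- read.table(text='User P("Hit"|User)
A 0.80
B 0.40
C 0.61', 
                  header=TRUE, sep=' ', check.names=FALSE)

# Attach each row's threshold by joining on the shared User column.
# Note that merge() sorts the result by User.
thmdf <- merge(thm, df, by='User')

# "Miss" when P(Hit) is below the user's threshold, otherwise "Hit".
thmdf['My Final Prediction'] <- 
  ifelse(thmdf$`P(Hit)` < thmdf$`P(Hit|User)`, 
         'Miss',
         'Hit')

thmdf
```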
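
If the original row order of the classifier output matters (`merge()` sorts the result by `User`), a lookup with `match()` is one possible alternative. This is only a minimal sketch: the names `preds`, `thresholds`, `p_miss`, `p_hit`, `threshold` and `final` are made up for illustration, and the data are the values from the question.

```r
# Classifier output from the question (simplified, hypothetical column names).
preds <- data.frame(
  User   = c("B", "B", "A", "C", "A", "B"),
  p_miss = c(0.79, 0.20, 0.15, 0.22, 0.90, 0.80),
  p_hit  = c(0.21, 0.80, 0.85, 0.78, 0.10, 0.20),
  stringsAsFactors = FALSE
)

# Per-user thresholds from the question.
thresholds <- data.frame(
  User      = c("A", "B", "C"),
  threshold = c(0.80, 0.40, 0.61),
  stringsAsFactors = FALSE
)

# match() gives, for each row of preds, the position of its User in thresholds,
# so indexing with it looks up the threshold without reordering preds.
idx <- match(preds$User, thresholds$User)
preds$threshold <- thresholds$threshold[idx]

# "Miss" when P(Hit) is below the user's threshold, otherwise "Hit".
# A user that is absent from thresholds gets NA here.
preds$final <- ifelse(preds$p_hit < preds$threshold, "Miss", "Hit")

preds
```

With this approach the rows stay in their original order, and the `final` column can be compared directly against the expected Final Prediction column.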
