Issues when using randomForest in caret with ROC as optimization metric

Question

I'm having an issue when building random forest models with caret. I have a dataset of about 46k rows and 10 columns (one of which is the response variable). I'm trying to compare different classifiers on this dataset, and did the following:

library(caret)

# Bootstrap resampling; class probabilities are needed by twoClassSummary
ctrl <- trainControl(method = "boot",
                     classProbs = TRUE,
                     summaryFunction = twoClassSummary)

# GLM model
model.glm <- train(x = d[, 2:10], y = d$CONV_BT,
                   method = "glm", family = "binomial",
                   trControl = ctrl, metric = "ROC")

# Random forest model
model.rf <- train(x = d[, 2:10], y = d$CONV_BT,
                  method = "rf",
                  trControl = ctrl, metric = "ROC")

# Naive Bayes model
model.nb <- train(x = d[, 2:10], y = d$CONV_BT,
                  method = "nb",
                  trControl = ctrl, metric = "ROC")

Both model.glm and model.nb look pretty decent: across the 25 bootstrap replications, each has an ROC (area under the curve) of around 0.7. However, something appears to be wrong with model.rf, because its reported ROC scores are all around 0.3. That suggests to me that something is being specified incorrectly, because I could just switch the rf model's predictions from p to 1-p and my ROC would then be 0.7, right?
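
The p versus 1-p reasoning can be checked on simulated scores. This is only a sketch in base R: the auc() helper and the simulated data are assumptions made for illustration, not caret's own ROC computation and not the real dataset.

```r
# Rank-based (Mann-Whitney) AUC: the probability that a randomly chosen
# positive case gets a higher score than a randomly chosen negative case.
# auc() is a hypothetical helper, not part of caret.
auc <- function(score, is_pos) {
  r <- rank(score)              # average ranks, so ties count as 1/2
  n_pos <- sum(is_pos)
  n_neg <- sum(!is_pos)
  (sum(r[is_pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

set.seed(1)
is_pos <- rep(c(TRUE, FALSE), each = 500)

# Simulated scores that deliberately rank the two classes backwards
p <- plogis(rnorm(1000, mean = ifelse(is_pos, -0.5, 0.5)))

auc_p       <- auc(p, is_pos)       # below 0.5: worse than chance
auc_flipped <- auc(1 - p, is_pos)   # mirror image, above 0.5

# Reversing the score order turns an AUC of a into 1 - a
stopifnot(isTRUE(all.equal(auc_p + auc_flipped, 1)))
```

So an ROC of about 0.3 does correspond to about 0.7 once the scores are reversed, which is consistent with the class that counts as "positive" being swapped somewhere rather than with the model itself being poor.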

I'm sorry that I can't provide the data (it's too big to upload and it's proprietary). The other bizarre thing is that when I simulate data, I no longer have this issue. Any idea what this could be? Thanks for your help!

What class is the dependent variable d$CONV_BT: a factor or not? - DrDom, May 21 '13 at 18:54
I'm sorry, I forgot to mention that. d$CONV_BT is a factor with two levels: "Y" and "N" (I was originally getting an error when the levels were "0"/"1", so I changed them). - random_forest_fanatic, May 22 '13 at 11:30
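
Given the "Y"/"N" levels, one thing that may be worth checking is the level order. As far as I understand, caret's twoClassSummary treats the first factor level as the event class, and factor() sorts levels alphabetically, so "N" would come first by default. The sketch below uses a made-up vector (conv is hypothetical, standing in for d$CONV_BT), and this may well not be the cause of the reversed ROC.

```r
# Hypothetical stand-in for d$CONV_BT
conv <- factor(c("Y", "N", "N", "Y", "N"))

# Default level order is alphabetical, so "N" is the first level
levels(conv)

# Make "Y" the first level without changing the underlying values
conv_y_first <- relevel(conv, ref = "Y")
levels(conv_y_first)
```

Comparing the resampled ROC before and after a relevel like this would show whether the level order has any effect on the rf result.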
