I am trying to get an AUC plot working using the AUC package in R. I am new to this and unsure what the error means. `fit` is the trained model and `test` is the test data.

test$going_to_cross <- predict(fit, test, type="prob") 

prediction <- predict(fit, test, type="prob")
submit <- data.frame(cust_id = test$cust_id, already_crossed = test$flag_cross_over, predictions = prediction)
write.csv(submit, file = "../predictions /cross_sell_predictionsRF.csv", row.names = FALSE)

head(submit, 5)

print("predictions")
colnames(prediction) <- c("predictiona", "predictionb")
head(prediction)
which(submit$going_to_cross == 1)


print("names submit")
names(submit)

#predict_cross <- submit$going_to_cross.0
head(predict_cross, 5)

I get the output here as:

    cust_id already_crossed predictions.0   predictions.1
280 14080465    0           0.436   0.564
281 24047747    0           0.218   0.782 
282 10897483    0           0.606   0.394
283 14005276    0           0.448   0.552
284 18488402    0           0.284   0.716

[1] "predictions"

Out[317]:
    predictiona predictionb
280 0.436   0.564
281 0.218   0.782
282 0.606   0.394
283 0.448   0.552
284 0.284   0.716
285 0.104   0.896

The code from the package is:

auc(sensitivity(submit$predictions, submit$already_crossed))

And the warning message is:

Warning message: In is.na(x): is.na() applied to non-(list or vector) of type 'NULL'

Update:

# get the data into single vectors
 submit_pred <- matrix(submit$predictions.1)
 submit_cross <- matrix(submit$already_crossed)

 dt <- cbind(submit_pred, submit_cross)
  dt <- matrix(dt)


  names(dt) <- c("submit_pred", "submit_cross")

 roc_pred <- prediction(dt$submit_pred, dt$submit_cross)
 perf <- performance(roc_pred, "tpr", "fpr")
 plot(perf, col="red")
 abline(0,1,col="grey")

# get area under the curve

performance(roc_pred,"auc")@y.values
head(dt)

tony
  • what does `str(submit$predictions)` and `str(submit$already_crossed)` return? are there any `NA` values in `already_crossed`? – bjoseph Aug 12 '15 at 13:38
  • I think the "predictions" column includes information from predicting successes and failures (1s and 0s). Just try to use "submit$predictionb" instead of "submit$predictions" in your last piece of code. – AntoniosK Aug 12 '15 at 13:55
  • You are passing two vectors as predictions but have only one vector of actual values when you call auc(sensitivity(submit$predictions, submit$already_crossed)), and that breaks the call. Also, the actual values you show are all 0, so a ROC curve cannot be obtained. I'll send you an example soon. – AntoniosK Aug 12 '15 at 15:11
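
A quick way to answer bjoseph's question is to inspect the column directly. The sketch below rebuilds a small data frame the same way `submit` is built, assuming `predict(..., type = "prob")` returns a two-column matrix with column names "0" and "1". The values are copied from the question's output; the matrix is a stand-in, not the real model output. Because the data frame ends up with `predictions.0` and `predictions.1`, `submit$predictions` matches neither name exactly and both partially, so it evaluates to NULL, which would explain the is.na() warning.

```r
# Stand-in for predict(fit, test, type = "prob"): a two-column probability matrix
prediction <- matrix(c(0.436, 0.564,
                       0.218, 0.782,
                       0.606, 0.394),
                     ncol = 2, byrow = TRUE,
                     dimnames = list(NULL, c("0", "1")))

submit <- data.frame(cust_id = c(14080465, 24047747, 10897483),
                     already_crossed = c(0, 0, 0),
                     predictions = prediction)

names(submit)              # the matrix is expanded into predictions.0 and predictions.1
str(submit$predictions)    # NULL: the partial match is ambiguous
str(submit$predictions.1)  # numeric vector of class-1 probabilities
```

If this matches what you see, passing `submit$predictions.1` instead of `submit$predictions` gives the functions a single numeric vector to work with.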

1 Answer

Try adapting this script to your dataset (using the ROCR package).

library(ROCR)

# example dataset with some 0 and some 1 values as actual observations
dt = data.frame(matrix(data=c(
14080465 ,  0 ,  0.436 , 0.564,
24047747 ,  1 ,  0.218 , 0.782 ,
10897483 ,  0 ,  0.606 , 0.394,
14005276 ,  0 ,  0.448 , 0.552,
18488402 ,  1 ,  0.284 , 0.716
), nrow = 5, ncol = 4, byrow = TRUE))

names(dt) = c("cust_id", "already_crossed", "predictions.0",   "predictions.1")

# obtain ROC curve
roc_pred <- prediction(dt$predictions.1, dt$already_crossed)
perf <- performance(roc_pred, "tpr", "fpr")
plot(perf, col="red")
abline(0,1,col="grey")

# get area under the curve
performance(roc_pred,"auc")@y.values

You can also do it with your approach (using the AUC package):

library(AUC)

# example dataset with some 0 and some 1 values as actual observations
dt = data.frame(matrix(data=c(
14080465 ,  0 ,  0.436 , 0.564,
24047747 ,  1 ,  0.218 , 0.782 ,
10897483 ,  0 ,  0.606 , 0.394,
14005276 ,  0 ,  0.448 , 0.552,
18488402 ,  1 ,  0.284 , 0.716
), nrow = 5, ncol = 4, byrow = TRUE))

names(dt) = c("cust_id", "already_crossed", "predictions.0",   "predictions.1")

auc(sensitivity(dt$predictions.1, as.factor(dt$already_crossed)))
plot(sensitivity(dt$predictions.1, as.factor(dt$already_crossed)))

As I said before, you have to pass only one vector of predictions. You also need to store the actual classes (0s and 1s) as a factor, otherwise the sensitivity function will break. However, I think what you actually want to compute (using your method) is this:

auc(roc(dt$predictions.1, as.factor(dt$already_crossed)))
plot(roc(dt$predictions.1, as.factor(dt$already_crossed)))
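
As a sanity check that does not depend on either package, the AUC can also be computed by hand from ranks (the Mann-Whitney formulation). This is a minimal base-R sketch, not part of the AUC or ROCR packages; `auc_by_rank` is a hypothetical helper name, and it assumes the labels are coded 0/1 with 1 as the positive class.

```r
# Hypothetical helper: AUC via the rank-sum (Mann-Whitney) identity
auc_by_rank <- function(scores, labels) {
  stopifnot(length(scores) == length(labels))
  pos <- labels == 1
  n_pos <- sum(pos)
  n_neg <- sum(!pos)
  if (n_pos == 0 || n_neg == 0) {
    stop("both classes must be present to compute an AUC")
  }
  r <- rank(scores)  # average ranks handle ties
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

scores <- c(0.564, 0.782, 0.394, 0.552, 0.716)  # predictions.1 from the example
labels <- c(0, 1, 0, 0, 1)                      # already_crossed from the example
auc_by_rank(scores, labels)
```

In this toy example every positive scores higher than every negative, so the result is 1. The helper stops when only one class is present, which is the situation in the five rows shown in the question, where every already_crossed value is 0.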
AntoniosK
  • Hi, thanks for the helpful explanation. I am now getting this error: Error in approxfun(x.values.2, y.values.2, method = "constant", f = 1, : zero non-NA points – tony Aug 12 '15 at 16:33
  • In which function is that? Make sure that you feed each function the type of variables/vectors it needs. So, some of them need factor variables (like the AUC package), but the other one needs numeric values. Check that and let me know. Or send me exactly the point where you get that error. – AntoniosK Aug 12 '15 at 16:40
  • Thanks, I have added an update to the question, and now I am getting this error on the roc_pred line: dt$submit_pred: $ operator is invalid for atomic vectors. – tony Aug 12 '15 at 17:20
  • Why do you use submit_pred <- matrix(submit$predictions.1)? You already have your vector: it is the column named predictions.1. You just need to store your "submit" dataset as a data.frame (if it isn't one already) and then put "submit" wherever I have "dt" in my script. – AntoniosK Aug 12 '15 at 17:26
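
The "$ operator is invalid for atomic vectors" error can be reproduced without the model. Here is a minimal sketch, using made-up values in place of the real `submit` columns: `matrix()` returns an atomic object, so `$` has nothing to extract from it, whereas a data frame supports `$` directly.

```r
pred  <- c(0.564, 0.782, 0.394)  # made-up stand-in for submit$predictions.1
cross <- c(0, 1, 0)              # made-up stand-in for submit$already_crossed

m <- matrix(cbind(pred, cross))  # collapses everything into a single 6 x 1 column
dim(m)
is.atomic(m)                     # TRUE, so m$pred raises the error from the comment
res <- try(m$pred, silent = TRUE)
inherits(res, "try-error")

df <- data.frame(submit_pred = pred, submit_cross = cross)
df$submit_pred                   # works: data frame columns are accessible with $
```

Keeping the two columns in a data frame, or passing the vectors straight to the function, avoids the problem.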