
I want to get the AUC on the testing set from cv.glmnet for the best set of hyperparameters. According to this post, I should look at cvm to get it. However, when I do this I get a value greater than 1, and my understanding is that the AUC should be between 0 and 1. Here's an example:

```r
library(glmnet)

age     <- c(4, 8, 7, 12, 6, 9, 10, 14, 7)
gender  <- as.factor(c(1, 0, 1, 1, 1, 0, 1, 0, 0))
bmi_p   <- c(0.86, 0.45, 0.99, 0.84, 0.85, 0.67, 0.91, 0.29, 0.88)
m_edu   <- as.factor(c(0, 1, 1, 2, 2, 3, 2, 0, 1))
p_edu   <- as.factor(c(0, 2, 2, 2, 2, 3, 2, 0, 0))
f_color <- as.factor(c("blue", "blue", "yellow", "red", "red", "yellow",
                       "yellow", "red", "yellow"))
asthma  <- c(1, 1, 0, 1, 0, 0, 0, 1, 1)

# Dummy-code the factors and drop the intercept column
xfactors <- model.matrix(asthma ~ gender + m_edu + p_edu + f_color)[, -1]
x        <- as.matrix(data.frame(age, bmi_p, xfactors))

cv.glmmod <- cv.glmnet(x, y = asthma, alpha = 1, family = "binomial", type.measure = "auc")

max(cv.glmmod$cvm)
[1] 7.0223
```

How do I interpret this number? Is it really just 0.70223?

Thanks, Steve

StupidWolf
pd441
    Your dataset only has 9 observations! You can't do CV with so few data points, since each fold would have even fewer. Consider 3-fold CV: you would be left with 3 data points per fold... Try adding more data (at least 100) and see if the problem still persists. – acylam Oct 09 '17 at 17:34
    Perhaps type `warnings()` when the function outputs `There were 13 warnings (use warnings() to see them)`. The 12th warning is: `Too few (< 10) observations per fold for type.measure='auc' in cv.lognet; changed to type.measure='deviance'. Alternatively, use smaller value for nfolds`. A smaller value for nfolds will not help; check @useR's comment. – missuse Oct 09 '17 at 18:04
    Hi, thanks for the feedback. I'm fully aware that the data set is ridiculously small; it's just for illustration. However, do you believe that with a larger data set the AUC will be within the normal range of 0 to 1? Thanks! – pd441 Oct 10 '17 at 06:04

1 Answer


For your dataset, cv.glmnet() does not measure the loss by "AUC" but by "deviance", and that deviance is what you obtained from cv.glmmod$cvm.

Although you ran the CV with cv.glmnet(type.measure="auc"), your dataset is too small. In this situation, cv.glmnet() (actually cv.lognet()) issues the warning "Too few (< 10) observations per fold for type.measure='auc' in cv.lognet; changed to type.measure='deviance'. Alternatively, use smaller value for nfolds" and, as the message says, switches to type.measure="deviance". Unlike AUC, the binomial deviance is not bounded above by 1, which is why you see a value greater than 1. Note also that for deviance lower is better, so max(cvm) does not correspond to the best lambda.
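The fallback condition can be sketched in base R to see why a smaller nfolds cannot rescue a 9-observation dataset. `auc_feasible()` below is a hypothetical helper, not part of glmnet; it assumes the check is simply "observations divided by nfolds must be at least 10", as the warning text suggests, and that nfolds cannot go below 3 (cv.glmnet refuses smaller values).

```r
# Hypothetical helper (not a glmnet function): mimics the condition the
# warning describes, i.e. at least 10 observations per fold for AUC.
auc_feasible <- function(n, nfolds) {
  stopifnot(nfolds >= 3)  # assumption: cv.glmnet rejects nfolds < 3
  n / nfolds >= 10
}

# With the smallest allowed nfolds (3), AUC needs at least 3 * 10 observations.
min_n_for_auc <- 3 * 10

auc_feasible(9, 10)    # FALSE: about 0.9 observations per fold
auc_feasible(9, 3)     # FALSE: 3 observations per fold
auc_feasible(100, 10)  # TRUE: exactly 10 observations per fold
```

Under this assumption, any dataset with fewer than 30 observations will always fall back to deviance, whatever nfolds is.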

You can verify this by printing cv.glmmod$name, which should be "Binomial Deviance" in your case instead of "AUC".
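To address the follow-up question in the comments, here is a minimal sketch with enough data for AUC to be kept. The data are simulated purely for illustration (200 observations, 5 random predictors, an outcome driven by the first one); with nfolds = 5 each fold holds 40 observations, so no fallback should occur. Because AUC is "larger is better", cv.glmnet picks lambda.min at the maximum of cvm, so the cross-validated AUC at that lambda lies between 0 and 1.

```r
library(glmnet)

# Simulated data, for illustration only
set.seed(1)
n <- 200
x <- matrix(rnorm(n * 5), nrow = n, ncol = 5)
y <- rbinom(n, size = 1, prob = plogis(x[, 1]))

# 200 / 5 = 40 observations per fold, so AUC should not be replaced
fit <- cv.glmnet(x, y, family = "binomial", type.measure = "auc", nfolds = 5)

# Should read "AUC"; "Binomial Deviance" would mean the fallback happened
fit$name

# Cross-validated AUC at the selected lambda
auc_at_best <- fit$cvm[fit$lambda == fit$lambda.min]
auc_at_best
```

Checking fit$name before reading cvm is a cheap way to make sure you are interpreting the number you think you are.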

Alex Huang