8

I am getting an odd error

Error in `[.data.frame`(data, , lvls[1]) : undefined columns selected

message when I am using caret to train a glmnet model. I have used basically the same code and the same predictors for an ordinal model (just with a different factor as y), and it worked fine (although it took 400 core hours to compute, so I can't show it here).

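library(caret)   # createDataPartition(), trainControl(), train()
library(dplyr)   # %>% and select()

# Source a small subset of the data (defines the data frame `notna`)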
source("https://gist.githubusercontent.com/FredrikKarlssonSpeech/ebd9fccf1de6789a3f529cafc496a90c/raw/efc130e41c7d01d972d1c69e59bf8f5f5fea58fa/voice.R")
trainIndex <- createDataPartition(notna$RC, p = .75, 
                                  list = FALSE, 
                                  times = 1)


training <- notna[ trainIndex[,1],] %>%
  select(RC,FCoM_envel:ATrPS_freq,`Jitter->F0_abs_dif`:RPDE)
testing  <- notna[-trainIndex[,1],] %>%
  select(RC,FCoM_envel:ATrPS_freq,`Jitter->F0_abs_dif`:RPDE)

fitControl <- trainControl(## 10-fold CV
  method = "CV",
  number = 10,
  allowParallel=TRUE,
  savePredictions="final",
  summaryFunction=twoClassSummary)

vtCVFit <- train(x=training[-1],y=training[,"RC"], 
                  method = "glmnet", 
                  trControl = fitControl,
                  preProcess=c("center", "scale"),
                  metric="Kappa"
)

I can't find anything obviously wrong with the data. There are no NAs:

table(is.na(training))

FALSE 
43166

and I don't see why it would try to index outside of the number of columns.

Any suggestions?

missuse
Fredrik Karlsson
  • I have changed your tag `caret` to `r-caret`. Since the solution to your problem is rather straightforward, I trust you could have obtained it much faster if you had used the correct tags. – missuse Sep 12 '18 at 08:10

2 Answers

5

You have to remove `summaryFunction=twoClassSummary` from your `trainControl()` call. With that change it works for me.
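A likely reason, judging from the `data[, lvls[1]]` call in the error message: `twoClassSummary` looks up a class-probability column named after the first factor level, and that column is only present when probabilities are requested via `classProbs = TRUE`. The base-R sketch below reproduces the same lookup on a hypothetical stand-in data frame (`held_out` and its values are made up for illustration; they are not caret internals or data from the question).

```r
# Hypothetical stand-in for the held-out predictions passed to a summary
# function when class probabilities are NOT computed: only obs and pred.
lvls <- c("NVT", "VT")
held_out <- data.frame(
  obs  = factor(c("NVT", "VT", "VT", "NVT"), levels = lvls),
  pred = factor(c("NVT", "VT", "NVT", "NVT"), levels = lvls)
)

# Same kind of lookup as in the error message: a column named after the
# first class level. The column does not exist, so indexing fails.
msg <- tryCatch(
  {
    held_out[, lvls[1]]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Once one probability column per class is present, the lookup succeeds.
held_out$NVT <- c(0.9, 0.2, 0.6, 0.7)   # made-up probabilities
held_out$VT  <- 1 - held_out$NVT
first_class_prob <- held_out[, lvls[1]]
print(first_class_prob)
```

Dropping the summary function avoids that lookup altogether, which is what the control object that follows does.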

fitControl <- trainControl(## 10-fold CV
  method = "CV",
  number = 10,
  allowParallel = TRUE,
  savePredictions = "final")

vtCVFit <- train(x = training[-1], y = training[, "RC"],
                 method = "glmnet",
                 trControl = fitControl,
                 preProcess = c("center", "scale"),
                 metric = "Kappa")

print(vtCVFit)

#glmnet 

#113 samples
#381 predictors
#  2 classes: 'NVT', 'VT' 

#Pre-processing: centered (381), scaled (381) 
#Resampling: Bootstrapped (25 reps) 
#Summary of sample sizes: 113, 113, 113, 113, 113, 113, ... 
#Resampling results across tuning parameters:

#  alpha  lambda      Accuracy   Kappa    
#  0.10   0.01113752  0.5778182  0.1428393
#  0.10   0.03521993  0.5778182  0.1428393
#  0.10   0.11137520  0.5778182  0.1428393
#  0.55   0.01113752  0.5778182  0.1428393
#  0.55   0.03521993  0.5748248  0.1407333
#  0.55   0.11137520  0.5749980  0.1136131
#  1.00   0.01113752  0.5815391  0.1531280
#  1.00   0.03521993  0.5800217  0.1361240
#  1.00   0.11137520  0.5939621  0.1158007

#Kappa was used to select the optimal model using the largest value.
#The final values used for the model were alpha = 1 and lambda = 0.01113752.
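
If the ROC-based metrics are actually wanted, an alternative to removing the summary function is to keep `twoClassSummary` and set `classProbs = TRUE`. Note that `twoClassSummary` reports `ROC`, `Sens` and `Spec` rather than `Kappa`, so `metric` has to change as well. The sketch below is only an illustration under assumptions: it uses simulated data from caret's `twoClassSim()` as a stand-in for `training`, 5 folds to keep it quick, and it needs the glmnet package (and, depending on the caret version, pROC) to be installed.

```r
library(caret)  # train(), trainControl(), twoClassSummary(), twoClassSim()

set.seed(1)
# Simulated two-class data; a stand-in for the `training` data frame.
sim <- twoClassSim(200)

ctrl <- trainControl(
  method = "cv",
  number = 5,
  classProbs = TRUE,                 # makes the per-class probability columns available
  summaryFunction = twoClassSummary  # computes ROC, Sens, Spec
)

fit <- train(
  x = sim[, names(sim) != "Class"],
  y = sim$Class,
  method = "glmnet",
  trControl = ctrl,
  preProcess = c("center", "scale"),
  metric = "ROC"                     # Kappa is not produced by twoClassSummary
)

print(fit$results[, c("alpha", "lambda", "ROC", "Sens", "Spec")])
```

With `classProbs = TRUE` the factor levels must be valid R variable names; `NVT` and `VT` from the output above already are.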
2

Convert your factors to character with the following code and see if it works:

      training <- data.frame(lapply(training , as.character), stringsAsFactors=FALSE)

I would have left this suggestion as a comment, but I wasn't able to (since I have less than 50 reputation).

Shirin Yavari
  • Not sure I understand your solution. I have only one factor (RC) and changing that to a character does not solve anything for me. I still get the same error. – Fredrik Karlsson Sep 04 '18 at 18:11