I have a data set and would like caret to train and validate on a specific part of my data set only. I have two lists

train.ids <- list(T1=c(1,2,3), T2=c(4,5,6), T3=c(7,8,9))

and

test.ids <- list(T1=c(10,11,12), T2=c(13,14,15), T3=c(16,17,18))

which correspond to the row indices in my data set. train.ids$T1 should be used for training, while test.ids$T1 should be used for testing. Same goes for T2 and T3.

I tried using

trainControl(method="cv", index=train.ids, indexOut=test.ids)

but this doesn't seem to be the correct way of using trainControl.

Any help is highly appreciated

smci
  • As they correspond to row indices of your data set, you need to do `df[train.ids,]` where `df` is your data set in a `data.frame` – infominer Apr 28 '14 at 22:51
  • Thanks for your input! Since `test.ids` is a named list, I can't index the data frame using `test.ids`. According to the caret documentation, index and indexOut are lists of row indices, and this is what `createDataPartition(df)` or `createTimeSlices` returns as well. –  Apr 28 '14 at 23:04
  • I should have added to use `train.ids$T1` and `test.ids$T1` like you mentioned in your question. – infominer Apr 29 '14 at 14:42
  • Then I'd only train and test on T1, but not on T2 and T3. Sorry, my question might have been ambiguous; I want to use all train and test pairs. –  Apr 29 '14 at 18:41

1 Answer

Was there an error generated? I'm not sure why this wouldn't work. Here is an example:

library(caret)

## A small data set example
set.seed(2)
dat <- twoClassSim(9)[, 13:16]

fit_on <-  list(rs1 = 1:3, rs2 = 4:6,         rs3 = 7:9)
pred_on <- list(rs1 = 4:9, rs2 = c(1:3, 7:9), rs3 = 1:6)

ctrl <- trainControl(method = "cv", 
                     ## The method doesn't really matter
                     ## since we are defining the resamples
                     index = fit_on, indexOut = pred_on,
                     verboseIter = TRUE,
                     savePredictions = TRUE)

mod <- train(Class ~ ., data = dat, method = "lda",
             trControl = ctrl)

Take a look at mod$pred and you can see what was predicted at each iteration.
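
As a rough sketch of that check, the saved predictions can be grouped by resample and compared with the held-out lists. The 90-row simulated data set and the row ranges below are assumptions made for illustration, not the asker's data; the lists only reuse the T1/T2/T3 naming from the question. The sketch also assumes that `mod$pred` carries `rowIndex` and `Resample` columns when `savePredictions = TRUE`, and that `index` and `indexOut` are paired element by element.

```r
library(caret)

## Hypothetical data: 90 simulated rows, same columns as the example above
set.seed(2)
dat <- twoClassSim(90)[, 13:16]

## Paired lists in the style of the question (T1 with T1, and so on);
## the row ranges are made up for this sketch
train.ids <- list(T1 = 1:20,  T2 = 21:40, T3 = 41:60)
test.ids  <- list(T1 = 61:70, T2 = 71:80, T3 = 81:90)

ctrl <- trainControl(method = "cv",
                     index = train.ids, indexOut = test.ids,
                     savePredictions = TRUE)

mod <- train(Class ~ ., data = dat, method = "lda", trControl = ctrl)

## Group the saved row indices by resample label
held_out <- lapply(split(mod$pred$rowIndex, mod$pred$Resample), sort)

## Compare each resample's predicted rows with the matching test.ids element
matches <- mapply(setequal, held_out[names(test.ids)], test.ids)
print(matches)
```

If every entry of `matches` is `TRUE`, each model was scored only on the rows named in the corresponding `test.ids` element, which is the behaviour the question asks for.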

Max

topepo