
I'm wondering if I can get some assistance with applying K-fold cross validation to data that needs to be split into groups. I know how to apply K-fold cross validation to standard data where each row is an independent event. However, in a horse racing context, what do I need to change in my code so that it suits grouped data? I want to avoid mixing horses from one race with horses from another race, and in particular to avoid splitting a race between the training and test samples. Each race (independent event) in my data has a unique identifier, 'RaceNo#', so I know I have to do something with that. For example, group 1 would need to be all horses in race 1 (with exactly one winner), group 2 all horses in race 2 (with exactly one winner), and so on.
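A minimal base-R sketch of the idea, on simulated data: whole races, rather than individual rows, are assigned to folds, so every horse in a race lands in the same fold and no race is ever split between training and testing. The column names `RaceNo` and `Speed` and the toy data are hypothetical stand-ins for the real 'RaceNo#' column and predictors.

```r
# Hypothetical toy data: 30 races with 6 to 10 runners each and exactly
# one winner per race (RaceNo stands in for the real 'RaceNo#' column)
set.seed(2000)
n_runners <- sample(6:10, 30, replace = TRUE)
horses <- data.frame(
  RaceNo = rep(seq_along(n_runners), times = n_runners),
  Speed  = rnorm(sum(n_runners))   # made-up predictor
)
horses$Winner <- unlist(lapply(n_runners, function(n) {
  w <- integer(n)
  w[sample(n, 1)] <- 1L            # one winner per race
  w
}))

k <- 5
races <- unique(horses$RaceNo)

# Assign each RACE (not each row) to one of k folds, as evenly as possible
race_fold <- sample(rep(seq_len(k), length.out = length(races)))
names(race_fold) <- races

# Every horse inherits the fold of its race, so a race is never split
horses$Fold <- unname(race_fold[as.character(horses$RaceNo)])

# Grouped K-fold loop: fit on k - 1 folds of races, score the held-out races
cv_logloss <- numeric(k)
for (f in seq_len(k)) {
  train_set <- horses[horses$Fold != f, ]
  test_set  <- horses[horses$Fold == f, ]
  fit <- glm(Winner ~ Speed, data = train_set, family = binomial)
  p <- predict(fit, newdata = test_set, type = "response")
  cv_logloss[f] <- -mean(test_set$Winner * log(p) +
                         (1 - test_set$Winner) * log(1 - p))
}
mean(cv_logloss)
```

The only change from ordinary K-fold is the unit being shuffled: the fold labels are drawn for `races`, and rows simply look up the fold of their race.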

Below is the standard K-fold cross validation code I have. I'm hoping it only needs a simple modification to make it suitable.

```r
library(caret)
library(randomForest)

# read.table() already returns a data.frame, so as.data.frame() is not needed
HorseData <- read.table(file = "C:\\Professional\\HorseData.txt", header = TRUE, sep = "\t")
HorseData[is.na(HorseData)] <- 0   # replace missing values with 0

# Row-level 80/20 split, stratified on the outcome (this ignores races)
set.seed(2000)
IndexMatrix <- createDataPartition(HorseData$Winner, p = 0.8, list = FALSE, times = 1)
TrainData <- HorseData[IndexMatrix, ]
TestData  <- HorseData[-IndexMatrix, ]

# Recode the 0/1 outcome as a factor with valid level names (needed for classProbs)
TrainData$Winner <- factor(ifelse(TrainData$Winner == 1, "Win", "Lose"), levels = c("Lose", "Win"))
TestData$Winner  <- factor(ifelse(TestData$Winner == 1, "Win", "Lose"), levels = c("Lose", "Win"))

# 10-fold CV on rows; cntrlspecs only takes effect if passed via trControl
cntrlspecs <- trainControl(method = "cv", number = 10, savePredictions = "all", classProbs = TRUE)
set.seed(2000)
# VARIABLES is a placeholder for the predictor terms, e.g. Winner ~ x1 + x2
LogitModel <- train(Winner ~ VARIABLES, data = TrainData, method = "glm",
                    family = binomial, trControl = cntrlspecs)
print(LogitModel)
summary(LogitModel)
varImp(LogitModel)
Prediction <- predict(LogitModel, newdata = TestData)
confusionMatrix(data = Prediction, reference = TestData$Winner)
```
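A sketch of how the code above might be adapted, assuming caret's `groupKFold()`, which builds folds from a grouping vector and returns the *training* row indices of each fold, in the form `trainControl(index = ...)` expects. Two changes are made: the 80/20 hold-out is drawn over races instead of rows, and the 10 CV folds are built with `groupKFold()` on the race ID. The data are simulated, and the column names `RaceNo`, `Speed` and `Weight` are hypothetical. If the header in the real file literally contains `RaceNo#`, note that `read.table()` treats `#` as a comment character by default (`comment.char = "#"`), so `comment.char = ""` may be needed; with the default `check.names = TRUE` the column would then be named `RaceNo.`.

```r
library(caret)

# Hypothetical stand-in for HorseData: 50 races, one winner per race
set.seed(2000)
n_runners <- sample(6:10, 50, replace = TRUE)
HorseData <- data.frame(
  RaceNo = rep(seq_along(n_runners), times = n_runners),
  Speed  = rnorm(sum(n_runners)),
  Weight = rnorm(sum(n_runners))
)
HorseData$Winner <- unlist(lapply(n_runners, function(n) {
  w <- integer(n); w[sample(n, 1)] <- 1L; w
}))
HorseData$Winner <- factor(ifelse(HorseData$Winner == 1, "Win", "Lose"),
                           levels = c("Lose", "Win"))

# 1. Hold out ~20% of RACES (not rows) as the test set
races      <- unique(HorseData$RaceNo)
test_races <- sample(races, size = round(0.2 * length(races)))
TrainData  <- HorseData[!HorseData$RaceNo %in% test_races, ]
TestData   <- HorseData[HorseData$RaceNo %in% test_races, ]

# 2. Build 10 race-grouped folds on the training set; each list element
#    holds the row positions used for TRAINING in that fold
folds <- groupKFold(TrainData$RaceNo, k = 10)
cntrlspecs <- trainControl(method = "cv", index = folds,
                           savePredictions = "all", classProbs = TRUE)

set.seed(2000)
LogitModel <- train(Winner ~ Speed + Weight, data = TrainData,
                    method = "glm", family = binomial, trControl = cntrlspecs)
Prediction <- predict(LogitModel, newdata = TestData)
confusionMatrix(data = Prediction, reference = TestData$Winner, positive = "Win")
```

When `index` is supplied, it defines the resamples, so `number = 10` is no longer needed. The positions in `folds` refer to rows of `TrainData`, which is why the folds are built after the race-level hold-out split.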

Any help is greatly appreciated.

Thanks,

T.

Phil
  • Check [this](https://mlr3gallery.mlr-org.com/posts/2020-03-30-stratification-blocking/#block-resampling) out. – missuse Sep 24 '21 at 03:05
  • As well as these: https://stackoverflow.com/questions/48142617/caret-combine-createresample-and-groupkfold, https://stackoverflow.com/questions/48212334/caret-combine-the-stratified-createmultifolds-repeatedcv-and-groupkfold – missuse Sep 25 '21 at 07:30
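The second linked question is about combining `groupKFold()` with repeated CV. A minimal sketch of one way to get repeated race-grouped CV in caret: call `groupKFold()` several times (it uses the random number generator, so each call after a single `set.seed()` draws a new race-to-fold assignment) and pass the combined list of training indices to `trainControl(index = ...)`. The race IDs below are hypothetical, and the fold names are made up only so that every resample has a unique label.

```r
library(caret)

# Hypothetical race IDs: 60 races with 8 runners each
race_id <- rep(1:60, each = 8)

# Repeat race-grouped 5-fold CV three times by re-drawing the folds;
# each element holds the TRAINING row indices for one resample
set.seed(2000)
repeated_folds <- unlist(lapply(1:3, function(r) {
  f <- groupKFold(race_id, k = 5)
  names(f) <- sprintf("Fold%d.Rep%d", seq_along(f), r)
  f
}), recursive = FALSE)

ctrl <- trainControl(method = "cv", index = repeated_folds,
                     savePredictions = "all", classProbs = TRUE)
length(repeated_folds)
```

`ctrl` would then be passed to `train()` through `trControl = ctrl`, exactly as `cntrlspecs` is in the question's code.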
