With the caret package, when creating a data partition of 75% training and 25% test, we use:
inTrain <- createDataPartition(y = spam$type, p = 0.75, list = FALSE)
Note: the dataset is named spam and the target variable is named type.
My question is: what is the purpose of including the y = spam$type argument?
Isn't the purpose of creating a data partition simply to split the entire dataset according to the proportion you want for training vs. testing? Why does the code need that argument at all?
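For context, caret's documentation describes createDataPartition as doing stratified sampling. When y is a factor, the random sampling happens within each level of y so that the class distribution is balanced across the splits. When y is numeric, it samples within percentile-based groups instead. The base-R sketch below imitates only the factor case. The simulated `type` vector and the `stratified_split()` and `simple_split()` helpers are hypothetical stand-ins for illustration, not caret's actual code.

```r
# Hypothetical stand-in for spam$type: an imbalanced two-class factor
set.seed(1)
type <- factor(sample(c("spam", "nonspam"), 1000, replace = TRUE,
                      prob = c(0.4, 0.6)))

# Hypothetical helper: draw a proportion p of the row indices separately
# within each class level, then pool them (stratified sampling)
stratified_split <- function(y, p) {
  by_class <- split(seq_along(y), y)
  picked <- lapply(by_class, function(i) {
    i[sample.int(length(i), ceiling(p * length(i)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

# Hypothetical helper: simple random split that ignores the class labels
simple_split <- function(n, p) sort(sample.int(n, floor(p * n)))

in_train <- stratified_split(type, 0.75)
naive_train <- simple_split(length(type), 0.75)

# Class proportions: overall, stratified training set, simple training set
print(round(prop.table(table(type)), 3))
print(round(prop.table(table(type[in_train])), 3))
print(round(prop.table(table(type[naive_train])), 3))
```

With a simple random split, the class mix of the training set matches the overall mix only on average. Stratifying on the target keeps it matched (up to rounding) on every draw, which matters most when the data set is small or the classes are imbalanced.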