I am working on emotion analysis. Recent papers in this area perform subject-independent k-fold cross validation, but I have not seen any paper that uses a validation set; they only mention a train set and a test set. For example, in 10-fold cross validation the whole dataset is divided into 10 subject-independent sets (sub1 appears in only one set, not in any other). If we divide the dataset only into train and test sets, how are the hyper-parameters tuned? And what should I report as the final accuracy, given that my validation accuracy varies by 1%-4% between epochs while my training accuracy reaches up to 99.99%?
1 Answer
Cross validation is a process of creating validation sets and training against them. You can tune hyperparameters by monitoring validation metrics during cross validation. If your validation accuracy is between 1% and 4% while your training accuracy is close to perfect, then your model is overfitting (a lot). There are lots of ways to combat overfitting, but many of them are model-specific, so I'd need more information to be able to help further.
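For concreteness, here is a minimal sketch of that workflow with scikit-learn's grouped splitters: a subject-independent held-out test set plus subject-independent folds whose validation scores drive the tuning. The data, model, and split sizes are placeholders, not anything taken from the question:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit, GroupKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Placeholder data: X = features, y = emotion labels, groups = subject IDs.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)
groups = rng.integers(0, 100, size=1000)  # 100 hypothetical subjects

# 1) Hold out a subject-independent test set (here ~10% of the subjects).
outer = GroupShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
train_idx, test_idx = next(outer.split(X, y, groups))

# 2) Subject-independent 10-fold cross validation on the remaining subjects;
#    the validation scores are what you monitor for hyperparameter tuning.
cv = GroupKFold(n_splits=10)
val_scores = []
for fold_tr, fold_val in cv.split(X[train_idx], y[train_idx], groups[train_idx]):
    model = LogisticRegression(max_iter=1000)  # placeholder model and hyperparameters
    model.fit(X[train_idx][fold_tr], y[train_idx][fold_tr])
    val_scores.append(accuracy_score(y[train_idx][fold_val],
                                     model.predict(X[train_idx][fold_val])))
print("mean validation accuracy:", np.mean(val_scores))

# 3) Retrain on all non-test subjects with the chosen hyperparameters and
#    report accuracy once on the untouched test subjects.
final_model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
print("test accuracy:", accuracy_score(y[test_idx], final_model.predict(X[test_idx])))
```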

Brandon Schabell
- No. I was saying the difference between my validation accuracy from one epoch to the next is almost 1%-4%. For example, at epoch 79 `val_acc = 85%` and at epoch 80 `val_acc = 89%`. You said "You can tune hyperparameters by monitoring validation metrics during cross validation", but I don't have a test set. Isn't it wrong to tune your parameters using the validation set when you are not using any test set? – manv Dec 18 '18 at 03:26
- You'll need to create a test set that you leave out of your cross validation. The validation set is created automatically during cross validation, so if you separate your data into 2 parts at the beginning, one of them will be your testing set. – Brandon Schabell Dec 18 '18 at 03:30
- So you are saying, if I have 100 subjects, then 90 go to training and 10 to testing, and the 90 training subjects are further used for cross validation. With 10-fold cross validation, each fold's set will have 9 subjects, so each run trains on 9*9 = 81 subjects and validates on 9. Am I understanding right? – manv Dec 18 '18 at 03:36
- That's essentially correct. A common example is having a 33/33/33 split. In this example, you'd have 33 test subjects completely left out. 67 subjects would be sent to the cross validation where 10 folds would be created (say with a 50/50 split). In this case, each fold would have 33 training subjects and 33 validation subjects. The metrics on these validation subjects are what you would typically use for hyperparameter tuning. – Brandon Schabell Dec 18 '18 at 03:39
- Thanks for the clear picture. I am not sure if I am correct, but the idea of subject-independent 10-fold cross validation is to have each subject appear in testing once. As per your suggestion, though, my test set is fixed. – manv Dec 18 '18 at 03:44
- I think the confusion is in the naming convention between "testing" and "validation" sets. They are sometimes used interchangeably, but the testing set is typically the one that is left out while the validation set is the one that is used to tune a model. You would be validating on each subject (at least) once. – Brandon Schabell Dec 18 '18 at 03:47
- So the final accuracy will be the average of the 10 accuracies obtained by predicting the one fixed test set with each of the 10 models trained during 10-fold cross validation? Actually, I am not aware of how people do 10-fold cross validation in a subject-independent manner, which is why I am asking these questions. – manv Dec 18 '18 at 04:31
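One common reading of the setup the comments circle around is nested, subject-independent cross validation: every subject lands in the outer test fold exactly once, hyperparameters are tuned on an inner validation split of the remaining subjects, and the reported final accuracy is the mean over the outer test folds. This is an interpretation, not something the papers in question confirm, and the data, model, and inner split size below are placeholders. A minimal sketch with scikit-learn's grouped splitters:

```python
import numpy as np
from sklearn.model_selection import GroupKFold, GroupShuffleSplit
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Placeholder data standing in for the emotion dataset:
# X = features, y = emotion labels, groups = subject IDs.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)
groups = rng.integers(0, 100, size=1000)  # 100 hypothetical subjects

outer = GroupKFold(n_splits=10)  # every subject appears in a test fold exactly once
val_scores, test_scores = [], []

for dev_idx, test_idx in outer.split(X, y, groups):
    # Inner subject-independent split of the development subjects into
    # train and validation; the validation part is what you tune on.
    inner = GroupShuffleSplit(n_splits=1, test_size=0.1, random_state=0)
    tr_idx, val_idx = next(inner.split(X[dev_idx], y[dev_idx], groups[dev_idx]))

    model = LogisticRegression(max_iter=1000)  # placeholder model and hyperparameters
    model.fit(X[dev_idx][tr_idx], y[dev_idx][tr_idx])

    val_scores.append(accuracy_score(y[dev_idx][val_idx],
                                     model.predict(X[dev_idx][val_idx])))
    # Each fold's test subjects are touched only once, for the final score.
    test_scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print("mean validation accuracy (used for tuning):", np.mean(val_scores))
print("final accuracy = mean over outer test folds:", np.mean(test_scores))
```

Under this scheme there is no single fixed test set: each fold's test subjects are never used for tuning, and only the fold-wise test accuracies are averaged into the reported number.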