I'm confused about how to properly do k-fold cross validation because I've seen it done two ways:

  1. The first way is to split the data set into k partitions: one for testing, one for validation, and the rest for training. The roles rotate so that each partition is used for validation exactly once and for testing exactly once.

  2. The second way is to split the data set into two partitions: one for testing and one for training/validation. You then split the training/validation set into k partitions, using one for validation and the remaining k-1 for training. Each partition of the training/validation set is used for validation exactly once. The testing set stays the same for every cross validation iteration.
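To make the two schemes concrete, here is a rough sketch of the index bookkeeping as I understand it, in plain Python with no shuffling. The names `make_folds`, `method_one`, and `method_two` are made up for illustration, and pairing each test fold with the next fold as the validation fold is just one assumed way to rotate the roles in the first method.

```python
def make_folds(indices, k):
    """Split a list of indices into k contiguous folds of nearly equal size."""
    n = len(indices)
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(indices[start:start + size])
        start += size
    return folds


def method_one(n_samples, k):
    """Method 1: every round uses one fold for testing, one for validation,
    and the rest for training. Assumes k >= 3 so the training set is non-empty.
    Assumption: the validation fold is the one right after the test fold."""
    folds = make_folds(list(range(n_samples)), k)
    for i in range(k):
        v = (i + 1) % k
        test, val = folds[i], folds[v]
        train = [x for j, f in enumerate(folds) if j not in (i, v) for x in f]
        yield train, val, test


def method_two(n_samples, k, n_test):
    """Method 2: hold out a fixed test set (here simply the last n_test
    samples), then run k-fold train/validation splits on the remainder."""
    indices = list(range(n_samples))
    test, rest = indices[-n_test:], indices[:-n_test]
    folds = make_folds(rest, k)
    for i in range(k):
        val = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, val, test


if __name__ == "__main__":
    for train, val, test in method_one(10, 5):
        print("method 1:", train, val, test)
    for train, val, test in method_two(10, 4, 2):
        print("method 2:", train, val, test)
```

In the first loop the test fold changes every round, while in the second loop the test indices are identical in every round. With scikit-learn, I believe the second scheme corresponds roughly to calling `train_test_split` once and then running `KFold` on the training portion.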

Which method here is correct and why? Or are they both valid?

Edit: The question you linked to as a duplicate does not answer the question. I'm asking about the validity of two potential cross validation methods.

The linked question is asking about the order of using the training, validation, and testing sets in various validation methods (holdout, something else, and the 2nd cross validation approach I described above).

I see that the second approach is valid now because that was mentioned and answered. But what about the first method I described?

  • This is not programming-related and is probably more on-topic at [Cross Validated](https://stats.stackexchange.com/), although there are plenty of questions on cross-validation there (as the name might imply), so check if your question has been answered there first. – Mihai Chelaru Jul 27 '19 at 15:55
  • Oh, thanks. I searched on Google for my question but didn't find an answer. desertnaut marked it as a duplicate, but it wasn't fully answered (see edit). – Jeffrey McCullen Jul 27 '19 at 16:41
  • I guessed that, given the detailed clarifications in the linked thread, it would be apparent that the 1st approach you describe is invalid – desertnaut Jul 27 '19 at 18:34
  • Thanks. I wasn't sure because when I was taught how cross validation works, the first approach was my understanding. But then I read sklearn's docs and they described the second way. So I was confused. – Jeffrey McCullen Jul 27 '19 at 18:47
