
Once I have iterated over each training combination given the k-fold split, I can estimate the mean and standard deviation of the models' performance, but I actually end up with k different models (each with its own fitted parameters). How do I get the final, whole model? Is it a matter of averaging the parameters?

I'm not showing code because this is a general question, so I'll write down the logic only:

  1. dataset
  2. splitting the dataset according to the k-fold scheme (let's say k = 5)
  3. iterating: training the first through the fifth model
  4. getting 5 different models with, let's say, the following parameters:
   model_1 = [p10, p11, p12] \
   model_2 = [p20, p21, p22]  |
   model_3 = [p30, p31, p32]   > param_matrix 
   model_4 = [p40, p41, p42]  |
   model_5 = [p50, p51, p52] /

What about model_final: [pf0, pf1, pf2]?

Too trivial solution 1: model_final = mean(param_matrix, axis=0)

Too trivial solution 2: model_final = the one of the five that reaches the highest performance (it could be overfit rather than the optimal one)
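For concreteness (the question itself stays general), here is a minimal sketch of the steps above, assuming scikit-learn and a plain logistic regression; the estimator and the synthetic data are illustrative only:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold

    X, y = make_classification(n_samples=500, n_features=3, n_informative=3,
                               n_redundant=0, random_state=0)

    param_matrix = []
    for train_idx, test_idx in KFold(n_splits=5).split(X):
        model = LogisticRegression().fit(X[train_idx], y[train_idx])
        param_matrix.append(model.coef_.ravel())  # one row: [pk0, pk1, pk2]

    print(np.array(param_matrix))  # 5 rows, one per fold -- which one is "final"?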

Foolvio
  • In my opinion this question is better suited to [Cross Validated](https://stats.stackexchange.com/), since it's not a programming question. – Matt Hall Jun 14 '22 at 14:55
  • Yeah, sorry, I never thought it existed. Anyway, I would like to see whether there is any property of the sklearn classes that contains the "model_final". – Foolvio Jun 14 '22 at 15:24

1 Answer


First of all, the purpose of cross-validation (K-fold) is model checking, not model building.

In your question, you said that every fold of your program yields different parameters; maybe this is not the best way to work.

One way to proceed is to evaluate every model (each one with different parameters) using K-fold internally (using GridSearchCV); if you see that you obtain similar values of accuracy or other metrics in each split, then you are not overfitting. Apply this methodology to every model you have, and choose the one with which you obtain the best results. Of course, there is always the possibility of overfitting, but with K-fold you reduce it.
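A minimal sketch of that model-checking step, assuming scikit-learn; the SVC estimator and the parameter grid are illustrative only:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=500, random_state=0)

    # K-fold (cv=5) runs inside the hyperparameter search
    param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X, y)

    # Per-split scores of the best candidate: similar values across the
    # splits suggest the chosen settings are not overfitting a single fold.
    best = search.best_index_
    print([search.cv_results_[f"split{i}_test_score"][best] for i in range(5)])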

Finally, once you have checked with cross-validation that you obtain similar metrics for every split and you have chosen the model parameters, you have to train your model on all your training data; you will then obtain one unique, final model.
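A minimal sketch of that final step, again assuming scikit-learn; the chosen hyperparameters are illustrative only:

    from sklearn.datasets import make_classification
    from sklearn.svm import SVC

    X_train, y_train = make_classification(n_samples=500, random_state=0)

    chosen_params = {"C": 1, "kernel": "rbf"}  # e.g. picked with GridSearchCV
    final_model = SVC(**chosen_params).fit(X_train, y_train)  # one unique model

Note that GridSearchCV with refit=True (the default) already performs this refit on the full training set, so search.best_estimator_ is that unique final model.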

Alex Serra Marrugat
  • I like most of your answer, but don't quite agree with the second sentence... the model will have different parameters in each fold. Maybe you could also mention the nested cross-validation strategy? – Matt Hall Jun 14 '22 at 14:57
  • No, wait... the models have different parameters because they produce different results. With "parameters" here I mean the lowest-level ones: for instance, if you want to trace a straight boundary line in a two-dimensional domain you'll have the usual offset and slope parameters (y = m*x + q). Each of the five models will have different m's and q's. You are talking about high-level parameters instead (like the kernel function or the number of poles). – Foolvio Jun 14 '22 at 15:17
  • Anyway, your last sentence convinces me more. Once I have understood which is the best/most stable model/high-level-parameter combination, I can actually proceed with the final training, feeding it all my data. Sounds pretty definitive. – Foolvio Jun 14 '22 at 15:27