Once I have iterated over each training combination given the k-fold split, I can estimate the mean and standard deviation of the models' performance, but I actually end up with k different models (each with its own fitted parameters). How do I obtain the final, whole model? Is it a matter of averaging the parameters?
I'm not showing full code because this is a general question, so I'll write down the logic only:
- dataset
- splitting the dataset according to k-fold cross-validation (let's say k = 5)
- iterating: training the first through the fifth model
- getting 5 different models with, let's say, the following parameters:
```
model_1 = [p10, p11, p12] \
model_2 = [p20, p21, p22] |
model_3 = [p30, p31, p32] > param_matrix
model_4 = [p40, p41, p42] |
model_5 = [p50, p51, p52] /
```
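To make the setup concrete, here is a minimal sketch of the loop I mean, using scikit-learn's `KFold` and a linear model on toy data (the dataset, the estimator, and the 3-feature shape are just placeholders for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

# Toy data standing in for the real dataset (3 features -> 3 parameters).
X, y = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
param_matrix, scores = [], []
for train_idx, test_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    param_matrix.append(model.coef_)  # [pk0, pk1, pk2] for fold k
    scores.append(model.score(X[test_idx], y[test_idx]))

param_matrix = np.vstack(param_matrix)  # shape (5, 3): one row per fold
print(param_matrix.shape)               # (5, 3)
print(np.mean(scores), np.std(scores))  # CV performance estimate
```

This gives the performance estimate (mean and standard deviation of `scores`) but leaves me with 5 fitted parameter vectors stacked in `param_matrix`.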
What about model_final: [pf0, pf1, pf2]?
Too-trivial solution 1: model_final = mean(param_matrix, axis=0)
Too-trivial solution 2: model_final = whichever of the five achieves the highest performance (which could be an overfit model rather than the optimal one)
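For clarity, the two candidate solutions I'm dismissing look like this in NumPy (the parameter values and fold scores below are made-up placeholders, not real results):

```python
import numpy as np

# Hypothetical fitted parameters from the 5 folds (placeholder values).
param_matrix = np.array([
    [0.90, 1.10, 2.00],
    [1.00, 1.00, 2.10],
    [1.10, 0.90, 1.90],
    [0.95, 1.05, 2.05],
    [1.05, 0.95, 1.95],
])
fold_scores = np.array([0.80, 0.82, 0.79, 0.85, 0.81])  # hypothetical CV scores

# "Too trivial" solution 1: element-wise average of the parameters.
model_final_avg = param_matrix.mean(axis=0)

# "Too trivial" solution 2: keep the parameters of the best-scoring fold.
model_final_best = param_matrix[np.argmax(fold_scores)]

print(model_final_avg)   # -> [1. 1. 2.]
print(model_final_best)  # -> [0.95 1.05 2.05] (fold 4 scored highest)
```

Neither feels right: averaging assumes the parameters live in a space where the mean is meaningful, and picking the single best fold rewards whichever model got the luckiest test split.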