I have a model whose training accuracy is 95-100%, so I believe it is overfitting. I want to reduce this overfitting, and one way to do that is k-fold cross-validation. But cross-validation produces a separate score for each fold, so there are several results per run. How do I choose the best result from these different results and then predict on unseen data?
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier

# Hold out 25% of the data for testing
train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.25, random_state=42)

rf = RandomForestClassifier(random_state=42)
rf.fit(train_features, train_labels)
predictions = rf.predict(test_features)
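For reference, this is roughly how I compare training and test accuracy (a sketch using accuracy_score from sklearn.metrics; the gap between the two numbers is what suggests overfitting to me):

from sklearn.metrics import accuracy_score

# Training accuracy near 1.0 with a noticeably lower test accuracy
# is the sign of overfitting described above
train_acc = accuracy_score(train_labels, rf.predict(train_features))
test_acc = accuracy_score(test_labels, predictions)
print(train_acc, test_acc)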
The cross-validation example from the sklearn docs is:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(random_state=42)
scores = cross_val_score(clf, X, y, cv=5)  # one accuracy score per fold
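The five values in scores (one per fold) are the "several results" I mean. From what I've read, these fold scores are usually summarized rather than picked from, for example:

# scores is a NumPy array with one accuracy value per fold;
# a common summary is the mean and standard deviation across folds
print(scores)
print(scores.mean(), scores.std())

Is that the right way to handle them, or should I somehow select the best-scoring fold's model for predicting unseen data?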