I am trying to tune the degree of a polynomial regressor. I want to end up with a plot that shows the polynomial degree on the x-axis and the average RMSE (or R^2) on the y-axis, where the score comes from k-fold cross-validation rather than from a single X_train, X_test, y_train, y_test split. So for every polynomial degree, the plotted RMSE (or R^2) should be the average over the k folds (let's say k=5).
The accepted answer to the linked question (How to find the best degree of polynomials?) shows a plot similar to what I am looking for, but it uses a single train/test split.
e.g., I want to plot:
polynomial degree=1 vs. average R^2 over all the folds for degree 1 = 0.756432270055037
polynomial degree=2 vs. average R^2 over all the folds for degree 2 = 0.7674777367903888
etc....
Code:
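import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import KFold
from sklearn.preprocessing import PolynomialFeatures

# X_normalized and y_for_normalized are the already-normalized features and target.
# lin_regressor is assumed here to be a plain LinearRegression.
lin_regressor = LinearRegression()
crossvalidation_poly = KFold(n_splits=10, shuffle=True)

for i in range(1, 11):
    # Expand the features to polynomial degree i
    poly_cross_validation = PolynomialFeatures(degree=i)
    X_current = poly_cross_validation.fit_transform(X_normalized)
    model = lin_regressor.fit(X_current, y_for_normalized)
    # R^2 on each of the 10 folds (np.abs below would hide a negative R^2)
    scores = cross_val_score(model, X_current, y_for_normalized, scoring='r2',
                             cv=crossvalidation_poly, n_jobs=1)
    print("\n\nDegree-"+str(i)+" polynomial: R^2 for every fold: " + str(np.abs(scores)))
    print("Degree-"+str(i)+" polynomial: Average R^2 for all the folds: " + str(np.mean(np.abs(scores))) + ", STD: " + str(np.std(scores)))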
Output:
Degree-1 polynomial: R^2 for every fold: [0.70885059 0.68423204 0.68656988 0.47465932 0.77245533 0.77660144 0.8062222 0.77522948 0.94399142 0.93551101]
Degree-1 polynomial: Average R^2 for all the folds: 0.756432270055037, STD: 0.12747438075977152

Degree-2 polynomial: R^2 for every fold: [0.60231996 0.77451333 0.79902791 0.76726714 0.97875956 0.96190752 0.79539439 0.71274774 0.76434911 0.51849071]
Degree-2 polynomial: Average R^2 for all the folds: 0.7674777367903888, STD: 0.13286987725882402

Degree-3 polynomial: R^2 for every fold: [0.75006208 0.81559452 0.80957158 0.93696435 0.93268663 0.34020557 0.76799658 0.77683065 0.94776599 0.81086865]
Degree-3 polynomial: Average R^2 for all the folds: 0.7888546595399009, STD: 0.16518966691934225

Degree-4 polynomial: R^2 for every fold: [0.54533102 0.97378695 0.7740047 0.28481531 0.64100617 0.96639657 0.95132567 0.55632614 0.76808545 0.84447403]
Degree-4 polynomial: Average R^2 for all the folds: 0.7305552024814645, STD: 0.21240601657201713

Degree-5 polynomial: R^2 for every fold: [0.72555654 0.81780613 0.72069331 0.89983065 0.69517235 0.91183736 0.8498918 0.10670107 0.56018748 0.36193799]
Degree-5 polynomial: Average R^2 for all the folds: 0.6649614685801433, STD: 0.24409382494665008

Degree-6 polynomial: R^2 for every fold: [0.52701619 0.81438492 0.9185306 0.93914815 0.70423433 0.42755771 0.75460921 0.8159036 0.97007572 0.77829732]
Degree-6 polynomial: Average R^2 for all the folds: 0.7649757756467411, STD: 0.16597173301214535

Degree-7 polynomial: R^2 for every fold: [0.74668074 0.72270667 0.91650098 0.80393617 0.90252636 0.87143124 0.74451664 0.93447347 0.80355377 0.64355457]
Degree-7 polynomial: Average R^2 for all the folds: 0.808988062085028, STD: 0.0910278613354405

Degree-8 polynomial: R^2 for every fold: [0.76413023 0.54156903 0.13845794 0.95575456 0.86352912 0.63582552 0.57776507 0.74772199 0.95960339 0.79973003]
Degree-8 polynomial: Average R^2 for all the folds: 0.6984086868251935, STD: 0.23137556481604366

Degree-9 polynomial: R^2 for every fold: [0.95689635 0.83530443 0.75732956 0.76130747 0.74536009 0.76416788 0.96582701 0.49744385 0.93464666 0.76915906]
Degree-9 polynomial: Average R^2 for all the folds: 0.7987442348654347, STD: 0.13097868143250385

Degree-10 polynomial: R^2 for every fold: [0.7537919 0.97366532 0.46433629 0.73035432 0.21057854 0.94347551 0.6700831 0.96217685 0.80456772 0.16903883]
Degree-10 polynomial: Average R^2 for all the folds: 0.668206840072372, STD: 0.2802564156732779
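
For reference, here is a minimal sketch of how the plot described at the top (degree on the x-axis, fold-averaged score on the y-axis) might be produced. It is only an illustration under several assumptions that are not part of my code above: the data is synthetic, the helper names cv_scores_by_degree and plot_scores are made up, k is 5, the polynomial expansion is wrapped in a pipeline, and KFold gets a fixed random_state so that every degree is scored on the same folds.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures


def cv_scores_by_degree(X, y, degrees, n_splits=5, seed=0):
    """Return the fold-averaged R^2 and RMSE for each polynomial degree."""
    # Assumption: a fixed random_state, so all degrees share the same folds.
    cv = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    mean_r2, mean_rmse = [], []
    for degree in degrees:
        model = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
        result = cross_validate(model, X, y, cv=cv,
                                scoring=("r2", "neg_root_mean_squared_error"))
        mean_r2.append(result["test_r2"].mean())
        # The scorer returns negated RMSE, so flip the sign back.
        mean_rmse.append(-result["test_neg_root_mean_squared_error"].mean())
    return np.array(mean_r2), np.array(mean_rmse)


def plot_scores(degrees, mean_r2, mean_rmse):
    """Plot fold-averaged R^2 and RMSE against polynomial degree."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    fig, (ax_r2, ax_rmse) = plt.subplots(1, 2, figsize=(10, 4))
    ax_r2.plot(degrees, mean_r2, marker="o")
    ax_r2.set_xlabel("Polynomial degree")
    ax_r2.set_ylabel("Mean R^2 over folds")
    ax_rmse.plot(degrees, mean_rmse, marker="o")
    ax_rmse.set_xlabel("Polynomial degree")
    ax_rmse.set_ylabel("Mean RMSE over folds")
    fig.tight_layout()
    plt.show()


# Synthetic stand-in data (an assumption): a noisy cubic in one feature.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=1.0, size=200)

degrees = list(range(1, 11))
mean_r2, mean_rmse = cv_scores_by_degree(X, y, degrees)
```

Calling plot_scores(degrees, mean_r2, mean_rmse) would then draw the two curves. With real data, X and y would be replaced by X_normalized and y_for_normalized.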