I'm practising machine learning algorithms (Lasso regression and decision trees) using sklearn.pipeline.Pipeline and sklearn.model_selection.GridSearchCV. I have split my dataset into training and test sets. The following is my code for the Lasso case. I would like to know whether I am using these tools correctly.
from sklearn import decomposition
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
# X (features) and y (target) are loaded earlier
X_train_lasso_r, X_test_lasso_r, y_train_lasso_r, y_test_lasso_r = train_test_split(X, y, test_size=0.3, random_state=42)
print("The dimension of X_train_lasso_r is {}".format(X_train_lasso_r.shape))
print("The dimension of X_test_lasso_r is {}".format(X_test_lasso_r.shape))
pca = decomposition.PCA()
scaler = StandardScaler()
lasso_regression = Lasso()
pipe = Pipeline(steps=[("scaler", scaler), ("pca", pca), ("lasso_regression", lasso_regression)])
n_components = list(range(1, X.shape[1] + 1))  # number of components to keep after dimensionality reduction
selection = ["cyclic", "random"]
# Lasso's `normalize` option was removed in scikit-learn 1.2, so it is not in the grid; the StandardScaler step already standardizes the features
parameters = dict(pca__n_components=n_components, lasso_regression__selection=selection)
clf = GridSearchCV(pipe, parameters)
clf.fit(X_train_lasso_r, y_train_lasso_r)
# With the default refit=True, best_estimator_ is the best pipeline refit on the whole training set
predictions_lasso_r_test = clf.best_estimator_.predict(X_test_lasso_r)
predictions_lasso_r_train = clf.best_estimator_.predict(X_train_lasso_r)
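For anyone who wants to run this without my dataset, here is a minimal sketch of the same workflow on synthetic data. The `make_regression` data, the `alpha` grid, and `cv=5` are illustrative assumptions, not values from my actual code, and the shorter variable names are only for readability.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for my real X and y (assumption, not my actual data)
X, y = make_regression(n_samples=200, n_features=6, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

pipe = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("lasso_regression", Lasso()),
])

# Step parameters are addressed as <step name>__<parameter name>
parameters = {
    "pca__n_components": list(range(1, X.shape[1] + 1)),
    "lasso_regression__alpha": [0.01, 0.1, 1.0],  # illustrative values
}

search = GridSearchCV(pipe, parameters, cv=5)
search.fit(X_train, y_train)  # only the training split is used for fitting

# Predicting through the fitted search reuses the scaler and PCA
# that were fitted on the training data, then applies the Lasso model
test_predictions = search.predict(X_test)
test_r2 = search.score(X_test, y_test)  # R^2 of the best pipeline on the test split

print(search.best_params_)
print(test_r2)
```

As I understand it, because `refit=True` is the default, `search.predict(X_test)` delegates to the best pipeline refit on the whole training set, so it should return the same values as `search.best_estimator_.predict(X_test)`.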
I have searched online but could not find out how to proceed after fitting on the training set. In particular, I am unsure how to correctly apply the pipeline to the test set. I would appreciate help with using GridSearchCV together with a pipeline and with the prediction step. Thanks in advance!