This question is really "why does this code work?" rather than "why doesn't it?".
I was working through a problem from a textbook. Specifically, the problem was to build a Pipeline with a data preparation phase (remove NA values, perform feature scaling, etc.) followed by a prediction phase, in which a predictor is trained on the transformed dataset and returns its predictions.
Here, we used the Support Vector Regressor class (sklearn.svm.SVR).
I tried some code of my own, but it didn't work, so I looked up the solution provided by the author of the textbook:
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# data_prep, x_train and y_train are defined elsewhere (not shown here).
prepare_select_and_predict_pipeline = Pipeline([
    ('preparation', data_prep),
    ('svm_reg', SVR(kernel='rbf', C=30000, gamma='scale')),
])
prepare_select_and_predict_pipeline.fit(x_train, y_train)

some_data = x_train.iloc[:4]
print("Predictions for a subset of Training Set:",
      prepare_select_and_predict_pipeline.predict(some_data))
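The data_prep object isn't defined in the snippet above. As a rough stand-in, assuming it does nothing more than fill in missing values and scale the features (the SimpleImputer and StandardScaler choices and the toy data are my assumptions, not the book's code), it could look something like this:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the book's data_prep: fill NA values with the
# column median, then standardize each feature.
data_prep = Pipeline([
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler()),
])

# Tiny made-up frame with one missing value, just to exercise the steps.
toy = pd.DataFrame({
    'a': [1.0, 2.0, np.nan, 4.0],
    'b': [10.0, 20.0, 30.0, 40.0],
})
prepared = data_prep.fit_transform(toy)

print(prepared.shape)             # (4, 2)
print(np.isnan(prepared).any())   # False
```

Note that this preparation step is itself a Pipeline made only of transformers, so it has fit(), transform() and fit_transform(), but no predict().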
I tried the author's code, and it works as expected. But why does it work? My main objections are:
We only call fit() on the pipeline, so where is the data actually being transformed? We never call transform() anywhere.
Also, how can we call predict() on this pipeline? SVR is part of the pipeline, but so are the other transformers, and they don't have a predict() method.
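To make the two objections concrete, here is a minimal sketch of what I would have expected to write by hand, next to the pipeline version. The synthetic data and the use of a plain StandardScaler in place of data_prep are assumptions for illustration only, not the book's setup.

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic regression data (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

# What I expected to need: explicit transform calls around the regressor.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)           # fit the scaler, transform training data
svr = SVR(kernel='rbf', C=30000, gamma='scale')
svr.fit(X_scaled, y)                         # train on the transformed data
manual_pred = svr.predict(scaler.transform(X[:4]))

# The pipeline version: only fit() and predict() are called.
pipe = Pipeline([
    ('preparation', StandardScaler()),
    ('svm_reg', SVR(kernel='rbf', C=30000, gamma='scale')),
])
pipe.fit(X, y)
pipe_pred = pipe.predict(X[:4])

# Compare the two sets of predictions.
print(np.allclose(manual_pred, pipe_pred))
```

If the two sets of predictions agree, that would suggest the pipeline applies the transformers internally during both fit() and predict(), which is exactly the behaviour I would like to have confirmed and explained.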
Thanks in advance for your answers!