I would like to check if I'm missing any important points here.
My pipeline is only for featurization. I understand that once a pipeline that includes an Estimator has been fitted, saving the fitted pipeline persists the params the Estimator has learned. Loading a saved pipeline in that case means not having to re-train the Estimator, which is a huge benefit.
However, for a pipeline that consists only of Transformer stages, would I always get the same feature-extraction result from an input dataset using either of the two approaches below?
1)
- Creating a pipeline with a certain set of stages and a configuration per stage.
- Saving and reloading the pipeline.
- Transforming an input dataset
versus
2)
- Each time just instantiating a new pipeline (of course with the exact same set of stages and configuration per stage)
- Transforming the input dataset
So, to phrase it another way: as long as the exact set of stages and the configuration per stage are known, can a featurization pipeline be recreated cheaply (because there is no 'training an estimator' phase) without using save or load?
Thanks, Brent