I would like to check if I'm missing any important points here.

My pipeline is used only for featurization. I understand that once a pipeline that includes an Estimator has been fitted, saving the pipeline persists the parameters the Estimator has learned. Loading a saved pipeline in that case means not having to retrain the Estimator, which is a huge advantage.

However, for a pipeline that consists only of Transformer stages, would I always get the same feature-extraction result from an input dataset using either of the two approaches below?

1)

  1. Creating a pipeline with a certain set of stages and a configuration per stage.
  2. Saving and reloading the pipeline.
  3. Transforming an input dataset.

versus

2)

  1. Instantiating a new pipeline each time (with exactly the same set of stages and configuration per stage, of course).
  2. Transforming the input dataset.

Put another way: as long as the exact set of stages and the configuration of each stage are known, can a featurization pipeline be recreated efficiently (because there is no 'training an estimator' phase) without using save or load?

Thanks, Brent
