I'm building a SageMaker batch inference pipeline and I'm confused about the options for processing features before inference: sagemaker.sklearn.processing.SKLearnProcessor versus sagemaker.sklearn.estimator.SKLearn.
My understanding of the two options is as follows.
The AWS docs show how to use sagemaker.sklearn.estimator.SKLearn with batch transform to process the data.
The advantage of this class and its .create_model() method is that I can put the resulting model (which processes the features before inference) into a sagemaker.pipeline.PipelineModel that is deployed on an endpoint, so the whole pipeline sits behind a single endpoint that is called when an inference request comes in. This is described in detail here:
https://sagemaker-examples.readthedocs.io/en/latest/sagemaker-python-sdk/scikit_learn_inference_pipeline/Inference%20Pipeline%20with%20Scikit-learn%20and%20Linear%20Learner.html
I don't know what the specific cons are, and that is my first question (1).
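For concreteness, this is roughly how I picture the first option. It is only a sketch of my understanding, not tested against a real account: the script name preprocess.py, the instance type, the framework version, the model name, and the S3 URIs are placeholders I made up, and inference_model stands for an already-trained SageMaker model (for example the result of another estimator's .create_model()).

```python
def build_pipeline_model(role, train_s3_uri, inference_model,
                         script="preprocess.py", framework_version="1.2-1"):
    """Fit the sklearn preprocessing step and chain it in front of a model."""
    # Imported lazily: the SDK and AWS credentials are only needed when called.
    from sagemaker.pipeline import PipelineModel
    from sagemaker.sklearn.estimator import SKLearn

    # Placeholder settings; the script is assumed to fit the transformer and
    # to define the inference hooks (model_fn etc.) used at serving time.
    preprocessor = SKLearn(
        entry_point=script,
        role=role,
        framework_version=framework_version,
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )
    preprocessor.fit({"train": train_s3_uri})

    # The containers run in order: preprocessing first, then the actual model.
    return PipelineModel(
        name="preprocess-then-predict",  # hypothetical name
        role=role,
        models=[preprocessor.create_model(), inference_model],
    )


def run_batch(pipeline_model, input_s3_uri, output_s3_uri):
    """Run the whole pipeline as one batch transform job over raw input."""
    transformer = pipeline_model.transformer(
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path=output_s3_uri,
    )
    transformer.transform(input_s3_uri, content_type="text/csv", split_type="Line")
    transformer.wait()
    return transformer.output_path
```

So the same PipelineModel could either be deployed behind an endpoint or, as in run_batch, be used directly for a batch transform over the raw data.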
However, if it's only for data processing, I can also use sagemaker.sklearn.processing.SKLearnProcessor to create SageMaker Processing jobs that process the features and write them to S3 for the model to run batch inference on.
The advantage, to me, is that it makes more sense to have a job that is designed specifically for processing. The disadvantage is that I apparently have to write a handler myself to chain the processing and the inference, unlike with sagemaker.sklearn.estimator.SKLearn.
https://sagemaker-examples.readthedocs.io/en/latest/sagemaker_processing/scikit_learn_data_processing_and_model_evaluation/scikit_learn_data_processing_and_model_evaluation.html
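Again for concreteness, here is a sketch of what I mean by writing the handler myself in the second option. The script name, instance type, framework version, and S3 URIs are placeholders of mine, the /opt/ml/processing/... paths follow the convention used in the linked example, and model stands for an already-trained SageMaker model; process_then_infer is a hypothetical helper, not something from the SDK.

```python
def run_feature_processing(role, raw_s3_uri, features_s3_uri,
                           script="preprocess.py", framework_version="1.2-1"):
    """Run the feature script as a Processing job that writes features to S3."""
    # Imported lazily: the SDK and AWS credentials are only needed when called.
    from sagemaker.processing import ProcessingInput, ProcessingOutput
    from sagemaker.sklearn.processing import SKLearnProcessor

    processor = SKLearnProcessor(
        framework_version=framework_version,
        role=role,
        instance_type="ml.m5.xlarge",
        instance_count=1,
    )
    # run() blocks until the job finishes by default.
    processor.run(
        code=script,
        inputs=[ProcessingInput(source=raw_s3_uri,
                                destination="/opt/ml/processing/input")],
        outputs=[ProcessingOutput(source="/opt/ml/processing/output",
                                  destination=features_s3_uri)],
    )
    return features_s3_uri


def process_then_infer(process_step, model, predictions_s3_uri):
    """Hand-written glue: processing job first, then batch transform on its output."""
    features_s3_uri = process_step()
    transformer = model.transformer(
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path=predictions_s3_uri,
    )
    transformer.transform(features_s3_uri, content_type="text/csv", split_type="Line")
    transformer.wait()
    return transformer.output_path
```

I would call it as process_then_infer(lambda: run_feature_processing(role, raw_uri, features_uri), model, predictions_uri), which means two separate jobs plus an intermediate copy of the features in S3.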
So my next question (2) is: is there a way to include SKLearnProcessor in a sagemaker.pipeline.PipelineModel? If not, the follow-up question (3) is: if SKLearnProcessor is not designed for use in inference, what is its use case?
My final question (4) is: from an efficiency perspective, what are the pros and cons of each method in a SageMaker batch inference pipeline?