
I am working on deploying a full ML pipeline with SageMaker and Airflow. I would like to separate the training and processing parts of the pipeline.

I have a question concerning the SageMakerProcessingOperator (source_code). This operator relies on the create_processing_job() function. When using this operator, I would like to extend the base Docker image used for processing so that it runs a home-made script. Currently, the processing works fine when I push my own container to AWS ECR. However, I would prefer to use part of the script stored inside my packaged model (in tar.gz format).

For training and registering the model, we can specify the source code used to extend the image with the sagemaker_submit_directory and SAGEMAKER_PROGRAM environment variables (cf. aws_doc). However, this does not seem to be possible with the SageMakerProcessingOperator. Below is an extract of the config I passed to the operator, with no success so far.

"Environment": {
    "sagemaker_enable_cloudwatch_metrics": "false",
    "SAGEMAKER_CONTAINER_LOG_LEVEL": "20",
    "SAGEMAKER_REGION": f"{self.region_name}",
    "SAGEMAKER_SUBMIT_DIRECTORY": f"{self.train_code_path}",
    "SAGEMAKER_PROGRAM": f"{self.processing_entry_point}",
    "sagemaker_job_name": f"{self.process_job_name}",
},

Has anyone managed to use these parameters with SageMaker create_processing_job()? Or is it limited to images pushed to AWS ECR?

Victor

1 Answer


SageMaker Processing jobs and SageMaker training jobs are different kinds of jobs with different underlying architectures, so the two cannot be combined.
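A possible workaround, which I have not tested, is to avoid the training-style environment variables and instead mount the packaged code as a ProcessingInput, then unpack and run it from ContainerEntrypoint. The sketch below only builds the config dict. The helper name build_processing_config, the extraction directory, the default instance type and the volume size are illustrative assumptions, and it assumes the image provides sh, tar and python3 and that the entry point sits at the given relative path inside the archive.

```python
import shlex

# Conventional mount point for code in processing containers (assumed here).
CODE_INPUT_DIR = "/opt/ml/processing/input/code"
# Hypothetical directory where the archive gets unpacked inside the container.
CODE_EXTRACT_DIR = "/opt/ml/processing/code"


def build_processing_config(job_name, image_uri, role_arn,
                            code_archive_s3_uri, entry_point,
                            instance_type="ml.m5.xlarge"):
    """Build a create_processing_job() config that runs a script from a tar.gz."""
    # The archive is downloaded under its own file name into CODE_INPUT_DIR.
    archive_name = code_archive_s3_uri.rsplit("/", 1)[-1]
    archive_path = shlex.quote(f"{CODE_INPUT_DIR}/{archive_name}")
    script_path = shlex.quote(f"{CODE_EXTRACT_DIR}/{entry_point}")
    # Unpack the archive, then run the chosen entry point.
    command = (
        f"mkdir -p {CODE_EXTRACT_DIR} && "
        f"tar -xzf {archive_path} -C {CODE_EXTRACT_DIR} && "
        f"python3 {script_path}"
    )
    return {
        "ProcessingJobName": job_name,
        "RoleArn": role_arn,
        "AppSpecification": {
            "ImageUri": image_uri,
            "ContainerEntrypoint": ["/bin/sh", "-c", command],
        },
        "ProcessingInputs": [
            {
                "InputName": "code",
                "S3Input": {
                    "S3Uri": code_archive_s3_uri,
                    "LocalPath": CODE_INPUT_DIR,
                    "S3DataType": "S3Prefix",
                    "S3InputMode": "File",
                },
            }
        ],
        "ProcessingResources": {
            "ClusterConfig": {
                "InstanceCount": 1,
                "InstanceType": instance_type,
                "VolumeSizeInGB": 30,
            }
        },
    }
```

The resulting dict would be passed as the config argument of SageMakerProcessingOperator. The image still has to come from a registry, but it can be a stock image rather than one rebuilt for every script change.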

CrzyFella
  • It is possible to use the same source code for both SageMakerTrainingOperator and SageMakerModelOperator, even though the underlying architecture is different, by pointing each one to a different entry point. – Victor May 03 '22 at 09:04
  • I have never tried it, but I think it is not possible. You will be running those jobs on different instances. – CrzyFella May 04 '22 at 01:52
  • The jobs run on different instances, but they both look into the same zipped model to fetch the code. You can tell training to run the train function and the model operator to run another function. For more information, see the links I shared in the original question. I can assure you that this is possible and that it works. – Victor May 04 '22 at 09:46