
I have a problem using SageMaker Pipelines for MLOps. I followed this example, but it only covers a one-time deployment. My project requires retraining the model weekly, and retraining and deploying again fails because the resources already exist. I checked the AWS documentation as well, but I cannot find an example of updating the model version of a running endpoint. My current workaround is to delete and recreate the endpoint, but that causes downtime.

Is there a suggested way to deploy the new model without downtime?

Here is my code:

scheduler code:


    from sagemaker.sklearn.estimator import SKLearn
    from sagemaker.model import Model
    from sagemaker.inputs import CreateModelInput
    from sagemaker.lambda_helper import Lambda
    from sagemaker.workflow.steps import TrainingStep, CreateModelStep
    from sagemaker.workflow.step_collections import RegisterModel
    from sagemaker.workflow.lambda_step import LambdaStep
    from sagemaker.workflow.pipeline import Pipeline

    # Train a scikit-learn model on the weekly schedule
    sklearn_preprocessor = SKLearn(
        entry_point=script_path,
        role=role,
        framework_version="0.23-1",
        base_job_name="test-model",
        instance_type=env.TRAIN_INSTANCE_TYPE,
        sagemaker_session=sagemaker_session,
    )

    train_step = TrainingStep(
        name="TrainingStep",
        display_name="Training Step",
        estimator=sklearn_preprocessor,
        inputs={"train": train_input},
    )

    model = Model(
        image_uri=sklearn_preprocessor.image_uri,
        model_data=train_step.properties.ModelArtifacts.S3ModelArtifacts,  # pylint: disable=no-member
        sagemaker_session=sagemaker_session,
        role=role,
        name="test-model",
    )

    # Register the trained model in the model package group
    step_register_pipeline_model = RegisterModel(
        name="RegisterModelStep",
        display_name="Register Model Step",
        model=model,
        content_types=["text/csv"],
        response_types=["text/csv"],
        inference_instances=[env.TRAIN_INSTANCE_TYPE],
        transform_instances=[env.INFERENCE_INSTANCE_TYPE],
        model_package_group_name="test-model-group",
        approval_status="Approved",
    )

    inputs = CreateModelInput(
        instance_type=env.INFERENCE_INSTANCE_TYPE,
    )

    step_create_model = CreateModelStep(
        name="CreateModelStep", display_name="Create Model Step", model=model, inputs=inputs
    )

    # Deployment is delegated to a Lambda function
    lambda_fn = Lambda(
        function_arn="arn:aws:lambda:ap-southeast-1:xxx:function:model-deployment"
    )

    step_deploy_lambda = LambdaStep(
        name="DeploymentStep",
        display_name="Deployment Step",
        lambda_func=lambda_fn,
        inputs={
            "model_name": "test-model",
            "endpoint_config_name": "test-model",
            "endpoint_name": "test-endpoint",
            "model_package_arn": step_register_pipeline_model.steps[0].properties.ModelPackageArn,
            "role": "arn:aws:iam::xxx:role/service-role/xxxx-role",
        },
    )

    pipeline = Pipeline(
        name="sagemaker-pipeline",
        steps=[train_step, step_register_pipeline_model, step_deploy_lambda],
    )
    pipeline.upsert(
        role_arn="arn:aws:iam::xxx:role/service-role/xxxx-role"
    )
    pipeline.start()

Lambda function for deployment:

    import json
    import boto3

    def lambda_handler(event, context):
        model_name = event["model_name"]
        model_package_arn = event["model_package_arn"]
        endpoint_config_name = event["endpoint_config_name"]
        endpoint_name = event["endpoint_name"]
        role = event["role"]

        sm_client = boto3.client("sagemaker")

        # Create a model from the registered model package
        container = {"ModelPackageName": model_package_arn}
        create_model_response = sm_client.create_model(
            ModelName=model_name, ExecutionRoleArn=role, Containers=[container]
        )

        create_endpoint_config_response = sm_client.create_endpoint_config(
            EndpointConfigName=endpoint_config_name,
            ProductionVariants=[
                {
                    "InstanceType": "ml.m5.xlarge",
                    "InitialInstanceCount": 1,
                    "ModelName": model_name,
                    "VariantName": "AllTraffic",
                }
            ],
        )

        # Fails on subsequent runs because the endpoint already exists
        create_endpoint_response = sm_client.create_endpoint(
            EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
        )

        return {
            "statusCode": 200,
            "body": json.dumps("Done!"),
        }

– Vengleab SO

1 Answer


You can update the Lambda code to call update_endpoint instead of always creating the endpoint. Add a check for whether an endpoint with that name already exists; if it does, call update_endpoint instead of create_endpoint.
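
A minimal sketch of what that handler could look like, assuming the same event fields your pipeline already passes. The timestamp suffix is illustrative: CreateModel and CreateEndpointConfig reject duplicate names, which is why reusing "test-model" every week errors out, so each run needs its own model and endpoint-config names.

    import time
    import boto3

    def lambda_handler(event, context):
        sm_client = boto3.client("sagemaker")

        # CreateModel / CreateEndpointConfig reject duplicate names, so give
        # each weekly run its own suffix instead of reusing the same names.
        suffix = time.strftime("%Y-%m-%d-%H-%M-%S")
        model_name = f'{event["model_name"]}-{suffix}'
        endpoint_config_name = f'{event["endpoint_config_name"]}-{suffix}'
        endpoint_name = event["endpoint_name"]

        sm_client.create_model(
            ModelName=model_name,
            ExecutionRoleArn=event["role"],
            Containers=[{"ModelPackageName": event["model_package_arn"]}],
        )

        sm_client.create_endpoint_config(
            EndpointConfigName=endpoint_config_name,
            ProductionVariants=[
                {
                    "InstanceType": "ml.m5.xlarge",
                    "InitialInstanceCount": 1,
                    "ModelName": model_name,
                    "VariantName": "AllTraffic",
                }
            ],
        )

        # ListEndpoints does substring matching, so confirm the exact name.
        existing = sm_client.list_endpoints(NameContains=endpoint_name)["Endpoints"]
        endpoint_exists = any(e["EndpointName"] == endpoint_name for e in existing)

        if endpoint_exists:
            # In-place update: SageMaker deploys the new config and then
            # shifts traffic, so the endpoint stays InService.
            sm_client.update_endpoint(
                EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
            )
        else:
            # First run: the endpoint does not exist yet.
            sm_client.create_endpoint(
                EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
            )

        return {"statusCode": 200, "body": "Done!"}

UpdateEndpoint provisions the new configuration first and only then switches traffic over, so there is no availability loss during the weekly redeploy.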

– Kirit Thadaka
  • I am checking that too, but the SDK does not provide an API to check whether the endpoint exists or not – Vengleab SO Apr 01 '22 at 16:35
  • You could use the DescribeEndpoint or ListEndpoints APIs – timxyz Apr 03 '22 at 14:06
  • Maybe go with the EAFP principle: try to create the endpoint and handle the error. I usually do this when, in the end, the endpoint is going to be created or updated either way. So, try to create the endpoint, and if you get a 400 error saying that it already exists, invoke the update process. – guimorg Jun 07 '22 at 21:41
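
Following up on the EAFP suggestion in the last comment, a rough sketch assuming the same sm_client, endpoint_name, and endpoint_config_name as in the answer above; the 400 mentioned in the comment surfaces through boto3 as a ClientError with a ValidationException error code:

    from botocore.exceptions import ClientError

    try:
        sm_client.create_endpoint(
            EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
        )
    except ClientError as err:
        # An already-existing endpoint comes back as a ValidationException;
        # fall back to updating it in place. Re-raise anything else.
        if err.response["Error"]["Code"] == "ValidationException":
            sm_client.update_endpoint(
                EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name
            )
        else:
            raise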