
I'm currently developing an ETL pipeline for my ML model on AWS. I want to trigger a Lambda function when a SageMaker Processing Job finishes, and the event passed to the Lambda should contain the configuration (job name, arguments, etc.) of that Processing Job.

Q1: How can I trigger the Lambda when the Processing Job finishes?

Q2: How can I pass the Processing Job configuration to the Lambda as its event?

John Rotenstein
mxmrpn

1 Answer


You can use the following EventBridge rule pattern:

{
  "source": ["aws.sagemaker"],
  "detail-type": ["SageMaker Processing Job State Change"],
  "detail": {
    "ProcessingJobStatus": ["Failed", "Completed", "Stopped"]
  }
}

The ProcessingJobStatus list can be modified based on which statuses you want to handle.

You can set a Lambda function as the target of your EventBridge rule.
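If you prefer to set this up from code rather than the console, the steps above can be sketched with boto3 roughly as follows. This is a minimal sketch, not a definitive setup: the rule name, target ID, and function ARN are hypothetical placeholders, and the clients are passed in so you can supply `boto3.client("events")` and `boto3.client("lambda")` yourself.

```python
import json


def build_event_pattern(statuses):
    """Return the EventBridge pattern from the answer for the given statuses."""
    return {
        "source": ["aws.sagemaker"],
        "detail-type": ["SageMaker Processing Job State Change"],
        "detail": {"ProcessingJobStatus": list(statuses)},
    }


def wire_rule_to_lambda(events_client, lambda_client, rule_name, function_arn, statuses):
    """Create the rule, allow EventBridge to invoke the Lambda, and attach it as target.

    rule_name and function_arn are placeholders supplied by the caller.
    """
    rule = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern(statuses)),
        State="ENABLED",
    )
    # Without this resource-based permission EventBridge cannot invoke the function.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=f"{rule_name}-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    events_client.put_targets(
        Rule=rule_name,
        # "processing-job-handler" is an arbitrary, hypothetical target ID.
        Targets=[{"Id": "processing-job-handler", "Arn": function_arn}],
    )
    return rule["RuleArn"]
```

Note that `add_permission` fails if a statement with the same `StatementId` already exists, so a rerun of this sketch would need to handle that case.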

Here is a sample event that will be passed to your Lambda, taken from the AWS console:

{
  "version": "0",
  "id": "0a15f67d-aa23-0123-0123-01a23w89r01t",
  "detail-type": "SageMaker Processing Job State Change",
  "source": "aws.sagemaker",
  "account": "123456789012",
  "time": "2019-05-31T21:49:54Z",
  "region": "us-east-1",
  "resources": ["arn:aws:sagemaker:us-west-2:012345678987:processing-job/integ-test-analytics-algo-54ee3282-5899-4aa3-afc2-7ce1d02"],
  "detail": {
    "ProcessingInputs": [{
      "InputName": "InputName",
      "S3Input": {
        "S3Uri": "s3://input/s3/uri",
        "LocalPath": "/opt/ml/processing/input/local/path",
        "S3DataType": "MANIFEST_FILE",
        "S3InputMode": "PIPE",
        "S3DataDistributionType": "FULLYREPLICATED"
      }
    }],
    "ProcessingOutputConfig": {
      "Outputs": [{
        "OutputName": "OutputName",
        "S3Output": {
          "S3Uri": "s3://output/s3/uri",
          "LocalPath": "/opt/ml/processing/output/local/path",
          "S3UploadMode": "CONTINUOUS"
        }
      }],
      "KmsKeyId": "KmsKeyId"
    },
    "ProcessingJobName": "integ-test-analytics-algo-54ee3282-5899-4aa3-afc2-7ce1d02",
    "ProcessingResources": {
      "ClusterConfig": {
        "InstanceCount": 3,
        "InstanceType": "ml.c5.xlarge",
        "VolumeSizeInGB": 5,
        "VolumeKmsKeyId": "VolumeKmsKeyId"
      }
    },
    "StoppingCondition": {
      "MaxRuntimeInSeconds": 2000
    },
    "AppSpecification": {
      "ImageUri": "012345678901.dkr.ecr.us-west-2.amazonaws.com/processing-uri:latest"
    },
    "NetworkConfig": {
      "EnableInterContainerTrafficEncryption": true,
      "EnableNetworkIsolation": false,
      "VpcConfig": {
        "SecurityGroupIds": ["SecurityGroupId1", "SecurityGroupId2", "SecurityGroupId3"],
        "Subnets": ["Subnet1", "Subnet2"]
      }
    },
    "RoleArn": "arn:aws:iam::012345678987:role/SageMakerPowerUser",
    "ExperimentConfig": {},
    "ProcessingJobArn": "arn:aws:sagemaker:us-west-2:012345678987:processing-job/integ-test-analytics-algo-54ee3282-5899-4aa3-afc2-7ce1d02",
    "ProcessingJobStatus": "Completed",
    "LastModifiedTime": 1589879735000,
    "CreationTime": 1589879735000
  }
}
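To illustrate how a Lambda might read the job configuration out of an event shaped like the sample above, here is a minimal handler sketch. The `summary` keys are my own naming, and `ContainerArguments` is an assumption: it does not appear in the sample event, so the code treats it as optional and falls back to an empty list.

```python
import json


def lambda_handler(event, context):
    """Pull the Processing Job configuration out of the EventBridge event."""
    detail = event.get("detail", {})
    app_spec = detail.get("AppSpecification", {})
    outputs = detail.get("ProcessingOutputConfig", {}).get("Outputs", [])

    summary = {
        "job_name": detail.get("ProcessingJobName"),
        "status": detail.get("ProcessingJobStatus"),
        "image_uri": app_spec.get("ImageUri"),
        # Assumption: present only if the job was started with arguments.
        "arguments": app_spec.get("ContainerArguments", []),
        "output_uris": [o["S3Output"]["S3Uri"] for o in outputs if "S3Output" in o],
    }

    # Printed output from a Lambda ends up in its CloudWatch log stream.
    print(json.dumps(summary))
    return summary
```

Using `.get` throughout keeps the handler from raising on events where an optional section is missing.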

Edit:

If you want to match a ProcessingJobName with a specific prefix:

{
  "source": ["aws.sagemaker"],
  "detail-type": ["SageMaker Processing Job State Change"],
  "detail": {
    "ProcessingJobStatus": ["Failed", "Completed", "Stopped"],
    "ProcessingJobName": [{
      "prefix": "standarize-data"
    }]
  }
}
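To sanity-check which job names a pattern like this would select before deploying it, a rough local emulation can help. The sketch below is a simplified stand-in I wrote for illustration, not EventBridge's actual matcher: it only understands exact string values and `prefix` conditions on top-level `detail` fields. For an authoritative check, EventBridge also offers a TestEventPattern API.

```python
def detail_matches(pattern_detail, event_detail):
    """Simplified emulation: every pattern field must match at least one condition."""
    for field, conditions in pattern_detail.items():
        value = event_detail.get(field)
        if not isinstance(value, str):
            return False
        matched = False
        for cond in conditions:
            if isinstance(cond, dict) and "prefix" in cond:
                matched = value.startswith(cond["prefix"])
            else:
                matched = value == cond
            if matched:
                break
        if not matched:
            return False
    return True


pattern_detail = {
    "ProcessingJobStatus": ["Failed", "Completed", "Stopped"],
    "ProcessingJobName": [{"prefix": "standarize-data"}],
}

job = {
    "ProcessingJobName": "standarize-data-1643826039-USA-2021-2022",
    "ProcessingJobStatus": "Completed",
}
print(detail_matches(pattern_detail, job))
```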
Kaustubh Khavnekar
  • Thanks for the answer. What if I need to filter for Processing Jobs whose name contains ? "ProcessingJobName": [""] Also, is it possible to filter on just the start of the name? For example, if the name is -some-metadata – mxmrpn Feb 03 '22 at 19:07
  • https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-event-patterns.html This documentation covers examples of the different things you can do with patterns. Can you give an example of the keywords you want to match, and in which fields? – Kaustubh Khavnekar Feb 03 '22 at 19:10
  • This is the complete processing job name: standarize-data-1643826039-USA-2021-2022. So I want to match every Processing Job whose name starts with standarize-data – mxmrpn Feb 03 '22 at 20:00
  • @MaximoRipani I have updated my answer – Kaustubh Khavnekar Feb 03 '22 at 20:34
  • Thank you! It really helped me. – mxmrpn Feb 03 '22 at 23:10