I have the following architecture:

[image: Step Functions workflow graph]

I followed this notebook to a T: https://github.com/aws/amazon-sagemaker-examples/blob/main/step-functions-data-science-sdk/automate_model_retraining_workflow/automate_model_retraining_workflow.ipynb. I am not sure how to debug the workflow to see what is going wrong. Any suggestions would be appreciated.

To provide more context, this is a machine learning deployment project. What I am doing in the picture is chaining processes together. The "Query Training Results" part is a Lambda function that pulls the training metrics data from an S3 location. For some reason this part gets cancelled.

From what I found online (in the post "Why would a step function cancels itself when there are no errors"), “this happens in step functions when you have a Choice state, and the Variable you are referencing is not actually in the state input.” Some answers in that post also suggest that the values in the metrics dictionary need to be of string type, so I made sure to cast them as such.
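To make the quoted explanation concrete, here is a minimal sketch of checking whether the path a Choice state references actually exists in a state's output. The helper name `find_missing_key` and the payload shape are hypothetical and only for illustration; the real keys depend on what the Lambda function returns.

```python
from typing import Any, List, Optional, Sequence, Union

Key = Union[str, int]


def find_missing_key(state_input: Any, keys: Sequence[Key]) -> Optional[List[Key]]:
    """Walk `keys` through `state_input`.

    Returns the shortest key prefix that fails to resolve,
    or None if the whole path resolves.
    """
    current = state_input
    for i, key in enumerate(keys):
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return list(keys[: i + 1])
    return None


# Hypothetical Lambda output; the real shape may differ.
lambda_output = {
    "Payload": {
        "trainingMetrics": [{"MetricName": "validation:error", "Value": "0.1"}]
    }
}

# Path the Choice state is assumed to reference.
path = ["Payload", "trainingMetrics", 0, "Value"]
print(find_missing_key(lambda_output, path))
```

If this returns a non-empty prefix for the output copied from the console, the Choice state's Variable is pointing at something that is not in its input.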

The problem I am having is that when you click on that grey box, it provides no information other than the fact that the step was cancelled, so I have no clue what is going wrong.
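One thing that may surface more detail than the grey box is the raw execution history. Below is a minimal sketch that filters history events down to failures and pulls out their `error` and `cause` fields. The helper name `failure_events` is hypothetical; the commented usage assumes boto3's Step Functions client and its `get_execution_history` call, with a placeholder execution ARN.

```python
from typing import Any, Dict, List, Optional, Tuple

Row = Tuple[Optional[int], str, Optional[str], Optional[str]]


def failure_events(events: List[Dict[str, Any]]) -> List[Row]:
    """Return (id, type, error, cause) for events that look like failures."""
    suffixes = ("Failed", "Aborted", "TimedOut")
    rows: List[Row] = []
    for event in events:
        event_type = event.get("type", "")
        if not event_type.endswith(suffixes):
            continue
        # Each event carries its details under a key ending in "EventDetails".
        details = next(
            (v for k, v in event.items() if k.endswith("EventDetails")), {}
        )
        rows.append(
            (event.get("id"), event_type, details.get("error"), details.get("cause"))
        )
    return rows


# Assumed usage (requires boto3 and AWS credentials):
# import boto3
# sfn = boto3.client("stepfunctions")
# history = sfn.get_execution_history(
#     executionArn="<your execution ARN>", reverseOrder=True
# )
# for row in failure_events(history["events"]):
#     print(row)
```

The history can be paginated, so a long execution may need more than one call to see every event.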

justanewb