How to convert Jupyter notebook training code to SageMaker pipeline steps?

Question

I'm new to SageMaker Pipelines and am researching how to train models not just in a Jupyter notebook but as a SageMaker pipeline in SageMaker Studio. I followed some examples from the blogs/docs, such as this one: https://aws.amazon.com/blogs/machine-learning/hugging-face-on-amazon-sagemaker-bring-your-own-scripts-and-data/

I was able to run these steps in a Jupyter notebook in SageMaker, but how can I convert them into SageMaker pipeline steps for training? Any examples or blogs doing something similar in SageMaker Pipelines/Studio would be helpful.

score 1 · Answer 1 · answered Jan 15 '23 at 22:16

There is a way to convert a Jupyter notebook to a Python script, but that alone is not sufficient to turn code written for a notebook context into code that works in a pipeline context.

The only parts that carry over unchanged are the scripts used as entry_point for training/inference/processing (barring minor internal adjustments, e.g. to environment variables that may be set differently in a pipeline context).
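As a rough illustration of the point about environment variables, here is a minimal sketch of an entry_point training script that reads its paths from the variables SageMaker sets inside a training container (`SM_MODEL_DIR`, and `SM_CHANNEL_TRAIN` for an input channel named "train"). The argument names, the channel name, and the fallback paths are my assumptions, not something taken from the blog post; adapt them to your own script.

```python
import argparse
import os


def parse_args(argv=None):
    """Read hyperparameters and paths, with defaults taken from SageMaker env vars."""
    parser = argparse.ArgumentParser()
    # Hyperparameters are passed by SageMaker as command-line arguments.
    parser.add_argument("--epochs", type=int, default=1)
    # Where the trained model should be written (assumed fallback path).
    parser.add_argument(
        "--model-dir", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model")
    )
    # Where the "train" channel is mounted (the channel name is an assumption).
    parser.add_argument(
        "--train-dir",
        default=os.environ.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
    )
    # parse_known_args ignores any extra arguments the job may pass in.
    args, _ = parser.parse_known_args(argv)
    return args


if __name__ == "__main__":
    args = parse_args()
    print(f"training for {args.epochs} epoch(s) on data in {args.train_dir}")
    # ... load data from args.train_dir, train, save the model to args.model_dir ...
```

Because the script only depends on arguments and environment variables, the same file should work whether the training job is started from a notebook or from a pipeline step.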

This official guide seems to me the most complete to follow as a prerequisite: "Amazon SageMaker Model Building Pipeline"

Next, you can look at a fairly common application scenario, again in the official guide: "Orchestrate Jobs to Train and Evaluate Models with Amazon SageMaker Pipelines"

Once you give your pipeline a name and launch it, you will see it in SageMaker Studio as a graph showing the status of each step as it runs.

You have probably written data manipulation code in your notebook. Remember that a pipeline is composed of steps, so you cannot manipulate data between steps unless that manipulation is itself a step (e.g. a processing step). This is therefore probably the biggest code change you will need to make.
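To make the "data manipulation becomes its own step" idea concrete, here is a minimal sketch of a two-step pipeline (processing, then training), assuming the SageMaker Python SDK v2 with `PipelineSession`. The script names (`preprocess.py`, `train.py`), the `scripts` directory, the S3 URL, the instance types and the framework versions are all hypothetical placeholders, so check them against your own setup and SDK version. The imports sit inside the function only so that the file can be loaded on a machine without the SDK installed.

```python
def build_pipeline(role_arn, pipeline_name="hf-training-pipeline"):
    """Sketch: a processing step feeding a Hugging Face training step."""
    from sagemaker.huggingface import HuggingFace
    from sagemaker.inputs import TrainingInput
    from sagemaker.processing import ProcessingInput, ProcessingOutput
    from sagemaker.sklearn.processing import SKLearnProcessor
    from sagemaker.workflow.parameters import ParameterString
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.pipeline_context import PipelineSession
    from sagemaker.workflow.steps import ProcessingStep, TrainingStep

    # With a PipelineSession, .run() and .fit() return step arguments
    # instead of starting a job immediately.
    session = PipelineSession()

    # Pipeline parameter; the default S3 URL is a placeholder.
    input_data = ParameterString(
        name="InputDataUrl", default_value="s3://my-bucket/raw/"
    )

    # Step 1: the data manipulation code from the notebook, moved into a script.
    processor = SKLearnProcessor(
        framework_version="1.0-1",  # assumed version
        role=role_arn,
        instance_type="ml.m5.xlarge",
        instance_count=1,
        sagemaker_session=session,
    )
    step_process = ProcessingStep(
        name="PreprocessData",
        step_args=processor.run(
            code="preprocess.py",  # hypothetical script
            inputs=[ProcessingInput(source=input_data,
                                    destination="/opt/ml/processing/input")],
            outputs=[ProcessingOutput(output_name="train",
                                      source="/opt/ml/processing/train")],
        ),
    )

    # Step 2: the same entry_point script used in the notebook.
    estimator = HuggingFace(
        entry_point="train.py",  # hypothetical script
        source_dir="scripts",    # hypothetical directory
        role=role_arn,
        instance_type="ml.p3.2xlarge",
        instance_count=1,
        transformers_version="4.26",  # assumed version combination
        pytorch_version="1.13",
        py_version="py39",
        hyperparameters={"epochs": 1},
        sagemaker_session=session,
    )
    # The output of the processing step becomes the "train" channel.
    train_uri = (step_process.properties.ProcessingOutputConfig
                 .Outputs["train"].S3Output.S3Uri)
    step_train = TrainingStep(
        name="TrainModel",
        step_args=estimator.fit(inputs={"train": TrainingInput(s3_data=train_uri)}),
    )

    return Pipeline(
        name=pipeline_name,
        parameters=[input_data],
        steps=[step_process, step_train],
        sagemaker_session=session,
    )
```

With an execution role at hand, something like `pipeline = build_pipeline(role_arn)`, then `pipeline.upsert(role_arn=role_arn)` and `pipeline.start()` should register and launch it, after which the graph appears in Studio under the pipeline name.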
