How to run only the training step in a SageMaker pipeline?

Question

Based on the docs, we can create different steps and chain them together in a SageMaker pipeline. But what if I want to run just one training step, without the processing step shown in the example below? Will I be able to pass an S3 location as an argument instead of the output of the previous step (step_process)? In other words, how can I pass an S3 URI instead of step_process.properties.ProcessingOutputConfig.Outputs["train"].S3Output.S3Uri?

    inputs={
        "train": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "train"
            ].S3Output.S3Uri,
            content_type="text/csv"
        ),
        "validation": TrainingInput(
            s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                "validation"
            ].S3Output.S3Uri,
            content_type="text/csv"
        )
    }

    from sagemaker.workflow.pipeline_context import PipelineSession
    from sagemaker.inputs import TrainingInput
    from sagemaker.workflow.steps import TrainingStep
    from sagemaker.xgboost.estimator import XGBoost

    pipeline_session = PipelineSession()

    xgb_estimator = XGBoost(..., sagemaker_session=pipeline_session)

    step_args = xgb_estimator.fit(
        inputs={
            "train": TrainingInput(
                s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                    "train"
                ].S3Output.S3Uri,
                content_type="text/csv"
            ),
            "validation": TrainingInput(
                s3_data=step_process.properties.ProcessingOutputConfig.Outputs[
                    "validation"
                ].S3Output.S3Uri,
                content_type="text/csv"
            )
        }
    )

    step_train = TrainingStep(
        name="TrainAbaloneModel",
        step_args=step_args,
    )

score 1 · Answer 1 · answered Jan 18 '23 at 15:00

A pipeline, in the context of Amazon SageMaker, is a set of steps that may or may not be interconnected. See 'SageMaker Pipelines Overview'.

If you define a pipeline that way, you cannot change its configuration at run time. For example, if the definition says that the training step consumes the output of a preprocessing step, you cannot switch that off for a single execution.

The only options are to create a separate pipeline definition script (the cleaner and safer solution, since it avoids regressions and other errors) or to edit the existing code and remove the step elements you do not want to use.

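P.S.: Either way, you will have to:

1. Remove the step from the list of steps to be executed in the pipeline definition:

    steps=[step_process, step_train] becomes steps=[step_train]

2. Remove any dependencies between steps, e.g. an explicit call such as:

    step_train.add_depends_on([step_process])

   The step_process.properties references in the training inputs are dependencies too, so they need to be replaced with plain S3 URI strings.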
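
To illustrate the "different pipeline definition script" option, here is a minimal sketch of a training-only pipeline in which each TrainingInput receives a plain S3 URI string instead of a step_process property. The helpers channel_uris and build_training_only_pipeline, the pipeline name TrainOnlyPipeline, and the bucket/prefix layout are hypothetical. The sketch assumes the train and validation CSV files already sit under <prefix>/train and <prefix>/validation in your bucket, and that the estimator was created with a PipelineSession as in the question.

```python
def channel_uris(bucket: str, prefix: str) -> dict:
    """Build plain S3 URI strings for the train and validation channels.

    Hypothetical helper: assumes the data lives under
    s3://<bucket>/<prefix>/train and s3://<bucket>/<prefix>/validation.
    """
    base = f"s3://{bucket}/{prefix.strip('/')}"
    return {"train": f"{base}/train", "validation": f"{base}/validation"}


def build_training_only_pipeline(estimator, uris, pipeline_session):
    """Return a pipeline whose only step is the training step.

    `estimator` is assumed to have been created with
    sagemaker_session=pipeline_session, so .fit() returns step arguments
    instead of starting a training job.
    """
    # Imported lazily so channel_uris can be used without the SageMaker SDK.
    from sagemaker.inputs import TrainingInput
    from sagemaker.workflow.pipeline import Pipeline
    from sagemaker.workflow.steps import TrainingStep

    step_args = estimator.fit(
        inputs={
            # A literal S3 URI replaces step_process.properties...S3Uri.
            name: TrainingInput(s3_data=uri, content_type="text/csv")
            for name, uri in uris.items()
        }
    )
    step_train = TrainingStep(name="TrainAbaloneModel", step_args=step_args)

    # No step_process in the list, and nothing refers to it any more.
    return Pipeline(
        name="TrainOnlyPipeline",  # hypothetical pipeline name
        steps=[step_train],
        sagemaker_session=pipeline_session,
    )
```

With the question's objects, this would be called roughly as build_training_only_pipeline(xgb_estimator, channel_uris("my-bucket", "abalone"), pipeline_session), where "my-bucket" and "abalone" are placeholders for your own location.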
