
My understanding is that, in order to compare different trials of a pipeline (see image), metrics can only be obtained from the TrainingStep, via the metric_definitions argument of an Estimator.

[screenshot of the pipeline]

In my pipeline, I extract metrics in the evaluation step that follows training. Is it possible to record metrics there so that they are tracked for each trial?
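
For reference, this is roughly how metrics are exposed today via the Estimator (image_uri, role, the metric name, and the regex are just placeholders):

from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri=image_uri,          # training image (placeholder)
    role=role,                    # execution role (placeholder)
    instance_count=1,
    instance_type="ml.m5.xlarge",
    # each metric is scraped from the training job's logs with a regex
    metric_definitions=[
        {"Name": "validation:accuracy", "Regex": "val_accuracy=([0-9\\.]+)"},
    ],
)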

duff18

1 Answer


SageMaker suggests using Property Files and JsonGet for each step that needs to expose results. This approach is primarily intended for conditional steps within the pipeline, but it also works just as well for simply persisting results somewhere.

from sagemaker.workflow.properties import PropertyFile
from sagemaker.workflow.steps import ProcessingStep

# The PropertyFile describes a JSON file produced by the processing job;
# output_name must match the output_name of one of the step's ProcessingOutputs.
evaluation_report = PropertyFile(
    name="EvaluationReport",
    output_name="evaluation",
    path="evaluation.json",
)

step_eval = ProcessingStep(
    # ... processor, inputs, outputs, code, etc.
    property_files=[evaluation_report],
)
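
For this to work, the step needs a ProcessingOutput whose output_name matches the PropertyFile. The wiring might look roughly like this (the step name, processor, and script name are placeholders):

from sagemaker.processing import ProcessingOutput

step_eval = ProcessingStep(
    name="EvaluateModel",
    processor=processor,                  # e.g. a ScriptProcessor / SKLearnProcessor
    outputs=[
        ProcessingOutput(
            output_name="evaluation",     # must match PropertyFile.output_name
            source="/opt/ml/processing/evaluation",
        ),
    ],
    code="evaluation.py",                 # placeholder script name
    property_files=[evaluation_report],
)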

and in your processor script:

import json

# Build your report; the JSON structure is up to you,
# e.g. {"metrics": {"accuracy": {"value": acc}}}.
report_dict = {}  # your report

# /opt/ml/processing/evaluation should be the local source of the "evaluation"
# ProcessingOutput, so the file is uploaded together with the step's outputs.
evaluation_path = "/opt/ml/processing/evaluation/evaluation.json"

with open(evaluation_path, "w") as f:
    f.write(json.dumps(report_dict))

You can then read values from this file directly in other pipeline steps with JsonGet.
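
For example, a condition step can branch on one of the recorded values. The json_path below assumes the report layout sketched above, and step_register is a placeholder for whatever should run when the condition passes:

from sagemaker.workflow.functions import JsonGet
from sagemaker.workflow.conditions import ConditionGreaterThanOrEqualTo
from sagemaker.workflow.condition_step import ConditionStep

# read metrics.accuracy.value from the evaluation.json produced by step_eval
accuracy = JsonGet(
    step_name=step_eval.name,
    property_file=evaluation_report,
    json_path="metrics.accuracy.value",
)

step_cond = ConditionStep(
    name="CheckEvaluation",
    conditions=[ConditionGreaterThanOrEqualTo(left=accuracy, right=0.9)],
    if_steps=[step_register],   # placeholder: e.g. a model registration step
    else_steps=[],
)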

Giuseppe La Gualano
  • edited the post to explain better what I'm after – duff18 Nov 03 '22 at 16:46
  • Processing steps do not seem to offer a way to define metrics. A possible solution is to add a component built specifically for analyzing the results; see ["Amazon SageMaker Clarify Detects Bias and Increases the Transparency of Machine Learning Models"](https://aws.amazon.com/it/blogs/aws/new-amazon-sagemaker-clarify-detects-bias-and-increases-the-transparency-of-machine-learning-models/). Personally, I follow the approach described in the answer, which is also what AWS recommends in its various example notebooks. – Giuseppe La Gualano Nov 03 '22 at 17:14