Sagemaker Model Monitoring: Model Quality Monitoring baseline job too much data for max payload size

Question

I'm currently using the SageMaker Model Monitoring modules to check model and data quality. I create the data and model baselines with the SageMaker Python SDK. While running the Batch Transform Job launched by the suggest_baseline() function of the model quality monitor, I got this error:

2023-07-28T15:04:37.163:[sagemaker logs]: MaxConcurrentTransforms=1, MaxPayloadInMB=6, BatchStrategy=MULTI_RECORD
2023-07-28T15:04:38.084:[sagemaker logs]: X.csv: Too much data for max payload size

As the log shows, MaxPayloadInMB=6, so one fix could be to raise MaxPayloadInMB. The problem is that this parameter is not configurable through the SageMaker Model Monitoring Python API.
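To narrow down where the limit is hit, it may help to compare the input file against the 6 MB value from the log. The sketch below assumes the limit is applied to the bytes sent in one request, so it reports both the total file size and the largest single line. The helper name payload_report and its return keys are hypothetical, not part of any SageMaker API.

```python
import os

MB = 1024 * 1024


def payload_report(csv_path, max_payload_mb=6):
    """Compare a CSV's total size and its largest line against a payload limit."""
    limit = max_payload_mb * MB
    largest = 0
    # Read in binary mode so sizes are measured in bytes, not characters.
    with open(csv_path, "rb") as f:
        for line in f:
            largest = max(largest, len(line))
    total = os.path.getsize(csv_path)
    return {
        "total_bytes": total,
        "largest_line_bytes": largest,
        "file_fits": total <= limit,
        "every_line_fits": largest <= limit,
    }
```

If every line fits but the file as a whole does not, one plausible reading is that the file is being sent without being split into records. That is an assumption to verify against the job configuration, not a confirmed diagnosis.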

How can I deal with this problem?

Update:

The code below runs inside a Lambda function to set up a model monitor for the newly deployed model:


# Imports from the SageMaker Python SDK. The names role, bucket, endpoint_name,
# schema_version, baseline_dataset_uri and logger are assumed to be defined
# earlier in the Lambda function.
from sagemaker.model_monitor import (
    CronExpressionGenerator,
    EndpointInput,
    ModelQualityMonitor,
)
from sagemaker.model_monitor.dataset_format import DatasetFormat

model_quality_monitor = ModelQualityMonitor(
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    volume_size_in_gb=20,
    max_runtime_in_seconds=1800,
)

# Compute baseline metrics and suggested constraints from the labelled dataset.
logger.info("Suggesting baseline")
model_quality_monitor.suggest_baseline(
    baseline_dataset=baseline_dataset_uri,
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri=(
        f"s3://{bucket}/{endpoint_name}/{schema_version}/model-quality/baseline/results"
    ),
    problem_type="BinaryClassification",
    inference_attribute="prediction",
    probability_attribute="probability",
    ground_truth_attribute="label",
    wait=True,
    logs=True,
)

# Describe how the captured endpoint data is read by the monitoring job.
logger.info("Creating model quality scheduler")
endpoint_input = EndpointInput(
    endpoint_name=endpoint_name,
    probability_attribute="0",
    probability_threshold_attribute=0.5,
    destination="/opt/ml/processing/input_data",
)

results_uri = (
    f"s3://{bucket}/{endpoint_name}/{schema_version}/model-quality/results"
)

# Schedule an hourly model quality check against the suggested constraints.
schedule_name = f"{endpoint_name}-model-quality-monitor-schedule"
model_quality_monitor.create_monitoring_schedule(
    monitor_schedule_name=schedule_name,
    endpoint_input=endpoint_input,
    output_s3_uri=results_uri,
    problem_type="BinaryClassification",
    ground_truth_input=f"s3://{bucket}/{endpoint_name}/{schema_version}/ground-truth",
    constraints=model_quality_monitor.latest_baselining_job.suggested_constraints(),
    schedule_cron_expression=CronExpressionGenerator.hourly(),
    enable_cloudwatch_metrics=True,
)

model_quality_monitor.describe_schedule()

Can you post the monitoring job function where you are using the parameter? - yashaswi k, Jul 30 '23 at 04:08
@yashaswik I updated the question with the code that is executed to configure and set up my SageMaker Model Monitor. - mxmrpn, Jul 30 '23 at 14:04
The error you mention occurs when the SageMaker Transformer function is run. Can you confirm whether you are executing a Transformer function at the same time? - yashaswi k, Jul 30 '23 at 15:10
@yashaswik I don't explicitly trigger any Transformer function. I think the Transformer function is triggered automatically by the suggest_baseline() function. - mxmrpn, Aug 01 '23 at 21:29
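
For reference, when a batch transform job is created directly (rather than by another component), the payload limit is an explicit request field. The sketch below builds a CreateTransformJob request as a plain dictionary, assuming the predictions could be produced in a separate job that you control. The function name build_transform_request, the default of 20 MB, and the choice of SplitType "Line" are illustrative assumptions, not settings taken from the question.

```python
def build_transform_request(job_name, model_name, input_s3_uri, output_s3_uri,
                            max_payload_mb=20, max_concurrent=1,
                            instance_type="ml.m5.xlarge"):
    """Build keyword arguments for the SageMaker CreateTransformJob API."""
    # Documented service limit: MaxConcurrentTransforms * MaxPayloadInMB <= 100.
    if max_payload_mb * max_concurrent > 100:
        raise ValueError("MaxConcurrentTransforms * MaxPayloadInMB must not exceed 100")
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "MaxConcurrentTransforms": max_concurrent,
        "MaxPayloadInMB": max_payload_mb,
        "BatchStrategy": "MultiRecord",
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_s3_uri}
            },
            "ContentType": "text/csv",
            # Split the CSV by line so records can be batched up to the limit.
            "SplitType": "Line",
        },
        "TransformOutput": {"S3OutputPath": output_s3_uri, "AssembleWith": "Line"},
        "TransformResources": {"InstanceType": instance_type, "InstanceCount": 1},
    }

# Usage (requires AWS credentials and an existing SageMaker model):
#   import boto3
#   request = build_transform_request("my-job", "my-model", "s3://bucket/in", "s3://bucket/out")
#   boto3.client("sagemaker").create_transform_job(**request)
```

Whether this applies here depends on which component actually starts the transform job seen in the log, which the thread does not settle.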
