When running my AutoML pipeline, I consistently get an error during the MetricsAndSaveModel activity, which causes the model training run to fail:

2019-12-06 22:48:01,233 - INFO - 295 : ActivityCompleted: Activity=MetricsAndSaveModel, HowEnded=Failure, Duration=1200977.92[ms]
2019-12-06 22:48:01,235 - CRITICAL - 295 : Type: Unclassified
Class: AzureMLException
Message: AzureMLException:
    Message: Failed to flush task queue within 1200 seconds
    InnerException None
    ErrorResponse 
{
    "error": {
        "message": "Failed to flush task queue within 1200 seconds"
    }
}
Traceback:
  File "fit_pipeline.py", line 222, in fit_pipeline
    automl_run_context.batch_save_artifacts(strs_to_save, models_to_upload)
  File "automl_run_context.py", line 201, in batch_save_artifacts
    timeout_seconds=ARTIFACT_UPLOAD_TIMEOUT_SECONDS)
  File "run.py", line 49, in wrapped
    return func(self, *args, **kwargs)
  File "run.py", line 1824, in upload_files
    timeout_seconds=timeout_seconds)
  File "artifacts_client.py", line 167, in upload_files
    results.append(task)
  File "task_queue.py", line 53, in __exit__
    self.flush(self.identity)
  File "task_queue.py", line 126, in flush
    raise AzureMLException("Failed to flush task queue within {} seconds".format(timeout_seconds))
kevinzurek

1 Answer


The current timeout limit in the AutoML service is set to 20 minutes, and the product team is working to expose it as a configurable setting in a future release. As a workaround, you can modify the script automl_run_context.py to set ARTIFACT_UPLOAD_TIMEOUT_SECONDS to a higher value, then rerun the pipeline.
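If editing the installed SDK file directly is inconvenient, the same effect can be sketched by monkey-patching the constant at runtime before the pipeline starts. This is only an illustration: the helper below and the dotted module path you would pass to it are assumptions, not an official API, and the actual location of automl_run_context varies by SDK version, so verify it against your installation first.

```python
import importlib


def raise_artifact_upload_timeout(module_path: str, seconds: int) -> int:
    """Monkey-patch ARTIFACT_UPLOAD_TIMEOUT_SECONDS in the given module.

    module_path is the dotted import path of automl_run_context in your
    installed azureml SDK (version-dependent -- verify it locally before use).
    Returns the new value so the caller can log or confirm it.
    """
    mod = importlib.import_module(module_path)
    # The traceback above shows this constant caps the batch_save_artifacts
    # upload; raising it gives slow artifact uploads more time to finish.
    setattr(mod, "ARTIFACT_UPLOAD_TIMEOUT_SECONDS", seconds)
    return getattr(mod, "ARTIFACT_UPLOAD_TIMEOUT_SECONDS")
```

You would call this once, before the run starts, e.g. `raise_artifact_upload_timeout("<verified module path>", 3600)` to allow uploads up to an hour. Note that a patch like this must run in the same process that executes fit_pipeline, so it only helps in scenarios where your code runs before the AutoML internals import the constant's value.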

RohitMungi