I have been trying to run an AWS Data Pipeline in which a ShellCommandActivity calls a bash process that in turn calls several long-running Python and Java processes. Each time the ShellCommandActivity runs, a reportProgress error is thrown in the Task Runner logs after exactly 5 days, and the task is cancelled. The problem persisted even after I set the attemptTimeout and lateAfterTimeout fields to longer than 5 days. The Task Runner log message and the pipeline JSON definition are shown below:
Screenshot of pipeline execution error
TASK RUNNER LOG MESSAGE:
01 Dec 2018 18:55:05,693 https://forums.aws.amazon.com/ (HeartBeatService-df-01341812NWJEQ1FAYI1K-@ShellCommandActivityId_UdTMC_2018-11-26T18:54:03_Attempt=1) amazonaws.datapipeline.taskrunner.HeartBeatService: HeartBeatService DataPipeline reportProgress error thrown and workCancelleddf-01341812NWJEQ1FAYI1K-@ShellCommandActivityId_UdTMC_2018-11-26T18:54:03_Attempt=1
amazonaws.datapipeline.taskrunner.CanceledTaskException: DataPipeline service requested this work be canceled.
at amazonaws.datapipeline.taskrunner.DataPipelineProgressReporter.reportProgress(DataPipelineProgressReporter.java:31)
…
01 Dec 2018 18:55:06,726 https://forums.aws.amazon.com/ (TaskRunnerService-wg-10000-2) amazonaws.datapipeline.taskrunner.TaskPoller: Work ShellCommandActivity took 7201:0 to complete
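As a sanity check on the "exactly 5 days" observation, here is a minimal sketch that computes the time between the two timestamps in the log. It assumes the timestamp embedded in the attempt ID (2018-11-26T18:54:03) marks when the attempt was scheduled, and that "7201:0" in the last log line is a duration in minutes; neither is confirmed by the log itself.

```python
from datetime import datetime

# Timestamps copied from the Task Runner log above.
# Assumption: the timestamp in the attempt ID is the attempt's scheduled start.
scheduled = datetime.strptime("2018-11-26T18:54:03", "%Y-%m-%dT%H:%M:%S")
cancelled = datetime.strptime("01 Dec 2018 18:55:05", "%d %b %Y %H:%M:%S")

elapsed = cancelled - scheduled
elapsed_minutes = elapsed.total_seconds() / 60

print(elapsed)               # 5 days, 0:01:02
print(int(elapsed_minutes))  # 7201
```

The result lines up with the "took 7201:0 to complete" message, so the cancellation does seem to arrive at the 5-day mark rather than at either of the timeouts I configured.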
PIPELINE JSON DEFINITION:
{
  "objects": [
    {
      "failureAndRerunMode": "CASCADE",
      "resourceRole": "DataPipelineDefaultResourceRole",
      "role": "DataPipelineDefaultRole",
      "pipelineLogUri": "s3://oobhuntoo1/",
      "scheduleType": "ONDEMAND",
      "name": "Default",
      "id": "Default"
    },
    {
      "onLateAction": {
        "ref": "ActionId_V6bq0"
      },
      "lateAfterTimeout": "7 Days",
      "name": "DefaultShellCommandActivity1",
      "id": "ShellCommandActivityId_UdTMC",
      "workerGroup": "wg-10000",
      "type": "ShellCommandActivity",
      "command": "python ~/AWS_5day_Test/Python/Layer1.py"
    },
    {
      "name": "DefaultAction1",
      "id": "ActionId_V6bq0",
      "type": "Terminate"
    }
  ],
  "parameters": []
}
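To double-check which timeout fields actually made it into an exported definition, a small script along these lines may help. It is only a sketch: the embedded JSON is a trimmed copy of the activity object above, and `timeout_settings` and `TIMEOUT_FIELDS` are hypothetical names introduced for illustration.

```python
import json

# Trimmed copy of the ShellCommandActivity object from the definition above.
definition = json.loads("""
{
  "objects": [
    {
      "lateAfterTimeout": "7 Days",
      "id": "ShellCommandActivityId_UdTMC",
      "type": "ShellCommandActivity"
    }
  ]
}
""")

# The two timeout fields mentioned in the question.
TIMEOUT_FIELDS = ("attemptTimeout", "lateAfterTimeout")


def timeout_settings(obj):
    """Return the timeout fields set on a pipeline object (None if absent)."""
    return {field: obj.get(field) for field in TIMEOUT_FIELDS}


report = {obj["id"]: timeout_settings(obj) for obj in definition["objects"]}
print(report)
```

Run against the definition as posted, this reports lateAfterTimeout as "7 Days" and attemptTimeout as None for the activity, so it is worth confirming that the definition actually deployed carries both values.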