How to restart an AWS Data Pipeline

Question

I have a scheduled AWS Data Pipeline that failed partway through its execution. I fixed the problem without modifying the Pipeline in any way (changed a script in S3). However, there seems to be no good way to restart the Pipeline from the beginning.

I tried Deactivating/Reactivating the Pipeline, but the previously "FINISHED" nodes were not restarted. This is expected; according to the docs, this only pauses and un-pauses execution of the Pipeline, which is not what we want.

I tried Rerunning one of the nodes (call it x) individually, but it did not respect dependencies: none of the nodes x depends on reran, nor did the nodes that depend on x.

I tried activating it from a time in the past, but received the error: startTimestamp should be later than any Schedule StartDateTime in the pipeline (Service: DataPipeline; Status Code: 400; Error Code: InvalidRequestException; Request ID: <SANITIZED>).

I would rather not change the Schedule node, since I want the Pipeline to continue to respect it; I only need this one manual execution. How can I restart the Pipeline from the beginning, once?

score 3 · Answer 1 · answered Jul 25 '16 at 18:31


So far, the best way to accomplish this that I've found is to Clone the Pipeline, make it On-Demand (instead of Scheduled) and activate that one. This new Pipeline will activate and run immediately. This seems cumbersome, however; I'd be happy to hear a better way.
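For anyone who wants to script this workaround rather than click through the console, here is a rough sketch using the boto3 `datapipeline` client methods `get_pipeline_definition`, `create_pipeline`, `put_pipeline_definition` and `activate_pipeline`. The helper names (`to_on_demand`, `clone_and_run_once`) are hypothetical, and the way the definition is rewritten to On-Demand (dropping `Schedule` objects and `schedule` references, setting `scheduleType` to `ondemand` on the `Default` object) is my assumption about what the console's Clone plus On-Demand steps amount to, so check the result against your own definition.

```python
def to_on_demand(pipeline_objects):
    """Return a copy of the API-format objects rewritten for on-demand runs.

    Assumption: Schedule objects and references to them can simply be dropped,
    and the object with id "Default" carries the schedule type.
    """
    converted = []
    for obj in pipeline_objects:
        fields = obj.get("fields", [])
        # Skip the Schedule object(s) entirely.
        if any(f["key"] == "type" and f.get("stringValue") == "Schedule" for f in fields):
            continue
        # Drop references to the schedule and any per-object schedule type.
        kept = [f for f in fields if f["key"] not in ("schedule", "scheduleType")]
        if obj.get("id") == "Default":
            kept.append({"key": "scheduleType", "stringValue": "ondemand"})
        converted.append({**obj, "fields": kept})
    return converted


def clone_and_run_once(client, source_id, clone_name, unique_id):
    """Copy a pipeline's definition into a new on-demand pipeline and activate it."""
    definition = client.get_pipeline_definition(pipelineId=source_id)
    clone_id = client.create_pipeline(name=clone_name, uniqueId=unique_id)["pipelineId"]
    result = client.put_pipeline_definition(
        pipelineId=clone_id,
        pipelineObjects=to_on_demand(definition["pipelineObjects"]),
        parameterObjects=definition.get("parameterObjects", []),
        parameterValues=definition.get("parameterValues", []),
    )
    if result.get("errored"):
        raise RuntimeError(f"definition rejected: {result.get('validationErrors')}")
    client.activate_pipeline(pipelineId=clone_id)
    return clone_id

# Example call (needs boto3 and AWS credentials; the ID and names are placeholders):
#   import boto3
#   clone_and_run_once(boto3.client("datapipeline"), "df-EXAMPLE", "my-pipeline-rerun", "rerun-1")
```

The original Scheduled Pipeline is left untouched, so it keeps respecting its Schedule; the clone can be deleted once the one-off run finishes.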


Simon Lepkin



I find this to be true most of the time also. A failed pipeline execution, even when set to 'On Demand', is insanely finicky to get to rerun. Usually after trying to no avail for an hour, I end up cloning it. – Brett Green Aug 28 '18 at 13:33
Almost 3 years have gone by since this post, and this issue is still a problem in AWS DataPipeline... – jmng Jun 05 '19 at 14:55

score 1 · Answer 2 · answered Jul 25 '16 at 18:13


The ActivatePipeline API has a startTimestamp parameter that lets you restart execution from any previous time interval. Please see http://docs.aws.amazon.com/datapipeline/latest/APIReference/API_ActivatePipeline.html
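As a minimal sketch of that call from Python, assuming the boto3 `datapipeline` client (whose `activate_pipeline` method takes `pipelineId` and an optional `startTimestamp`): the helper name `rerun_from` and the pipeline ID in the usage comment are placeholders, and insisting on a timezone-aware datetime is my own choice to avoid ambiguity, not something the API requires.

```python
from datetime import datetime, timezone


def rerun_from(client, pipeline_id, start):
    """Activate the pipeline, asking it to resume from `start`."""
    if start.tzinfo is None:
        # Guard added for this sketch only: avoid local-time vs UTC confusion.
        raise ValueError("start must be a timezone-aware datetime")
    # Maps to the ActivatePipeline API with its startTimestamp parameter.
    return client.activate_pipeline(pipelineId=pipeline_id, startTimestamp=start)

# Example call (needs boto3 and AWS credentials; "df-EXAMPLE" is a placeholder):
#   import boto3
#   rerun_from(boto3.client("datapipeline"), "df-EXAMPLE",
#              datetime(2016, 7, 25, tzinfo=timezone.utc))
```

If the service rejects the timestamp, boto3 surfaces it as a `ClientError` carrying the same `InvalidRequestException` message quoted in the question.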


Ramkumar K Sugavanam


That hasn't been working for me. When I select Actions->Activate, and select a previous time, I get the error: `startTimestamp should be later than any Schedule StartDateTime in the pipeline (Service: DataPipeline; Status Code: 400; Error Code: InvalidRequestException; Request ID: )`. On the other hand, when I activate it "from now", no errors occur, but neither does the pipeline run. I'll add this to the question. – Simon Lepkin Jul 25 '16 at 18:28

Your pipeline definition has a startDateTime in the schedule object, which defines the earliest execution. The startTimestamp parameter for the activate API cannot be earlier than that startDateTime, i.e., you can only re-run previous dates that were already run. If you want to go back in time to process data for dates on which your pipeline has never run, you unfortunately have to clone the pipeline and create a new one. – Ramkumar K Sugavanam Jul 26 '16 at 19:12
