
My final goal is to perform data transformation using an existing machine with preinstalled software. More specifically, the software is an R script that uses non-standard packages (possibly installed manually), so I would rather start an existing (stopped) instance than create a plain one from scratch for the duration of the data pipeline run. Ideally, I would like to stop the instance after the work is done. Does the Data Pipeline API offer anything close to that?

John Rotenstein

1 Answer


I think this is feasible. First, create a ShellCommandActivity in the pipeline that starts the EC2 instance through the AWS CLI. This activity may have to poll the instance's status and return once the instance is running.
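That first activity could look something like the sketch below. This is an illustration rather than a tested script: the instance id is a placeholder you would pass in (for example, as a pipeline parameter), and the region comes from your CLI configuration.

```shell
#!/usr/bin/env bash
# Sketch of the first ShellCommandActivity: start a stopped EC2 instance
# and poll until it reports "running". The instance id is a placeholder.
set -u

start_and_wait() {
    local instance_id="$1"

    aws ec2 start-instances --instance-ids "$instance_id" >/dev/null

    # Poll the instance state until it is "running".
    # (`aws ec2 wait instance-running` is a built-in shortcut for this loop.)
    while true; do
        local state
        state=$(aws ec2 describe-instances \
            --instance-ids "$instance_id" \
            --query 'Reservations[0].Instances[0].State.Name' \
            --output text)
        [ "$state" = "running" ] && break
        sleep 10
    done
}
```

The explicit loop is shown only to make the polling visible; in practice the `wait` subcommand is simpler.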

The EC2 instance will also need to start Task Runner as part of its init process.
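The init step (for example, via cloud-init user data) would launch Task Runner so the pipeline can dispatch activities to this instance. A sketch, assuming the Task Runner jar and a credentials file are already on the machine; all paths, the worker group name, region, and S3 log URI are placeholders:

```shell
#!/bin/bash
# Init sketch: start Task Runner so this instance can accept
# Data Pipeline activities. Paths and names below are placeholders.
java -jar /opt/taskrunner/TaskRunner-1.0.jar \
    --config /opt/taskrunner/credentials.json \
    --workerGroup=wg-r-transform \
    --region=us-east-1 \
    --logUri=s3://my-bucket/taskrunner-logs &
```

The worker group name is what ties this instance to the pipeline: the R-script activity would declare the same worker group so it runs here rather than on a pipeline-managed resource.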

Then the next step of the pipeline would run using that task runner. After that step, another Data Pipeline shell activity could shut the instance down.
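The final activity (which would have to run on a pipeline-managed resource, not on the instance being stopped) could be as small as the sketch below; again, the instance id is a placeholder:

```shell
#!/usr/bin/env bash
# Sketch of the final ShellCommandActivity: stop the instance once the
# R transformation step has finished. The instance id is a placeholder.
set -u

stop_and_wait() {
    local instance_id="$1"
    aws ec2 stop-instances --instance-ids "$instance_id" >/dev/null
    # Optionally block until the instance is fully stopped.
    aws ec2 wait instance-stopped --instance-ids "$instance_id"
}
```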

While it is feasible, I think you end up with a simpler topology if you can install your software as part of a ShellCommandActivity and rely entirely on Data Pipeline-managed resources, rather than maintaining a separate instance with its own task runner.

user1452132