AWS Data Pipeline - How to set global pipeline variable from ShellCommandActivity

Question

I am trying to augment my pipeline (migrates data from RDS to RedShift) so that it selects all rows whose id is greater than the maximum id that exists in RedShift. I have a script in Python that calculates this value and returns it to the output. I want to take this output and save it to a variable max_id that I can later reference in my RDS selection query. For example, my RDS selection section currently looks like this:

{
  "database": {
    "ref": "rds_mysql"
  },
  "scheduleType": "TIMESERIES",
  "name": "SrcRDSTable",
  "id": "SrcRDSTable",
  "type": "SqlDataNode",
  "table": "#{myRDSTableName}",
  "selectQuery": "select * from #{table} where #{myRDSTableLastModifiedCol} > '#{max_id}'"
},

I then want to add a section before this that would execute the bash script, retrieve the id field and save it to the variable max_id so that it can be referenced in the above code. So far I have:

{
 "myComment": "Retrieves the maximum ID for a given table in RedShift",
  "id": "ShellCommandActivity_Max_ID",
  "workerGroup": "wg-12345",
  "type": "ShellCommandActivity",
  "command": "starting_point=$(/usr/bin/python /home/user/aws-taskrunner-docker/get_id.py --schema=schema_name --table=users --database=master)"
},

How can I adjust the above to set max_id to the value of starting_point? Thanks.

score 1 · Answer 1 · answered Sep 06 '16 at 19:17

Unfortunately, I don't think that there's a way to set a pipeline parameter during pipeline execution. Here are a couple options that may help you.

First, if your data table has a column for modification date, then you can use the pipeline template Incremental Copy of RDS MySQL to Redshift. If you're not using MySQL, you may still be able to modify that template to your needs.

Alternatively, instead of using the SqlDataNode, you could create a ShellCommandActivity that uses python to connect to your RDS database and exports the relevant record set to S3. You could then import the records from S3 using the RedshiftCopyActivity.

AWS Data Pipeline - How to set global pipeline variable from ShellCommandActivity

1 Answers1