I am trying to augment my pipeline (migrates data from RDS to RedShift) so that it selects all rows whose id
is greater than the maximum id
that exists in RedShift. I have a script in Python that calculates this value and returns it to the output. I want to take this output and save it to a variable max_id
that I can later reference in my RDS selection query. For example, my RDS selection section currently looks like this:
{
"database": {
"ref": "rds_mysql"
},
"scheduleType": "TIMESERIES",
"name": "SrcRDSTable",
"id": "SrcRDSTable",
"type": "SqlDataNode",
"table": "#{myRDSTableName}",
"selectQuery": "select * from #{table} where #{myRDSTableLastModifiedCol} > '#{max_id}'"
},
I then want to add a section before this that would execute the bash script, retrieve the id
field and save it to the variable max_id
so that it can be referenced in the above code. So far I have:
{
"myComment": "Retrieves the maximum ID for a given table in RedShift",
"id": "ShellCommandActivity_Max_ID",
"workerGroup": "wg-12345",
"type": "ShellCommandActivity",
"command": "starting_point=$(/usr/bin/python /home/user/aws-taskrunner-docker/get_id.py --schema=schema_name --table=users --database=master)"
},
How can I adjust the above to set max_id
to the value of starting_point
? Thanks.