
I successfully managed to get a data pipeline to transfer data from a set of tables in Amazon RDS (Aurora) to a set of .csv files in S3 with a "copyActivity" connecting the two DataNodes.

However, I'd like the .csv file to have the name of the table (or view) that it came from. I can't quite figure out how to do this. I think the best approach is to use an expression in the filePath parameter of the S3 DataNode.

But, I've tried #{table}, #{node.table}, #{parent.table}, and a variety of combinations of node.id and parent.name without success.

Here are a couple of JSON snippets from my pipeline:

"database": {
    "ref": "DatabaseId_abc123"
  },
  "name": "Foo",
  "id": "DataNodeId_xyz321",
  "type": "MySqlDataNode",
  "table": "table_foo",
  "selectQuery": "select * from #{table}"
},
{
  "schedule": {
    "ref": "DefaultSchedule"
  },
  "filePath": "#{myOutputS3Loc}/#{parent.node.table.help.me.here}.csv",
  "name": "S3_BAR_Bucket",
  "id": "DataNodeId_w7x8y9",
  "type": "S3DataNode"
}

Any advice you can provide would be appreciated.

D. Woods

1 Answer


I see that you have #{table} (did you mean #{myTable}?). If you are using a parameter to pass the name of the DB table, you can use that parameter in the S3 filePath as well, like this:

"filePath": "#{myOutputS3Loc}/#{myTable}.csv",

Austin Lee
  • Thanks. No, it looks like I haven't been clear, sorry. The names of the DB tables are hard-coded in the data nodes (I have 4 source nodes). That gives me an idea, though: I wonder if there is a way to use separate "my" parameters for the data nodes and then individually reference them in the output data node? – D. Woods Nov 12 '15 at 02:14
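Following up on the idea in that comment: a pipeline definition can declare several parameters, so one possibility is a separate parameter per source table, each referenced by its own output node. A hedged sketch (the ids myTableFoo and myTableBar are hypothetical names, not from the original pipeline):

"parameters": [
  { "id": "myTableFoo", "type": "String", "description": "Table for the first source node (hypothetical id)" },
  { "id": "myTableBar", "type": "String", "description": "Table for the second source node (hypothetical id)" }
],
"values": {
  "myTableFoo": "table_foo",
  "myTableBar": "table_bar"
}

Each S3DataNode would then point at its matching parameter, e.g. "filePath": "#{myOutputS3Loc}/#{myTableFoo}.csv".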