
I need to build a data pipeline that takes input from a CSV file (stored on S3) and updates records in an Aurora RDS table. I understand the standard out-of-the-box template format for bulk record insertion, but is there a standard way to express update or delete statements in a SqlActivity?

I can write an UPDATE statement, but the CSV inputs are referenced only as positional question marks (?), with no way to refer to a column by name or index.

Can Data Pipeline be used this way? If so, is there a specific way to refer to CSV columns? Thanks in advance!

Atul

1 Answer


You will need to preprocess your CSV into a SQL script containing your bulk updates, and then invoke the SqlActivity with a reference to that script.
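For example, here is a minimal sketch of that preprocessing step in Python. The file names, table name ("mytable"), and key column ("id") are placeholders for your own schema, and the naive quote-doubling escape should be replaced with proper escaping or parameterization before running against untrusted data:

    import csv

    def csv_to_update_script(csv_path, sql_path, table="mytable", key="id"):
        """Turn each CSV row into an UPDATE keyed on the `key` column."""
        with open(csv_path, newline="") as src, open(sql_path, "w") as dst:
            for row in csv.DictReader(src):
                key_value = row.pop(key)
                # Build "col = 'value'" pairs; doubling single quotes is a
                # naive escape, sufficient only for trusted input.
                assignments = ", ".join(
                    "{0} = '{1}'".format(col, val.replace("'", "''"))
                    for col, val in row.items()
                )
                dst.write("UPDATE {0} SET {1} WHERE {2} = '{3}';\n".format(
                    table, assignments, key, key_value.replace("'", "''")))

    csv_to_update_script("input.csv", "bulk_updates.sql")

You would then upload the generated bulk_updates.sql to S3 and point the SqlActivity's scriptUri field at it.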

If you only have inserts, you might be able to perform them out of the box using the following:

CopyActivity (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-copyactivity.html) which takes:

  • S3DataNode as an input

  • SqlDataNode as the output.

If performance is not a concern, this is the closest you can get to an out-of-the-box transport using AWS Data Pipeline.
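As a rough illustration, the sketch below registers such a CopyActivity pipeline with boto3. All identifiers (bucket, table, RDS instance, credentials, role names) are placeholders, the Ec2Resource and Default objects are pared down to a minimum, and you should expect to fill in scheduling, IAM, and networking details from the docs before this validates:

    import boto3

    dp = boto3.client("datapipeline")

    pipeline_id = dp.create_pipeline(
        name="csv-to-aurora", uniqueId="csv-to-aurora-1")["pipelineId"]

    dp.put_pipeline_definition(
        pipelineId=pipeline_id,
        pipelineObjects=[
            # Defaults shared by all objects in the pipeline.
            {"id": "Default", "name": "Default", "fields": [
                {"key": "scheduleType", "stringValue": "ondemand"},
                {"key": "role", "stringValue": "DataPipelineDefaultRole"},
                {"key": "resourceRole",
                 "stringValue": "DataPipelineDefaultResourceRole"}]},
            # The CSV file on S3 that feeds the copy.
            {"id": "CsvFormat", "name": "CsvFormat", "fields": [
                {"key": "type", "stringValue": "CSV"}]},
            {"id": "Input", "name": "Input", "fields": [
                {"key": "type", "stringValue": "S3DataNode"},
                {"key": "filePath", "stringValue": "s3://my-bucket/input.csv"},
                {"key": "dataFormat", "refValue": "CsvFormat"}]},
            # The Aurora table the rows are inserted into.
            {"id": "AuroraDb", "name": "AuroraDb", "fields": [
                {"key": "type", "stringValue": "RdsDatabase"},
                {"key": "rdsInstanceId", "stringValue": "my-aurora-instance"},
                {"key": "username", "stringValue": "admin"},
                {"key": "*password", "stringValue": "secret"}]},
            {"id": "Output", "name": "Output", "fields": [
                {"key": "type", "stringValue": "SqlDataNode"},
                {"key": "table", "stringValue": "mytable"},
                {"key": "database", "refValue": "AuroraDb"}]},
            # The EC2 instance the CopyActivity runs on.
            {"id": "Ec2Instance", "name": "Ec2Instance", "fields": [
                {"key": "type", "stringValue": "Ec2Resource"},
                {"key": "terminateAfter", "stringValue": "1 Hour"}]},
            {"id": "Copy", "name": "Copy", "fields": [
                {"key": "type", "stringValue": "CopyActivity"},
                {"key": "input", "refValue": "Input"},
                {"key": "output", "refValue": "Output"},
                {"key": "runsOn", "refValue": "Ec2Instance"}]},
        ])

    dp.activate_pipeline(pipelineId=pipeline_id)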

You can refer to the AWS Data Pipeline docs (http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/what-is-datapipeline.html) for more information.

Supratik