0

I have two tables source and target in the different databases. I need to insert new records in the target from a source, update the existing records and delete the records from the target table. I need to achieve these using the AWS glue python script. Which transformation do I need to use?

Pradnya
  • 21
  • 2

1 Answers1

0

You have not mentioned what is the source & target database that you are going to use. I have done UPSERTS to an Aurora Mysql database as Target, and S3 as my Source.

I have used "INSERT INTO TABLE....ON DUPLICATE KEY update...". This will check if the record already exists in the target database (based on the PRIMARY KEYs), and if exists, the records are UPDATED else the record is inserted.

Let me know if you need any help with sample. You can check my other SO question/answers on the same. You have to use a python database connector library as a separate zip file to your Glue job.

Yuva
  • 2,831
  • 7
  • 36
  • 60
  • Thank you for your reply. I am using on-premise SQL server for source and Amazon Redshift for a target. If you can help me with the sample that will very helpful. – Pradnya Sep 18 '18 at 06:46
  • Ok, for sql server, think you should use pyodbc or pymssql zip file. Upload the bundled zip file to a S3 location, and when creating a job add this external jar as a python jar file. For the code part, think sql server supports MERGE in lieu of INSERT INTO TABLE ....ON DUPLICATE KEY. Please refer to this link for MERGE sample in SQL SERVER : https://stackoverflow.com/questions/50135898/multi-row-upsert-insert-or-update-from-python. Am not sure if REDSHIFT supports any UPSERTs, let me check. – Yuva Sep 18 '18 at 06:55
  • Redshift has no UPSERT command. Instead merge command their, but it requires a staging table. So it's not that simple like UPSERT... check this out: https://docs.aws.amazon.com/redshift/latest/dg/c_best-practices-upsert.html – jelongpark Jun 07 '21 at 21:14