0

I have a problem with my current data flow, where I try to delete entries, that already exist in my SQL DB. Based on the following thread, I am using a full outer join and an Alter Row step to get rid of duplicate entries: ADF copy data activity - check for duplicate records before inserting into SQL db

The only difference is, that the attribute _id is of the data type varchar.

enter image description here

I followed all the necessary steps:

enter image description here

And that is the preference of my AlterRow step:

enter image description here

Moreover, I am facing a strange problem, that I regularly have to set up the mapping of FixNames@{_id} as Azure detects it as an error. Setting it up again in the mapping part, the error is gone.

An altenative could be a stored procedure, but I don't understand, why the duplicates with my current version are still inserted.

Matzka
  • 125
  • 1
  • 13
  • If you wish to delete duplicates in your SQL DB, you should set a Delete policy in your Alter Row and set "Delete" as the only option in your sink. – Mark Kromer MSFT Sep 03 '20 at 18:56
  • I only don't want to insert duplicate entries. I am transfering data from a MongoDB via a Data Lake into the SQL DB. Since this process will be triggered once a day and the MongoDB data set is a full load, I only want to insert the new entries. – Matzka Sep 03 '20 at 20:23
  • A very simple mechanism would be to just "Upsert" all rows (set policy to true() in Alter Row) select only Upsert on the Sink and use the columns that determine uniqueness as the key column in your Sink. – Mark Kromer MSFT Sep 03 '20 at 21:05
  • If you want to filter down to just rows that are unique from incoming source to your destination, using the Exists transformation may be your simplest pattern here: https://learn.microsoft.com/en-us/azure/data-factory/data-flow-exists – Mark Kromer MSFT Sep 03 '20 at 21:14
  • @Matzka does the ID is int or integer data type? Please try to convert them to the same data type. – Leon Yue Sep 04 '20 at 00:41
  • @Leon: the ID is sadly an varchar/string. – Matzka Sep 04 '20 at 07:04
  • I still don't understand, why my described data flow does not work. – Matzka Sep 06 '20 at 20:45
  • @Matzka Please check the data preview in Join and Alter row, which active doesn't work as expected. – Leon Yue Sep 07 '20 at 00:52
  • I am also trying it with the exist-functionality. I even converted both _id-fields from source1 and source2 to String. I really don't get it.. – Matzka Sep 07 '20 at 12:32

0 Answers0