-1

I am working on a pipeline where I am using Excel as the source. The data has a primary key let's say Id, which is repeating multiple times in the Excel.

Now, when I insert it into a SQL database, it fails with the error:

java.sql.BatchUpdateException: Violation of PRIMARY KEY constraint. Cannot insert duplicate key in object 'dbo.xyz'. The duplicate key value is XXXX.

How can I make take care of this scenario using mapping data flows in ADF?

I am using a mapping data flow here to take care of the different other transformations.

Example of such data, coming from the Excel source

ID Name PhoneNo
1 John Doe 11110000
1 John Doe 88881111
2 Harry Potter 88999000
2 Harry Potter 00001112
3 abc xyz 77771111

I need to take save the top 1 ID and Name (and there are more columns) in one table and Phone No and ID will be saved in another

user961
  • 453
  • 6
  • 20
  • Hi @user961, Can you provide a sample source and expected output? As the table has a primary key constraint on the Id column, you cannot insert duplicate values into the SQL table. – NiharikaMoola-MT Jun 14 '22 at 08:13
  • 2
    If the ID repeats multiple times then it cannot be a primary key; a Primary Key is, by definition, unique within the dataset. Perhaps your Primary Key candidate should be made up of a couple of columns (though these are difficult to work with), or perhaps your data set requires normalisation into several tables. – Thom A Jun 14 '22 at 08:40
  • @NiharikaMoola-MT - added the sample data for your reference in the post itself. Thanks for your prompt responses. – user961 Jun 14 '22 at 09:18
  • What does "take care of this scenario" mean? Especially since your description does not make sense since "a primary key [...] which is repeating multiple times" cannot happen, as you were already told. Explain exactly what is given, what it the goal & where you are 1st stuck.. Use enough words, sentences & references to parts of examples to clearly & fully say what you mean. [mre]. PS Give a table in block code format or table format. Input tables to code should be initialization code. See the edit help. – philipxy Jun 14 '22 at 10:36

1 Answers1

1

You can use the aggregate transformation to remove duplicate values from the source.

Source:

Add sample excel source with duplicate values in ID and Name columns.

enter image description here

Aggregate transformation:

Under the group by property, add the list of columns in which the duplicate rows are identified.

enter image description here

Under the aggregates property, add the aggregate column. Here, we are getting the first value of the Phone column from the duplicate rows.

Expression: first(Phone)

enter image description here

Aggregate output:

enter image description here

Sink1:

Add the aggregate output to sink1 to pass the ID, and Name columns to 1 table.

enter image description here

Sink2:

Add another sink after aggregate transformation to pass Id and phone to a different table.

enter image description here

NiharikaMoola-MT
  • 4,700
  • 1
  • 3
  • 15
  • This approach will miss the data for the second table. I need to save all values for the second table as well. – user961 Jun 16 '22 at 04:48