Can we select rows with duplicate entries in Alter Row transformation in Azure Data Factory?

Question

There are two columns:

col_A	col_B
111	2.0
222	1.0
222	2.0
333	1.0

Using alter row transformation i would like to select rows that have repeated entries in col_A, in this example 222 and select the corresponding highest value in col_B i.e. 2.0

The output should look as follows:

col_A	col_B
111	2.0
222	2.0
333	1.0

Rakesh Govindula · Accepted Answer · 2022-09-20T11:19:40.960

1

You can use Aggregate transformation in Data flow.

This is my source data:

enter image description here

In the Aggregate transformation, give col_A for group by to select the distict rows and col_B for aggregate. Use the max(col_B) in this to get the max values.

Group By:

Aggregate:

Result:

enter image description here

Update about Extra Columns:

To Carry the Remaining columns after aggregation transformation, use Join transformation with Join type as inner join(Join of above with source) and give the above two columns as Keys.

enter image description here

After Join, you can remove the extra columns with select transformation.

enter image description here

Result with Extra rows:

enter image description here

edited Sep 20 '22 at 11:19

answered Sep 20 '22 at 09:28

Rakesh Govindula

5,257
1
2
11

Thanks Rakesh, while i am checking your answer, i want to add something here. For illustration purpose i used one column Col_b. The transformation looks good but there are other extra columns that i need to carry over. Let's say they have a prefix abc present in their name. how can i select all those columns too? – Lav Mehta Sep 20 '22 at 09:41
You want to group by a single columnA and want the max in col_B and also want to carry the remaining columns as well in result? – Rakesh Govindula Sep 20 '22 at 10:24
Exactly that is what i want. – Lav Mehta Sep 20 '22 at 10:29
'group by a single columnA and want the max in col_B ', After doing this you can select the remaining column's rows which have max value in col_B. – Rakesh Govindula Sep 20 '22 at 10:57
do i need to add a column pattern in Aggregate settings in the data flow? or do you know how can i select the remaining columns? they all have "col_" prefix in the names. @RakeshGovindula – Lav Mehta Sep 20 '22 at 11:06
Aggregate requires atleast one aggregate transformation in every column pattern, so to get the extra columns you can follow the above approach I have added. – Rakesh Govindula Sep 20 '22 at 11:20

Can we select rows with duplicate entries in Alter Row transformation in Azure Data Factory?

1 Answers1