4

I am having a flat file with 10 records, and out of it 5 records are duplicate records (unique key column: Customer_Id and source_system). This flat file has to load to a Oracle table and this is the first load.

How can I eliminate the duplicates in it?

Marek Grzenkowicz
  • 17,024
  • 9
  • 81
  • 111
Muthukumar
  • 8,679
  • 17
  • 61
  • 86

4 Answers4

6

There are quite a few ways to handle this:

  1. Load the data to a temp table and then just select the distinct records
  2. Use a sorter or aggregator transformation and then select the distinct option (In the properties tab if I remember correctly)
  3. Use the dynamic lookup which will only insert records into the cache if they have not been inserted already.
  4. Use variable port in expression transformation and by using filter we can delete the duplicate records even if it is relational table or flat files.since dynamic lookup is costly, this way is preferable.
Idos
  • 15,053
  • 14
  • 60
  • 75
nolt2232
  • 2,594
  • 1
  • 22
  • 33
2

You can always use an Aggregator transform and group by all the data you want to keep it distinct for. So if you group by all the columns only those who are distinct will come in the end.

dark sun
  • 21
  • 1
  • 3
1

You can use sorter and check load distinct rows.

Nitin
  • 11
  • 1
0

SRC-->SQ-->SRT-->EXP-->RTR-->TGT

You inout from source is

Col1 Col2
1 A
1 B
2 C
2 D
1 E
1 F
3 G
4 H
5 I
6 J
4 K
3 L

In sorter sort data by col1 and after sorter data looks like this

Col1 Col2
1 A
1 B
1 E
1 F
2 C
2 D
3 G
3 L
4 K
4 H
5 I
6 J

In exp you have two input ports

in_col1
in_col2

create variable ports and output ports in the order like below

v_FLAG= IIF(v_col1=in_col1,2,1)

v_col1=in_col1

out_FLAG=v_FLAG

Then out put of expression data looks like this

Col1 Col2 FLAG
1,A, 1
1, B, 2
1, E, 2
1, F, 2
2, C, 1
2, D, 2
3, G, 1
3, L, 2
4, K,1
4, H, 2
5, I, 1
6, J, 1

In router create two groups one for unique records and another one for duplicate records.

unique=(FLAG=1)

duplicate=(FLAG=2)

connect two groups to two targets.SRC-->SQ-->SRT-->EXP-->RTR-->TGT

You inout from source is

Col1 Col2
1 A
1 B
2 C
2 D
1 E
1 F
3 G
4 H
5 I
6 J
4 K
3 L

In sorter sort data by col1 and after sorter data looks like this

Col1 Col2
1 A
1 B
1 E
1 F
2 C
2 D
3 G
3 L
4 K
4 H
5 I
6 J

In exp you have two input ports

in_col1
in_col2

create variable ports and output ports in the order like below

v_FLAG= IIF(v_col1=in_col1,2,1)

v_col1=in_col1

out_FLAG=v_FLAG

Then out put of expression data looks like this

Col1 Col2 FLAG
1,A, 1
1, B, 2
1, E, 2
1, F, 2
2, C, 1
2, D, 2
3, G, 1
3, L, 2
4, K,1
4, H, 2
5, I, 1
6, J, 1

In router create two groups one for unique records and another one for duplicate records.

unique=(FLAG=1)

duplicate=(FLAG=2)

connect two groups to two targets.

Ian Campbell
  • 23,484
  • 14
  • 36
  • 57
arjoon
  • 9
  • 1