3

I'm creating an SSIS package wherein a user needs to be notified if there is a duplicate productcode for one productlabel. We retrieve the products thru a csv flat file being sent to us in a shared location.

In this example I have Productcode = 1a for both productlabel Physio Ormix and Nixen.

So basically the productcode = 1a was a duplicate because it was also used by the productlabel=Nixen. Hence, notify users for the duplicate ProductCode and Productlabel used. I tried creating an aggregate that performs group by ProductCode and Counts it.

In this link is the image of my dataflow

Can someone give me tips on how to do this?

In this link is the desired output

Hadi
  • 36,233
  • 13
  • 65
  • 124
Chadwick
  • 47
  • 1
  • 5
  • It looks like your conditional split could be the place to send the duplicates off to an email task. You want to split on records with a count > 1 in the file. – Jim Jimson Mar 07 '19 at 11:45
  • Just load the data into a staging (interim) table and run the proper SQL that flags the duplicates from SSIS, capture its findings and process. – Arthur Mar 07 '19 at 14:30

1 Answers1

1

I think you can use a script component and a conditional split to get the duplicates without all this logic:

  1. Inside the data flow task add a Script Component
  2. Add an output column of type DT_BOOL (example name is Flag)
  3. Inside the script component write a similar script:

    using System.Collections.Generic;
    
    public class ScriptMain:  
        UserComponent  
    
    {  
    
        List<string> lstKey = new List<string>;
        List<string> lstKeylabel = new List<string>;
    
        public override void Input0_ProcessInputRow(InputBuffer0 Row)  
        {  
    
            if(!lstKey.Contains(Row.ProductCode){
    
                lstKey.Add(Row.ProductCode);
                lstKeylabel.Add(Row.ProductCode + ";" + Row.ProductLabel);
                Row.Flag = true;
    
            }else if(lstKeylabel.Contains(Row.ProductCode + ";" + Row.ProductLabel)) {
    
                Row.Flag = true;
    
            }else{
    
                Row.Flag = false;
    
            }
    
        }  
    
    }
    
  4. Add a conditional split after the script component with a similar expression:

    [Flag] == true
    
  5. All records that are passed thru the true path are unique, all rows passed in the false path are the duplicates.

Hadi
  • 36,233
  • 13
  • 65
  • 124
  • 1
    Hello @Hadi! Thank you so much for your response! I tried this but this has the same output with what I did earlier it got the duplicate 1a Physio Ormix instead of getting the duplicate productcode which was used by another productlabel - 1A nixen – Chadwick Mar 11 '19 at 03:12
  • 1
    @Chadwick this approach will split the flow into unique values and duplicate. why it is not working? – Hadi Mar 11 '19 at 05:59
  • 1
    Hello @Hadi! Yes sir it did separate the unique values and the duplicate. But in this scenario it doesn't mind having 5 rows with the same values for 1A Physio Ormix. As long as it is using the same productcode for one product label only. So the problem here is, I need to retrieve the record that is using the the same productcode but having a different productlabel like 1A Nixen. – Chadwick Mar 11 '19 at 06:52
  • 1
    @Chadwick it can be solved by adding another list. Check the updated script – Hadi Mar 11 '19 at 07:15
  • 1
    sir @Hadi! What the heck! it worked like magic! Thank you so much for your help! – Chadwick Mar 11 '19 at 07:24