3

I'm loading a pipe delimited flat file into a staging table. During the load process an SSIS script component performs some operations on a row. It may set a flag in one field based on values in another field, add a prefix to certain columns, or apply formatting. For example, if a date is missing, the field is assigned to a default date. (if Row.EndDate_isNull then Row.EndDate = defaultDate)

These scripts become cumbersome when the same transformation needs to be applied to a series of rows. For example, a medical record file can describe each of 9 diagnoses with three fields: Diagnosis01_Date, Diagnosis01_Code, Diagnosis01_System....Diagnosis09_Date, Diagnosis09_Code, Diagnosis09_System.

I want to use a loop to perform the operations over each of the 9 groups of 3 fields, instead of writing the same operation 9 times.

If I were dealing with a collection in VB, for example, I'd write something like this in in the Input0_ProcessInputRow sub:

For i = 1 to 9
   row.("Diagnosis0"+ i + "_Date").Value = diagnosisDate
   row.("Diagnosis0"+ i + "_System").value = "ICD10"
next i 

In the SSIS object model, however, the columns are exposed as properties of Input0Buffer and I can't find a way to use a variable to refer to them. So, how can I create a loop that operates on columns by name in an SSIS script component?

Edit: I found the following sources, especially the first two, to be helpful while I was doing my research on this problem. It seems like there should be a solution using system.reflection, but I just don't know .NET well enough to figure it out.

http://agilebi.com/jwelch/2007/10/21/address-columns-generically-in-a-script-task/

http://agilebi.com/jwelch/2007/06/02/xml-destination-script-component/

http://microsoft-ssis.blogspot.com/2010/12/do-something-for-all-columns-in-your.html

http://toddmcdermid.blogspot.com/2011/05/iterating-over-columns-in-ssis-script.html

http://bidn.com/blogs/MikeDavis/ssis/1800/ssis-for-each-column-in-a-data-flow

https://social.msdn.microsoft.com/Forums/en-US/edbac1df-f05f-40db-820a-e009fae201a4/using-script-destination-object-to-create-and-write-to-new-text-file?forum=sqlintegrationservices&forum=sqlintegrationservices

https://social.msdn.microsoft.com/Forums/en-US/757d11c8-8ad4-4021-a959-1d13c8dfdaa7/how-to-run-a-loop-for-all-columns-in-script-component-input-column-collection-for-each-row?forum=sqlintegrationservices

How can I get the column Value in Script Component in SSIS?

Hadi
  • 36,233
  • 13
  • 65
  • 124
DataWriter
  • 1,010
  • 1
  • 10
  • 25

1 Answers1

3

Simple Workaround

You can store columns name in a List(of string) using loops , and use Row.GetType().GetProperties() to manipulate columns dynamically.

Example:

Note: You have to import System.Reflection , System.Linq and System.Collections.Generic libraries

Dim lstDateColumns as new List(of string)
Dim lstSystemColumns as new List(of string)

For i = 1 to 9
    lstDateColumns.Add("Diagnosis0" & i.toString() & "_Date")
    lstSystemColumns.Add("Diagnosis0" & i.toString() & "_System")
Next


For each  dataColumn as PropertyInfo in Row.GetType().GetProperties()


    If lstDateColumns.Contains(dataColumn.Name) Then

                 dataColumn.SetValue(Row, diagnosisDate, Nothing)

     ElseIf lstSystemColumns.Contains(dataColumn.Name) Then

                dataColumn.SetValue(Row, "ICD10", Nothing)

     End IF
Next

And you can filter over columns names from the lists

    Dim lstDateColumns As New List(Of String)
    Dim lstSystemColumns As New List(Of String)

    For i As Integer = 1 To 9
        lstDateColumns.Add("Diagnosis0" & i.ToString() & "_Date")
        lstSystemColumns.Add("Diagnosis0" & i.ToString() & "_System")
    Next

    For Each dataColumn As PropertyInfo In Row.GetType().GetProperties().Where(Function(x) lstDateColumns.Contains(x.Name))

        dataColumn.SetValue(Row, diagnosisDate, Nothing)

    Next


    For Each dataColumn As PropertyInfo In Row.GetType().GetProperties().Where(Function(x) lstSystemColumns.Contains(x.Name))

        dataColumn.SetValue(Row, "ICD10", Nothing)

    Next

References

Hadi
  • 36,233
  • 13
  • 65
  • 124
  • 2
    I see that this gets me part way there--It will allow me to loop through my fields because they're in the list. Is there a way to avoid iterating over all of the columns in the source? – DataWriter Oct 22 '17 at 21:20
  • Is my worry about the overhead of looping through all columns unjustified? – DataWriter Oct 22 '17 at 21:24
  • 1
    You're right. I think you can filter on column name using linq. Few moments and i'll update – Hadi Oct 22 '17 at 21:27
  • 1
    @DataWriter i updated my answer, remember to imports `System.Linq` library to use `Where` function – Hadi Oct 22 '17 at 21:44
  • This is fantastic. I had to change the .net framework to enable linq, since I'm on SSIS2008, and it worked first try. Thank you! – DataWriter Oct 23 '17 at 15:39
  • 1
    Can I declare the lists as private module-level variables and add the column names in the event Public Overrides Sub PreExecute() so that I only have to assign the values once once for the package execution, instead of doing it for each row when Input0_ProcessInputRow fires? – DataWriter Oct 23 '17 at 16:13
  • Yes,Your approach is better. @DataWriter – Hadi Oct 23 '17 at 16:49
  • And, if I can ask one final question: how can I use dataColumn.GetValue to work with a given column's value inside of the loop? "Select case dataColumn.GetValue(Row)" doesn't work. – DataWriter Oct 23 '17 at 16:54
  • I need to check it. And i will reply. @DataWriter – Hadi Oct 23 '17 at 16:58
  • It looks like it's dataColumn.GetValue(row,Nothing) – DataWriter Oct 23 '17 at 17:10
  • @DataWriter i think it is as you said `dataColumn.GetValue(Row,Nothing)` – Hadi Oct 23 '17 at 17:31