3

I have been searching for this answer for quite sometime and so far, have not found a solution. I am creating a Data Flow Task in SSIS, and I need to join 2 tables on different data sources using the LIKE operator, or FINDSTRING, or CONTAINS etc.

Here is what my data looks like:

Table1 :: Data Source 1

| PersonName    |      Address       | 
      Josh         LA, California US   
      Ted          SF, California US
      Beth         NYC, New York US

Table2 :: Data Source 2

| StateID  |  StateName  | 
      01      California   
      02      New York

I need to join Table2 on Table1 where Table1.Address LIKE '%' + Table2.StateName '%'. My desired end result is this:

| PersonName    |   StateID  | 
      Josh           01 
      Ted            01
      Beth           02

I am doing this in SSIS, so Table1 and Table2 are gathered using an OLE DB Source component, now I need to find a way to perform a non-exact merge on those two tables.

  • I cannot use a script component

I have tried using a "Lookup Transformation" where Table1 is the input to the Lookup. Then go to -> Advanced -> "Modify the SQL statement", and I have tried 2 things that have not worked:

1)

select * from (select * from [dbo].[Table2]) [refTable]
where ? LIKE '%' + [refTable].[StateName] + '%'

2)

select * from (select * from [dbo].[Table2]) [refTable]
where FINDSTRING( ?, [refTable].[StateName],  1) > 0

Is there any way to achieve what I am looking for in SSIS without using a script component? Maybe there is a way to use a fuzzy lookup? If so, how do I configure that? Is there like an advanced "Merge Join" object?

Thomas07vt
  • 219
  • 1
  • 4
  • 11

2 Answers2

2

You are on the right track with your edited Lookup 1). I think you just need to remove the derived table e.g.

select * from from [dbo].[Table2] where ? LIKE '%' + [StateName] + '%'

Mike Honey
  • 14,523
  • 1
  • 24
  • 40
0

you cannot join two tables in different data-sources using a query in the Lookup- Transformation. The query is used to define the data-source to be cached and looked upon and it is run directly on the defined data source for the Lookup Transformation. For eg if you want to lookup only states with state_id greater then 10 you define the query as

Select * from [dbo].[Table2] where StateID > 10

The Best way to implement this would be to have Table1 as your OLEDB Source and have script task or a derived column task that extracts the Statement from the column Address. Use Regex if you go with a script task. This new derived column would have the state alone.

After this we should have a Look-up Transformation which has based on Table2. The Lookup Condition would be on Statename in Table2 and Derived State column in Table1.

Another Option is to have a Merge-Join transformation which can be used to join data coming from disparate data sources.

rogue-one
  • 11,259
  • 7
  • 53
  • 75