0

SSIS: How do I query for IDs from 1st database and then select data with only those IDs on 2nd database/data warehouse? There are only 12,000 rows I need data on in the 2nd database/data warehouse but the problem is it has 3M+ rows. I need the second SQL query to constrain it to have only the IDs from the 1st query. TIA!

  1. 1st database query: select distinct ID from tablename1
  2. 2nd database/datawarehouse query: select * from tablename2 where ID in (IDs from 1st database query)
Mitch
  • 11
  • 2
  • Probably a duplicate of https://stackoverflow.com/a/63904215/181965 – billinkc Apr 17 '21 at 17:23
  • Thanks, I looked at it but it didn't really answer my question – Mitch Apr 17 '21 at 19:36
  • Your question is how do I use a list of values from one source to filter against another. Your choices are 1. write an IN statement which only works for small sets. 12k rows is not small enough. 2. Load a table with those values on your second database server with those values. 3. Enumerate through your list of Ids from 1 and run a data flow with a where filter against the current id. That'll fire 12k source queries against the source which will work but it's poor form. 4. Pull all 3m rows from 2 into the dataflow and filter via Lookup task. All of which is in that reference question – billinkc Apr 17 '21 at 19:44
  • How big is this result set? select distinct ID from tablename1 – KeithL Apr 19 '21 at 11:01

1 Answers1

0

Here's my solution, it runs in just about 2 minutes now instead of 40 mins without filtering to the id's I need:

  1. Execute SQL Task: Creates a table (table1) that holds all of the distinct ID's I want to use to query the data warehouse/DB #2. Also creates another table (table2) that will act as the destination table within the 1st DB that holds the distinct ID's, and the columns from the data warehouse I needed.

  2. (Data Flow Task) OLE DB SOURCE (selects all from table1 which holds the distinct ID's)

  3. (Data Flow Task) LOOKUP: Used a lookup task set to full cache mode and ignoring unmatched entries. Finds rows in the data warehouse that match the ID from table1 (one to one relationship) and adds the columns I selected.

  4. (Data flow task) OLE DB DESTINATION: Loads matched output into table2 which is within my destination DB

Mitch
  • 11
  • 2