Inside a data flow, is it possible to do a lookup, insert the non-matching rows, re-lookup and then continue with the full (original) data set?

I can't see how you get/re-use the inserted rows after they've gone to an OLE DB Destination, nor can I see how a Multicast could wait until a lookup-insert branch completes (and then Union All them back in).

If it can't be done inside the data flow, would I have to repeat all my filtering logic in a previous data flow task simply to do the lookup inserts first?

This might be an obvious/already answered question, but I can't find much after searching.

PeterX
  • I don't understand. You have the data from the first query, and you also have the data you want to insert. Why not combine them? Multicast the insertion data to the OLE DB destination and to a Union All, send the original data to the same Union All (and multicast it anywhere else you want), and now you have the entire dataset. – Athanasios Kataras Nov 07 '12 at 12:19
  • Wouldn't I need to wait until the insert completes so I can re-run the lookup and then resume the normal path? – PeterX Nov 07 '12 at 23:37

1 Answer


This isn't possible inside a single Data Flow. There are various "solutions" around if you search enough, but they overlook an architectural reality: rows travel down a Data Flow in buffers/batches, processed in parallel.

So imagine you have multiple "new" rows arriving in two adjacent buffers. There is no way to ensure that the downstream handling of the "new" rows from buffer 1 has completed before buffer 2 hits the upstream Lookup. The result is multiple "new" rows inserted into your lookup target table for the same key.

Instead, you need an upstream Data Flow Task that does all the required lookup inserts first. This is also a more efficient solution overall at runtime: the lookup inserts can use Fast Load and Table Lock, and the downstream Lookup can use Full Cache.
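To make that concrete, here's a minimal sketch of the upstream insert, assuming a staging table stg_Sales and a lookup target dim_Customer keyed on CustomerCode (all names invented for illustration). You could run this set-based statement from an Execute SQL Task, or build the equivalent as a small preceding Data Flow:

    -- Insert any business keys the main data flow will need but the
    -- lookup target doesn't have yet (hypothetical table/column names).
    INSERT INTO dim_Customer (CustomerCode, CustomerName)
    SELECT DISTINCT s.CustomerCode, s.CustomerName
    FROM stg_Sales AS s
    WHERE NOT EXISTS (SELECT 1
                      FROM dim_Customer AS d
                      WHERE d.CustomerCode = s.CustomerCode);

Because this completes before the main Data Flow starts, the downstream Lookup's Full Cache is built only after every key is present, so the buffer race described above can't occur.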

Mike Honey
  • Thanks. So perhaps the question now needs to focus on "how do I re-use the same logic prior to the lookup" in both the upstream Data Flow and the downstream Data Flow. Perhaps a reusable package that performs the same steps, but I can't see a workable path with that yet. – PeterX Nov 08 '12 at 05:33
  • Yes "reuse the logic" would be nice but realistically - this is SSIS. Perhaps you could move SQL code into Variables, and move Lookups to a preceeding Data Flow Task that builds Lookup Caches, but that's probably about it for reuse. – Mike Honey Nov 08 '12 at 23:58