when to opt for caching in the Lookup component in SSIS

Question

In SSIS Look up there are 3 types of Caches.. Full Partial and No cache.

In our solution it has always been used the default one(Full).Is there any particular scenarios,where it can go for Partial cache/No Cache? In our solutions lockup tables are always small(example:-we have been looking at small tables to get the types or to get description).That might be the reason it has been configure in default(Full cache) mode?

Please let me know the valuable suggestions and opinion.

score 3 · Accepted Answer · edited Jul 29 '20 at 18:30

Let's cover the basics.

Full Cache - prior to a data flow actually executing, all lookup components in full cache mode will run their query against their source and cache all of that data locally. Once the data flow begins, the source systems for these transformations could be removed as integration services has all the data for that point in time.

Partial Cache - No data is pre-cached for a partial cache. For each row that flows through the transform, a partial cache will look at its internal cache to see whether the lookup key(s) has already been through. If they have, then a local copy will be used. Otherwise, a singleton query is fired against the referenced system to find the value. That can get quite expensive if you have a non-optimized lookup, are pulling back lots of data, very unique source keys. If a match is found in the remote system, then that data will be cached locally until the package completes or enough new lookups have generated match and the cache is full.

No Cache - Similar to Partial Cache but it will always perform a query against the source system. Even if your entire import set has but a single, unique value for the key(s).

Why would I pick one over the other

I use a Full cache unless I have a specific reason not to.

In a data warehouse, there is a scenario known as a late arriving dimension. You are loading something that should have a value in a reference table but you didn't know the value existed until NOW! The general resolution is to punch that value into the reference table during the load. In a full cache, every row that went to reference that missing value would not find it and then try to insert it which would cause duplicates. A partial/no cache would solve this at the first miss would result in an insert into the reference table and subsequent lookups would find it and add it to the cache.

Another scenario I've had where partial/no cache was needed was when I needed a range query. I had tables with SurrogateKey|BusinessKey|StartDate|StopDate I need to do a lookup for BusinessKey + MyDate sandwiched between StartDate and StopDate. I use the GUI to drag MyDate to StartDate and then in the advanced editor, I modify the existing query to do a BETWEEN StartDate and StopDate (but of course, I wouldn't use BETWEEN)

when to opt for caching in the Lookup component in SSIS

2 Answers2

Why would I pick one over the other

Related

Linked