
My senior developer and I noticed the Lookup Transformation sending a random number of rows. For example: 150 rows are read from the CSV, but only 90 rows, or sometimes 135 rows, are sent to the database.

For test purposes we are only dealing with 150 rows, but in deployment we estimate between 1,000 and 10,000 rows.

We noticed that when the Lookup Transform's cache mode was changed from No Cache or Partial Cache to Full Cache, only Full Cache sent the full 150 rows to the database, matching the 150 rows received as input by the Lookup Transform (results were as expected). On computer B, which has higher specs than computer A (the machine showing the problem), the expected results were produced consistently.

Can anyone advise on this issue?

We recently noticed that this issue only occurs with the originally generated CSV; after editing the file in Excel and re-saving it, the results were fine.

  • I don't understand `sending random amount of rows, unequal to number of rows as its input`. Your source data has 150 rows for development but 150000 is estimated? What does that mean? Where do you see this row estimation? I am also having trouble understanding your final paragraph. Could you try to clarify the experience with no/partial cache versus full cache? – billinkc May 06 '15 at 14:09
  • OK, I have edited the question; I think it should be clear to you now. – N_E May 06 '15 at 15:12
  • Is it not finding anything to match on the lookup? Can you show the dataflow in action through a screenshot? I don't follow what your issue is. – Zane May 06 '15 at 16:00
  • There's likely a variety of things at play here. [Full Cache](http://stackoverflow.com/q/7233766/181965) uses .NET-based string matching. Case sensitivity is usually to blame for missed matches. However, since saving to Excel and sourcing from there works, I suspect trailing spaces in the CSV are to blame. Try adding a Derived Column with the expression `RTRIM([MyColumnForLookup])` which replaces the existing column. Use that in the full cache mode lookup and see whether you get the expected 100% match rate. – billinkc May 06 '15 at 16:19
  • 1
    A None/Partial cache mode is going to result in queries being sent to the database and then you'll have database collation and matching rules in play. Thus, a case insensitive database/table/column with values of `foo, bar, blee` will match source data of `fOo, BAR, bleE` with a partial/no cache while a FULL cache will not make the same match. If that's the case, my usual approach is to add an additional column, LookupMatch which is an upper/lower cased version of the original column and ensure my Lookup query matches the forced casing. – billinkc May 06 '15 at 16:22
  • 2
    Easiest way to test whether you have either situation going on is to change your NoMatch property to RedirectToNoMatchOutput. Add a second Lookup using none/partial and besides grabbing your key, also grab the match column. Add a derived column output and then drop a data viewer in there to allow you to visually inspect the differences. Need specific answer? We'll need a minimal set of data and table structures to reproduce it. – billinkc May 06 '15 at 16:25
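
Since the comments above hinge on the difference between database collation matching (no/partial cache) and .NET string matching (full cache), here is a rough T-SQL sketch of that difference. The table and column names (`dbo.RefTable`, `RefColumn`, `RefKey`) are invented for illustration; only the collation behaviour and the trim/upper-case normalisation idea come from the comments.

```sql
-- Hypothetical reference table; names are illustrative only.
-- Under a case-insensitive collation (e.g. SQL_Latin1_General_CP1_CI_AS),
-- the database will match 'fOo' to a stored 'foo' -- this is the behaviour
-- you get when the Lookup runs queries in no/partial cache mode.
SELECT COUNT(*)
FROM   dbo.RefTable
WHERE  RefColumn = 'fOo';                               -- matches 'foo'

-- Forcing a binary comparison behaves more like the case-sensitive
-- .NET string equality used by full cache mode: 'fOo' no longer matches.
SELECT COUNT(*)
FROM   dbo.RefTable
WHERE  RefColumn = 'fOo' COLLATE Latin1_General_BIN2;   -- no match

-- One way to make both modes agree (billinkc's LookupMatch idea):
-- expose a trimmed, consistently cased column in the lookup query
-- and normalise the input side the same way with a Derived Column.
SELECT RefKey,
       UPPER(LTRIM(RTRIM(RefColumn))) AS LookupMatch
FROM   dbo.RefTable;
```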

1 Answer


I figured it out. It turns out I had horribly misconceived the idea that matching only one column to a related column in the Lookup Transformation was sufficient to mark a row as unique. All the rows in that first column contained the same data.

So after I matched all the input columns to their corresponding lookup columns, I finally got the expected results.
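
For anyone hitting the same thing, here is a rough T-SQL analogy of the change (the staging and reference table names and columns are invented for the example; the real package used a Lookup Transformation, not a join):

```sql
-- Invented names for illustration: dbo.StagingCsv is the CSV feed,
-- dbo.RefTable is the lookup reference table.

-- Before: keyed only on Col1, which held the same value in every row,
-- so it could not identify individual rows and match counts drifted.
SELECT s.*
FROM   dbo.StagingCsv AS s
JOIN   dbo.RefTable   AS r
       ON r.Col1 = s.Col1;

-- After: keyed on all the columns that together identify a row,
-- which is what finally produced the expected 150 matches.
SELECT s.*
FROM   dbo.StagingCsv AS s
JOIN   dbo.RefTable   AS r
       ON  r.Col1 = s.Col1
       AND r.Col2 = s.Col2
       AND r.Col3 = s.Col3;
```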

I think my partial excuse is that the random results kept me from trying the very approach billinkc suggested. Strangely, I am still not sure why attaching the data viewer to the Match Output stopped the Lookup Transformation from sending rows to the No-Match Output, which would have made more sense.

Thanks for all your suggestions on debugging the problem, and my apologies for this silly mistake.
