4

I have lookup component a with a lookup table that retusn a varchar(4) column with 3 possible values: "T", "R" or "" (empty string).

I'm using an OLE DB connection for the lookup table, and have tried direct access to the table, as well as specifying a query with an RTRIM() on the column, to get sure that the string is empty and not a "blank string of some length".

If I set the cache mode to "Partial cache" everything works fine (either with direct reading of the table, or using the trimming query), and the empty strings of the input table are correctly matched to the corresponding lookup table row.

However, If I change the cache mode to "Full cache", none of the empty strings are matched at all.

I've checked that the data type, DT_STR, and lenght, 4, is the same in the lookup table and the input table.

Is there something that explains this behaviour? Can it be modified?

NOTE: This is not the documented problem with null values. It's about empty strings.

JotaBe
  • 38,030
  • 8
  • 98
  • 117
  • 1
    I can't reproduce this problem. Could you provide the table definition for the lookup column and what collation you are using? Also, are you trimming the dataflow value that you are looking up as well and is that also varchar(4)? If that has any spaces or invisible characters, the lookup will fail. – Mark Wojciechowicz Nov 03 '14 at 18:25

2 Answers2

3

Somewhere, you have trailing spaces, either in your source or your lookup.

Consider the following source query.

SELECT
    D.SourceColumn
,   D.Description
FROM
(
    VALUES 
        (CAST('T' AS varchar(4)), 'T')
    ,   (CAST('R' AS varchar(4)), 'R')
    ,   (CAST('' AS varchar(4)), 'Empty string')
    ,   (CAST('    ' AS varchar(4)), 'Blanks')
    ,   (NULL, 'NULL')
) D (SourceColumn, Description);

For my lookup, I restricted the above query to just T, R and the Empty String rows.

enter image description here

You can see that for the 5 source rows, T, R and Empty String matched and went to the Match Output path. Where I used a NULL or explicitly used spaces, did not make a match.

If I change my lookup mode from Full Cache to Partial, the NULL continues to not match while the explicit spaces does match.

Wut?

In full cache mode, the Lookup transformation executes the source query and keeps the data locally on the machine SSIS is executing on. This lookup is going to be an exact match using .NET equality rules. In that case, '' will not match ' '.

However, when we change our cache mode to None or Partial, we will no longer be relying on the .NET matching rules and instead, we'll use the source Database's matching rules. In TSQL, '' will match ' '

To make your Full Cache mode work as expected, you will need to apply an RTRIM in your Source and/or Lookup transformation. If you are convinced RTRIM isn't working your source, add a Derived Column Transformation and then apply your RTRIM there but I find it's better to abuse the database instead of SSIS.

Biml

Biml, the Business Intelligence Markup Language, describes the platform for business intelligence. BIDS Helper, is a free add on for Visual Studio/BIDS/SSDT that we're going to use to transform a Biml file below into an SSIS package.

The following biml will generate the

<Biml xmlns="http://schemas.varigence.com/biml.xsd">
    <Connections>
        <OleDbConnection Name="CM_OLE" ConnectionString="Data Source=localhost\dev2012;Initial Catalog=tempdb;Provider=SQLNCLI11.0;Integrated Security=SSPI;" />
    </Connections>
    <Packages>
        <Package ConstraintMode="Linear" Name="so_26719974">
            <Tasks>
                <Dataflow Name="DFT Demo">
                    <Transformations>
                        <OleDbSource 
                            ConnectionName="CM_OLE" 
                            Name="OLESRC Source">
                            <DirectInput>
                            SELECT
                                D.SourceColumn
                            ,   D.Description
                            FROM
                            (
                                VALUES 
                                    (CAST('T' AS varchar(4)), 'T')
                                ,   (CAST('R' AS varchar(4)), 'R')
                                ,   (CAST('' AS varchar(4)), 'Empty string')
                                ,   (CAST('    ' AS varchar(4)), 'Blanks')
                                ,   (NULL, 'NULL')
                            ) D (SourceColumn, Description);
                            </DirectInput>
                        </OleDbSource>
                        <Lookup 
                            Name="LKP POC"
                            OleDbConnectionName="CM_OLE"
                            NoMatchBehavior="RedirectRowsToNoMatchOutput"
                            >
                            <DirectInput>
                            SELECT
                                D.SourceColumn
                            FROM
                            (
                                VALUES 
                                    (CAST('T' AS varchar(4)))
                                ,   (CAST('R' AS varchar(4)))
                                ,   (CAST('' AS varchar(4)))
                            ) D (SourceColumn);
                            </DirectInput>
                            <Inputs>
                                <Column SourceColumn="SourceColumn" TargetColumn="SourceColumn"></Column>
                            </Inputs>
                        </Lookup>
                        <DerivedColumns Name="DER Default catcher" />
                        <DerivedColumns Name="DER NoMatch catcher">
                            <InputPath OutputPathName="LKP POC.NoMatch" />
                        </DerivedColumns>
                    </Transformations>

                </Dataflow>
            </Tasks>
        </Package>
    </Packages>
</Biml>
billinkc
  • 59,250
  • 9
  • 102
  • 159
  • Thanks for your thorough answer, and explanation of how cache modes differ from each other (.NET vs T-SQL). The problem was on the input, I added a derived column transformation applying an `RTRIM` to the affected column and it worked fine. However, I still don't understand why it's happenning: if I run this query on the SSMS: `SELECT Clave, LEN(Clave) FROM ...` I get a length of 0 for all the rows with empty string `Clave`. `Clave` is `VARCHAR(4)` on the table. I use an OLE DB source, which direct table read, and it somewhat adds the trailing spaces. Does this have an explanation? – JotaBe Nov 04 '14 at 09:47
  • I answer myself... If I repeat the query adding `'M' + Clave + 'M'` I can see there is really an space in there, even if it says the 'LEN' is 0. Silly me, if I read the docs for LEN: "Returns the number of characters of the specified string expression, **excluding trailing blanks.**" – JotaBe Nov 04 '14 at 09:50
  • Do you mind if I edit your answer and add the info on this comments in it? – JotaBe Nov 04 '14 at 09:52
1

The issue is that FULL Cache uses a .Net equality comparison and Partial and None use SQL.
I have had a similar issue where all works well with a Partial cache and when I use Full, I get Errors with Row not found, as I'm Failing on No Match. My issue was a lower case string in the source and an UPPER version in the Lookup table, so Full/.Net sees these as different and Partial/SQL are happy to do a Case insensitive join. Output the No Match rows to a csv file if you want to see the rows that are failing.

JoeK
  • 88
  • 3
  • 10