11

I have a list of pets:
Pets

And I need to find a correct owner for each of the pet from Owner table

Owners

In order to correctly match each pet to an owner I need to use a special matching table that looks like this:

Matching

So, for pet with PetID=2 I need to find an owner that has a matched based on three fields:

    Pet.Zip = Owner.Zip 
    and Pet.OwnerName = Owner.Name 
    and Pet.Document = Owner.Document

In our example, it will work like this:

 select top 1 OwnerID from owners
         where Zip = 23456 
         and Name = 'Alex' 
         and Document = 'a.csv'

if OwnerID is not found I then need to match based on 2 fields (Not using field with the highest priority)

In our example:

 select top 1 OwnerID from owners where
             Name = 'Alex' 
             and Document = 'a.csv'

Since no record is found I we then need to match on less fields. In our example:

select top 1 OwnerID from owners where Document = 'a.csv'

Now, we found an owner with OwnerID = 6.

Now we need to update pet with ownerID = 6 and then we can process next pet.

The only way that I can do this right now involves a loop or a cursor + dynamic SQL.

Is it possible to achieve this without loops+dynamic sql? Maybe STUFF + Pivot somehow?

sql fiddle: http://sqlfiddle.com/#!18/10982/1/0

Sample data:

create table  temp_builder
(
    PetID int not null,
    Field varchar(30) not null,
    MatchTo varchar(30) not null,
    Priority int not null
)

insert into temp_builder values
(1,'Address', 'Addr',4),
(1,'Zip', 'Zip', 3),
(1,'Country', 'Country', 2),
(1,'OwnerName', 'Name',1),
(2,'Zip', 'Zip',3),
(2,'OwnerName','Name', 2),
(2,'Document', 'Document', 1),
(3,'Country', 'Country', 1)


create table temp_pets
(
    PetID int null,
    Address varchar(100) null,
    Zip int null,
    Country varchar(100) null,
    Document varchar(100) null,
    OwnerName varchar(100) null,
    OwnerID int null,
    Field1 bit null,
    Field2 bit null
)

insert into temp_pets values
(1, '123 5th st', 12345, 'US', 'test.csv', 'John', NULL, NULL, NULL),
(2, '234 6th st', 23456, 'US', 'a.csv', 'Alex', NULL, NULL, NULL),
(3, '345 7th st', 34567, 'US', 'b.csv', 'Mike', NULL, NULL, NULL)

create table temp_owners
(
    OwnerID int null,
    Addr varchar(100) null,
    Zip int null,
    Country varchar(100) null,
    Document varchar(100) null,
    Name varchar(100) null,
    OtherField bit null,
    OtherField2 bit null,
)

insert into temp_owners values
(1, '456 8th st',  45678, 'US', 'c.csv', 'Mike',  NULL, NULL),
(2, '678 9th st',  45678, 'US', 'b.csv', 'John',  NULL, NULL),
(3, '890 10th st', 45678, 'US', 'b.csv', 'Alex',  NULL, NULL),
(4, '901 11th st', 23456, 'US', 'b.csv', 'Alex',  NULL, NULL),
(5, '234 5th st',  12345, 'US', 'b.csv', 'John',  NULL, NULL),
(6, '123 5th st',  45678, 'US', 'a.csv', 'John',  NULL, NULL)

Edit: I'm overwhelmed by a number of great suggestions and responses. I've tested them and many worked well for me. Unfortunately, I can only award bounty to one solution.

user194076
  • 8,787
  • 23
  • 94
  • 154
  • 2
    I don't understand your priority rules. Why does country have a greater precedence than ZIP code? – Tim Biegeleisen Mar 07 '18 at 06:08
  • @TimBiegeleisen, For Pet #1, I need to try to match by Address, Zip, Country, OwnerName. If no match, then by Zip,Country, OwnerName, if no match, then by Country,OwnerName, if no Match then by OwnerName. So, we moving from more specific to less specific. I came up with columns names for this example just for simplification purposes – user194076 Mar 07 '18 at 06:13
  • There has to be dynamic SQL, otherwise, how can one use columns names stored in another column...? If dynamic SQL is not an option, then your question has no answer IMO. – Michał Turczyn Mar 07 '18 at 07:15
  • @MichałTurczyn I'm ok with using dynamic sql but I want to try to be efficient (My tables have >1 million records. I want to avoid loops if possible) – user194076 Mar 07 '18 at 07:41
  • I've edited your question and copied the sample data from your fiddle link to the question's body. I've also added [sql-server] and [tsql] tags for a larger audience. – Zohar Peled Mar 11 '18 at 10:10
  • Can you add a section clarifying the matching rules? I'm a bit confused by what you have posted so far. – MJH Mar 11 '18 at 17:42
  • Could you not replace your matching table with either a number of views or a table which contains column which has queries that can be executed for each of your pets. I feel it would be much simpler – Steve Ford Mar 13 '18 at 12:14
  • How often do you change the matching table? – Erlangga Hasto Handoko Mar 14 '18 at 10:18
  • Does it need to work for this fixed schema *(where the columns in the tables are known)*? Or do you need it to be extensible to schema with different columns? Also, under what situations would you ever join `zip` to any column other than `zip`, or join `Address` to any column other than `Addr`? *(In other words, can you simplify this with pre-existing knowledge that `Pet.ColumnA` is only ever used to compare with `Owner.ColumnZ`?)* Why do you want to avoid DynamicSQL? Any generalised solution is going to have significant performance bottle-necks *(and probably low maintainability)*. – MatBailie Mar 14 '18 at 17:17
  • Oh, and are all the comparison columns the same data-types? – MatBailie Mar 14 '18 at 17:22
  • For matching on `Zip, Name, Document` of `Z, N, D` against the possibilities `Z, -, D` and `-, -, D`; Should both rows be returned? *(No matches on three columns, no matches on the last two columns, two matches on the last one column.)* Or should it return just `Z, -, D` because that's a better match than `-, -, D`? – MatBailie Mar 14 '18 at 17:33

10 Answers10

4

The use of cursor, loops and dynamic SQL can be avoided by treating the fields used for comparison as an entry in a bit set for each pet. A bit set (FieldSetRank column) is calculated for each priority based on a bit entry (FieldRank rank column).

The Pets and Owner tables has to be unpivoted so that the fields and their associated values can be compared. Each of the fields and value that has been matched is assigned to a corresponding FieldRank. A new bit set is then calculated based on the matched values (MatchSetRank). Only records where the matched sets (MatchSetRank) are equal to the desired sets (FieldSetRank) are returned.

The query performs one final ranking to return records with the highest MatchSetRank (records that matched on the highest number of columns while maintaining priority criteria. The following T-SQL demonstrates the concept.

;WITH CTE_Builder
 AS
 (
     SELECT  [PetID]
            ,[Field]
            ,[Priority]
            ,[MatchTo]
            ,POWER(2, [Priority] - 1) AS [FieldRank] -- Define the field ranking as bit set numbered item.
            ,SUM(POWER(2, [Priority] - 1)) OVER (PARTITION BY [PetID] ORDER BY [Priority] ROWS UNBOUNDED PRECEDING) FieldSetRank -- Sum all the bit set IDs to define what constitutes a completed field set ordered by priority.
     FROM   temp_builder
 ),
CTE_PetsUnpivoted
AS
(   -- Unpivot pets table and assign Field Rank and Field Set Rank.
    SELECT   [PetsUnPivot].[PetID]
            ,[PetsUnPivot].[Field]
            ,[Builder].[MatchTo]
            ,[PetsUnPivot].[FieldValue]
            ,[Builder].[Priority]
            ,[Builder].[FieldRank]
            ,[Builder].[FieldSetRank]

    FROM 
       (
            SELECT [PetID], [Address], CAST([Zip] AS VARCHAR(100)) AS [Zip], [Country], [Document], [OwnerName]
            FROM temp_pets
        ) [Pets]
    UNPIVOT
       (FieldValue FOR Field IN 
          ([Address], [Zip], [Country], [Document], [OwnerName])
    ) AS [PetsUnPivot]
    INNER JOIN [CTE_Builder] [Builder] ON [PetsUnPivot].PetID = [Builder].PetID AND [PetsUnPivot].Field = [Builder].Field
),
CTE_Owners
AS
(
    -- Unpivot Owners table and join with unpivoted Pets table on field name and field value.  
    -- Next assign Pets field rank then calculated the field set rank (MatchSetRank) based on actual matches made.
    SELECT   [OwnersUnPivot].[OwnerID]
            ,[Pets].[PetID]
            ,[OwnersUnPivot].[Field]
            ,[Pets].Field AS [PetField]
            ,[Pets].FieldValue as PetFieldValue
            ,[OwnersUnPivot].[FieldValue]
            ,[Pets].[Priority]
            ,[Pets].[FieldRank]
            ,[Pets].[FieldSetRank]
            ,SUM([FieldRank]) OVER (PARTITION BY [Pets].[PetID], [OwnersUnPivot].[OwnerID] ORDER BY [Pets].[Priority] ROWS UNBOUNDED PRECEDING) MatchSetRank
    FROM 
       (
            SELECT [OwnerID], [Addr], CAST([Zip] AS VARCHAR(100)) AS [Zip], [Country], [Document], [Name]
            FROM temp_owners
        ) [Owners]
    UNPIVOT
       (FieldValue FOR Field IN 
          ([Addr], [Zip], [Country], [Document], [Name])
    ) AS [OwnersUnPivot]
    INNER JOIN [CTE_PetsUnpivoted] [Pets] ON [OwnersUnPivot].[Field] = [Pets].[MatchTo] AND [OwnersUnPivot].[FieldValue] = [Pets].[FieldValue]
),
CTE_FinalRanking
AS
(
    SELECT   [PetID]
            ,[OwnerID]
            -- -- Calculate final rank, if multiple matches have the same rank then multiple rows will be returned per pet. 
            -- Change the “RANK()” function to "ROW_NUMBER()" to only return on result per pet.
            ,RANK() OVER (PARTITION BY [PetID] ORDER BY [MatchSetRank] DESC) AS [FinalRank] 
    FROM    CTE_Owners
    WHERE   [FieldSetRank] = [MatchSetRank] -- Only return records where the field sets calculated based on 
                                            -- actual matches is equal to desired field set ranks. This will 
                                            -- eliminate matches where the number of fields that meets the 
                                            -- criteria is the same but does not meet priority requirements. 
)
SELECT   [PetID]
        ,[OwnerID]
FROM    CTE_FinalRanking
WHERE   [FinalRank] = 1
Edmond Quinton
  • 1,709
  • 9
  • 10
  • I like this method; I came up with a similar approach. To get some more speed out of it, you can store the UNPIVOTED owner field data in a temp table and index it on the field name / field value. – Nik Shenoy Mar 13 '18 at 16:58
  • Kind of similar approach to what I've posted below. The key point is that the match columns are actually static. Only the priorities are dynamic. – dybzon Mar 13 '18 at 19:19
  • Thank you for your answer! Looks great but unfortunately static columns won't work for me. Those can change. – user194076 Mar 17 '18 at 02:15
2

It's quite big task to accomplish... Here's how I have done it:

First, you need add a table, which will hold semi-where clauses, i.e. conditions ready-to-use based on temp_builder table. Also, since you have 5 columns, I assumed there can be at most 5 conditions. Here's creation of the table:

CREATE TABLE [dbo].[temp_builder_with_where](
    [petid] [int] NULL,
    [priority1] [bit] NULL,
    [priority2] [bit] NULL,
    [priority3] [bit] NULL,
    [priority4] [bit] NULL,
    [priority5] [bit] NULL,
    [whereClause] [varchar](200) NULL
) 
--it's good to create index, for better performance
create clustered index idx on [temp_builder_with_where]([petid])

insert into temp_builder_with_where
select petid,[priority1],[priority2],[priority3],[priority4],[priority5],
         '[pets].' + CAST(field as varchar(100)) + ' = [owners].' + CAST(matchto as varchar(100)) [whereClause]
from (
select petid, field, matchto, [priority],
        1 Priority1,
        case when [priority] > 1 then 1 else 0 end Priority2,
        case when [priority] > 2 then 1 else 0 end Priority3,
        case when [priority] > 3 then 1 else 0 end Priority4,
        case when [priority] > 4 then 1 else 0 end Priority5       
from temp_builder) [builder]

Now we are going to loop through that table. You told that this table contains 8000 rows, so I have chosen other way: dynamic query will now insert results only for one petid at once.

In order to do that we need table to store our results:

CREATE TABLE [dbo].[TableWithNewId](
    [petid] [int] NULL,
    [ownerid] [int] NULL,
    [priority] [int] NULL
)

Now dynamic SQL is used for insert statements:

declare @query varchar(1000) = ''
declare @i int, @max int
set @i = 1
select @max = MAX(petid) from temp_builder_with_where

while @i <= @max
begin

    set @query = ''

    select @query = @query + whereClause1 + whereClause2 + whereClause3 + whereClause4 + whereClause5 + ' union all ' from (
    select 'insert into [MY_DATABASE].dbo.TableWithNewId  select ' + CAST(petid as varchar(3)) + ' [petid], [owners].ownerid, 1 [priority] from temp_pets [pets], temp_owners [owners] where (' + [where_petid] + [where1] + ')' [whereClause1],
           case when [where2] is null then '' else ' union all select ' + CAST(petid as varchar(3)) + ' [petid], [owners].ownerid, 2 [priority] from temp_pets [pets], temp_owners [owners] where (' + [where_petid] + [where2] + ')' end [whereClause2], 
           case when [where3] is null then '' else ' union all select ' + CAST(petid as varchar(3)) + ' [petid], [owners].ownerid, 3 [priority] from temp_pets [pets], temp_owners [owners] where (' + [where_petid] + [where3] + ')' end [whereClause3], 
           case when [where4] is null then '' else ' union all select ' + CAST(petid as varchar(3)) + ' [petid], [owners].ownerid, 4 [priority] from temp_pets [pets], temp_owners [owners] where (' + [where_petid] + [where4] + ')' end [whereClause4], 
           case when [where5] is null then '' else ' union all select ' + CAST(petid as varchar(3)) + ' [petid], [owners].ownerid, 5 [priority] from temp_pets [pets], temp_owners [owners] where (' + [where_petid] + [where5] + ')' end [whereClause5]
    from (
            select petid, 'petid = ' + CAST(petid as nvarchar(3)) [where_petid],
               (select ' and ' + whereClause from temp_builder_with_where where petid = t.petid and priority1 = 1 for xml path(''),type).value('(.)[1]', 'varchar(500)') [where1],
               (select ' and ' + whereClause from temp_builder_with_where where petid = t.petid and priority2 = 1 for xml path(''),type).value('(.)[1]', 'varchar(500)') [where2],
               (select ' and ' + whereClause from temp_builder_with_where where petid = t.petid and priority3 = 1 for xml path(''),type).value('(.)[1]', 'varchar(500)') [where3],
               (select ' and ' + whereClause from temp_builder_with_where where petid = t.petid and priority4 = 1 for xml path(''),type).value('(.)[1]', 'varchar(500)') [where4],
               (select ' and ' + whereClause from temp_builder_with_where where petid = t.petid and priority5 = 1 for xml path(''),type).value('(.)[1]', 'varchar(500)') [where5]
       from temp_builder_with_where [t]
       where petid = @i
        group by petid
    ) a
    ) a
    --remove last union all
    set @query = left(@query, len(@query) - 10)
    exec (@query)

    set @i = @i + 1

end

Keep in mind that you must to replace [MY_DATABASE] in above code with your database name . From your example data, this will be the result of query select * from TableWithNewId:

PetId|OwnerId|Priority
1    |6      |4
2    |4      |2
2    |4      |3
3    |1      |1
3    |2      |1
3    |3      |1
3    |4      |1
3    |5      |1
3    |6      |1

Based on that result, you can now assign OwnerId to PetId based on lowest priority (well, you didn't say how to handle situation where there's found more than one OwnerId for the same priority).

Michał Turczyn
  • 32,028
  • 14
  • 47
  • 69
  • This looks fantastic! I just tested it and it seems to work well. Unfortunately, first query returns 8000 matches for me and when I execute 2nd part of the query (generating @query) it takes forever. I waited for 5 minutes and it never finished. But for small datasets it works. Now I need to figure out how to optimize the 2nd part. Let me know if you have any ideas :) – user194076 Mar 08 '18 at 01:30
  • @user194076 I updated my answer, you could give it a try. – Michał Turczyn Mar 14 '18 at 19:24
2

I will say right away to save your time:

  • My solution uses dynamic SQL. Michał Turczyn correctly noted that you can't avoid it while names of compared column are stored in the DB.
  • My solution uses loops. And I strongly believe that you will not solve this problem with the pure SQL query which will work fast enough on the data size you declared (tables have > 1M records). The logic you described implies iterations by its nature - from bigger set of matching fields to the lower set. SQL as a query language was not designed to cover such tricky scenarios. You could try to solve your problem with pure SQL query, but even if you manage to build such query, it will be very tricky, complex and unclear. I'm not a fan of such solutions. That's why I have not even dig into this direction.
  • On the other hand my solution does not require creation of temporary tables which is an advantage.

Given this, my approach is fair straightforward:

  1. There is an outer loop that iterates from the biggest set of matchers (all matching fields) to the smallest set of matchers (one field). On the first iteration, when we don't know yet how many matchers are stored in the database for the pet, we read and use them all. On the next iterations, we decrease number of used matchers by 1 (removing the one with highest priority).

  2. The inner loop iterates over current set of matchers and builds the WHERE clause that compares fields between Pets and Owners table.

  3. Current query is executed and if some owner matches given criteria, we break from the outer loop.

Here is the code that implements this logic:

DECLARE @PetId INT = 2;

DECLARE @MatchersLimit INT;
DECLARE @OwnerID INT;

WHILE (@MatchersLimit IS NULL OR @MatchersLimit > 0) AND @OwnerID IS NULL
BEGIN

    DECLARE @CurrMatchFilter VARCHAR(max) = ''
    DECLARE @Field VARCHAR(30)
    DECLARE @MatchTo VARCHAR(30)
    DECLARE @CurrMatchersNumber INT = 0;

    DECLARE @GetMatchers CURSOR;
    IF @MatchersLimit IS NULL
        SET @GetMatchers = CURSOR FOR SELECT Field, MatchTo FROM temp_builder WHERE PetID = @PetId ORDER BY Priority ASC;
    ELSE
        SET @GetMatchers = CURSOR FOR SELECT TOP (@MatchersLimit) Field, MatchTo FROM temp_builder WHERE PetID = @PetId ORDER BY Priority ASC;

    OPEN @GetMatchers;
    FETCH NEXT FROM @GetMatchers INTO @Field, @MatchTo;
    WHILE @@FETCH_STATUS = 0
    BEGIN
        IF @CurrMatchFilter <> '' SET @CurrMatchFilter = @CurrMatchFilter + ' AND ';
        SET @CurrMatchFilter = @CurrMatchFilter + ('temp_pets.' + @Field + ' = ' + 'temp_owners.' + @MatchTo);
        FETCH NEXT FROM @GetMatchers INTO @field, @matchTo;
        SET @CurrMatchersNumber = @CurrMatchersNumber + 1;
    END
    CLOSE @GetMatchers;
    DEALLOCATE @GetMatchers;

    IF @CurrMatchersNumber = 0 BREAK;

    DECLARE @CurrQuery nvarchar(max) = N'SELECT @id = temp_owners.OwnerID FROM temp_owners INNER JOIN temp_pets ON (' + CAST(@CurrMatchFilter AS NVARCHAR(MAX)) + N') WHERE temp_pets.PetID = ' + CAST(@PetId AS NVARCHAR(MAX));
    EXECUTE sp_executesql @CurrQuery, N'@id int OUTPUT', @id=@OwnerID OUTPUT;

    IF @MatchersLimit IS NULL
        SET @MatchersLimit = @CurrMatchersNumber - 1;
    ELSE
        SET @MatchersLimit = @MatchersLimit - 1;

END

SELECT @OwnerID AS OwnerID, @MatchersLimit + 1 AS Matched;

Performance considerations

There are basically 2 queries that are executed in this approach:

  1. SELECT Field, MatchTo FROM temp_builder WHERE PetID = @PetId;

    You should add an index on PetID field in temp_builder table and this query will be executed very fast.

  2. SELECT @id = temp_owners.OwnerID FROM temp_owners INNER JOIN temp_pets ON (temp_pets.Document = temp_owners.Document AND temp_pets.OwnerName = temp_owners.Name AND temp_pets.Zip = temp_owners.Zip AND ...) WHERE temp_pets.PetID = @PetId;

    This query looks scary because it joins two big tables - temp_owners and temp_pets. However temp_pets table is filtered by PetID column that should result in just one record. So if you have an index on temp_pets.PetID column (and you should as this column seems like a primary key), the query will result into scan of temp_owners table. Such scan will not take the ages even for table with over 1M rows. If the query is still too slow, you could consider adding indexes for columns of temp_owners table that are used in the matchers (Addr, Zip, etc.). Adding indexes has downsides, like bigger database and slower insert/update operations. So before adding the indexes on temp_owners columns, check the query speed on table without indexes.

Community
  • 1
  • 1
CodeFuller
  • 30,317
  • 3
  • 63
  • 79
2

I'm not sure if I've got the end result correctly or not, But I suggest using a couple of common table expressions to generate a batch of update statements using dynamic SQL (I'm afraid it can't be done without dynamic SQL), and then execute them using Exec(sql).

The benefit of this approach is that it involves no loops or cursors.

Every update statement I generate is using an inner join between the pets and the owners tables, updating the pets table's owner id with the owners table owner id, using the mappings from the builder table as the basis to the on clause.
The first cte is responsible of generating the on clause from the builder table, and the second one is responsible to generate the update statements.
Finally, I'm selecting all the SQL statements from the second CTE into a single nvarchar(max) variable and execute it.

The way I've tackled the priority issue is to generate a single update statement for each group of priorities, starting with including all priorities, and excluding values from the next SQL statement, highest priority gets excluded first, until I'm left with an on clause mapping only a single set of columns.

So, the first thing is to declare a variable to hold the generated update statements:

DECLARE @Sql nvarchar(max) = ''

Now, the first CTE is using cross apply with stuff and for xml to generate the on clause for each pair of petId and Priority:

;WITH OnClauseCTE AS
(
SELECT DISTINCT PetId, Priority, OnClause
FROM temp_builder t0
CROSS APPLY
(
    SELECT STUFF (
    (  
        SELECT ' AND p.'+ Field +' = o.'+ MatchTo
        FROM temp_builder t1
        WHERE PetID = t0.PetId
        AND Priority <= t0.Priority
        FOR XML PATH('')  
    )
    , 1, 5, '') As OnClause
) onClauseGenerator
)

The second CTE generates a single UPDATE statement for every petId and Priority combination:

, UpdateStatementCTE AS
(
    SELECT  PetId,
            Priority,
            'UPDATE p 
            SET OwnerID = o.OwnerID 
            FROM temp_pets p 
            INNER JOIN temp_owners o ON ' + OnClause + ' 
            WHERE p.PetId = '+ CAST(PetId as varchar(10)) +'
            AND p.OwnerID IS NULL; -- THIS IS CRITICAL!
            ' AS SQL
    FROM OnClauseCTE
)

And Finally, Generating a single batch of update statements from the UpdateStatementCTE:

SELECT @Sql = @Sql + SQL
FROM UpdateStatementCTE    
ORDER BY PetId, Priority DESC -- ORDER BY Priority is CRITICAL!

The order by PetId is strictly to help the readability, when you print out the content of @Sql. However, the Priority DESC part of the order by clause is critical, since we want to execute the highest priority first and the lowest priority last.

Now, @Sql contains this (shortened):

UPDATE p 
SET OwnerID = o.OwnerID 
FROM temp_pets p 
INNER JOIN temp_owners o ON p.Address = o.Addr AND p.Zip = o.Zip AND p.Country = o.Country AND p.OwnerName = o.Name 
WHERE p.PetId = 1
AND p.OwnerID IS NULL;

...

UPDATE p 
SET OwnerID = o.OwnerID 
FROM temp_pets p 
INNER JOIN temp_owners o ON p.OwnerName = o.Name 
WHERE p.PetId = 1
AND p.OwnerID IS NULL;

...

UPDATE p 
SET OwnerID = o.OwnerID 
FROM temp_pets p 
INNER JOIN temp_owners o ON p.OwnerName = o.Name AND p.Document = o.Document 
WHERE p.PetId = 2
AND p.OwnerID IS NULL;

...

UPDATE p 
SET OwnerID = o.OwnerID 
FROM temp_pets p 
INNER JOIN temp_owners o ON p.Country = o.Country 
WHERE p.PetId = 3
AND p.OwnerID IS NULL;

As you can see, every update statement is represented in the builder table, and will only change the Owner Id if the previous update statement haven't already, because of the AND p.OwnerID IS NULL part of the where clause.

After running the batch of update statements, your temp_pets table looks like this:

PetID   Address         Zip     Country     Document    OwnerName   OwnerID     Field1  Field2
1       123 5th st      12345   US          test.csv    John        5           NULL    NULL
2       234 6th st      23456   US          a.csv       Alex        6           NULL    NULL
3       345 7th st      34567   US          b.csv       Mike        1           NULL    NUL

You can see a live demo on rextester.

However, please note that the less conditions you have the more records might return back from the join, making the update more likely to be inaccurate. For instance, For PetId 3 I've got OwnerId 1 since the only thing I've had to match the records was the Country column, meaning it could actually be every OwnerId in this sample data, since everyone have the same value US in the Country column.
Under the following rules, there is not much I can do about that.

Zohar Peled
  • 79,642
  • 10
  • 69
  • 121
2

This is possible to do without dynamic sql or loops. The key point is that the columns used to match the pets and owners are static. Only the priorities are dynamic. However, the performance depends largely on your data. You'll have to test that yourself and consider what you think is the best approach.

The solution below basically finds all owners that match any given pet. The owners are then filtered to only include the owners that match priority 1, or 1 & 2, or 1 & 2 & 3, etc. Finally the "best" of the matching owners is found, and the pet table is updated with this value.

I've added some explanatory comments to the query, but please feel free to ask if anything is unclear.

-- We start off by converting the priority values into int values that are suitable to add up to a bit array
-- I'll save those in a #Temp table to cut that piece of logic out of the final query
IF EXISTS(SELECT 1 FROM #TempBuilder)
BEGIN
    DROP TABLE #TempBuilder
END
SELECT 
    PetID, Field, MatchTo, 
    CASE [Priority] 
    WHEN 1 THEN 16 -- Priority one goes on the 16-bit (10000)
    WHEN 2 THEN 8 -- Priority two goes on the 8-bit (01000)
    WHEN 3 THEN 4 -- Priority three goes on the 4-bit (00100)
    WHEN 4 THEN 2 -- Priority four goes on the 2-bit (00010)
    WHEN 5 THEN 1 END AS [Priority] -- Priority five goes on the 1-bit (00001)
INTO #TempBuilder
FROM dbo.temp_builder;

-- Then we pivot the match priorities to be able to join them on our pets
WITH PivotedMatchPriorities AS (
    SELECT
        PetId,
        [Address], [Zip], [Country], [OwnerName], [Document]
    FROM (SELECT PetId, Field, [Priority] FROM #TempBuilder) tb
        PIVOT 
        (
            SUM([Priority])
            FOR [Field] IN ([Address], [Zip], [Country], [OwnerName], [Document])
        )
        AS PivotedMatchPriorities
),
-- Next we get (for each pet) all owners with ANY matching value
-- We want to filter the matching owners to find these that match priorities 1 (priority sum 10000, i.e. 16), 
    --- or match priorities 1 & 2 (priority sum 11000, i.e. 24)
    --- or match priorities 1 & 2 & 3 (priority sum 11100, i.e. 28)
    --- etc.
MatchingOwners AS (
    SELECT o.*,
        p.PetID,
        pmp.[Address] AS AddressPrio,
        pmp.Country AS CountryPrio,
        pmp.Zip AS ZipPrio,
        pmp.OwnerName AS OwnerPrio,
        pmp.Document AS DocumentPrio,
        CASE WHEN o.Addr = p.[Address] THEN ISNULL(pmp.[Address],0) ELSE 0 END
        + CASE WHEN o.Zip = p.Zip THEN ISNULL(pmp.Zip,0) ELSE 0 END
        + CASE WHEN o.Country = p.Country THEN ISNULL(pmp.Country,0) ELSE 0 END
        + CASE WHEN o.Document = p.Document THEN ISNULL(pmp.[Document],0) ELSE 0 END
        + CASE WHEN o.[Name] = p.OwnerName THEN ISNULL(pmp.OwnerName,0) ELSE 0 END AS MatchValue -- Calculate a match value for each matching owner
    FROM dbo.temp_pets p
        INNER JOIN dbo.temp_owners o 
            ON p.[Address] = o.Addr
            OR p.Country = o.Country
            OR p.Document = o.Document
            OR p.OwnerName = o.[Name]
            OR p.Zip = o.Zip
        INNER JOIN PivotedMatchPriorities pmp ON pmp.PetId = p.PetId
),
-- Now we can get all owners that match the pet, along with a match value for each owner.
-- We want to rank the matching owners for each pet to allow selecting the best ranked owner
-- Note: In the demo data there are multiple owners that match petId 3 equally well. We'll pick a random one in such cases.
RankedValidMatches AS (
    SELECT 
        PetID,
        OwnerID,
        MatchValue,
        ROW_NUMBER() OVER (PARTITION BY PetID ORDER BY MatchValue DESC) AS OwnerRank
    FROM MatchingOwners
    WHERE MatchValue IN (16, 24, 28, 30, 31)
)
-- Finally we can get the best valid match per pet
--SELECT * FROM RankedValidMatches WHERE OwnerRank = 1
-- Or we can update our pet table to reflect our results
UPDATE dbo.temp_pets
SET OwnerID = rvm.OwnerID
FROM dbo.temp_pets tp
    INNER JOIN RankedValidMatches rvm ON rvm.PetID = tp.PetID AND rvm.OwnerRank = 1
dybzon
  • 1,236
  • 2
  • 15
  • 21
2

The following approach is based on the fact that the number of different combinations to select and order the columns to match on, is limited and likely to be far less than the number of records. With 5 columns, the total number of combinations is 325, but since it is unlikely that every possible combination is being used, the actual number will probably be less than 100. Compared to the number of records (OP mentioned >1M), it pays off to try and combine pets that share the same combination of columns.

Characteristics of the SQL script below:

  • No dynamic SQL.
  • Loops, but no cursors; the number of iterations is limited and does not grow proportionally with the number of records.
  • Creates two (indexed) helper tables. (Please feel free to make them temporary tables or table variables.) This greatly speeds up the matching process (INNER JOIN), but it does bring some overhead in having to populate the tables.
  • Only straightforward SQL constructs (no pivots, no stuffed FOR XML, not even CTEs).
  • Relies only on indexes on key columns (PetID, OwnerID), the Priority column, and the columns in the helper tables. Does not need indexes on address, zip, country, document, name.

At first sight the query seems like total overkill (47 SQL statements executed on the small amount of sample data proposed by OP), but for bigger tables the advantage should become apparent. Worst case time complexity should be O(n log n), which is a lot better than many an alternative. But of course it still needs to prove itself in practice; I haven't tested it yet with large data sets.

Fiddle: http://sqlfiddle.com/#!18/53320/1

-- Adding indexes to OP's tables to optimize the queries that follow.
CREATE INDEX IX_PetID ON temp_builder (PetID)
CREATE INDEX IX_Priority ON temp_builder (Priority)
CREATE INDEX IX_PetID ON temp_pets (PetID)
CREATE INDEX IX_OwnerID ON temp_owners (OwnerID)

-- Helper table for pets. Each column has its own index.
CREATE TABLE PetKey (
    PetID int NOT NULL PRIMARY KEY CLUSTERED,
    KeyNames varchar(200) NOT NULL INDEX IX_KeyNames NONCLUSTERED,
    KeyValues varchar(900) NOT NULL INDEX IX_KeyValues NONCLUSTERED
)

-- Helper table for owners. Each column has its own index.
CREATE TABLE OwnerKey (
    OwnerID int NOT NULL PRIMARY KEY CLUSTERED,
    KeyValues varchar(900) NULL INDEX IX_KeyValues NONCLUSTERED
)

-- For every pet, create a record in table PetKey.
-- (Unless the pet already belongs to someone.)
INSERT INTO PetKey (PetID, KeyNames, KeyValues)
SELECT PetID, '', ''
FROM temp_pets
WHERE OwnerID IS NULL

-- For every owner, create a record in table OwnerKey.
INSERT INTO OwnerKey (OwnerID, KeyValues)
SELECT OwnerID, ''
FROM temp_owners

-- Populate columns KeyNames and KeyValues in table PetKey.
-- Lowest priority (i.e. highest number in column Priority) comes first.
-- We use CHAR(1) as a separator character; anything will do as long as it does not occur in any column values.
-- Example: when a pet has address as prio 1, zip as prio 2, then:
--    KeyNames = 'Zip' + CHAR(1) + 'Address' + CHAR(1)
--    KeyValues = '12345' + CHAR(1) + 'John' + CHAR(1)
-- NULL is replaced by CHAR(2); can be any value as long as it does not match any owner's value.
DECLARE @priority int = 1
WHILE EXISTS (SELECT * FROM temp_builder WHERE Priority = @priority)
BEGIN
    UPDATE pk
    SET KeyNames = b.Field + CHAR(1) + KeyNames,
        KeyValues = ISNULL(CASE b.Field
                               WHEN 'Address' THEN p.Address
                               WHEN 'Zip' THEN CAST(p.Zip AS varchar)
                               WHEN 'Country' THEN p.Country
                               WHEN 'Document' THEN p.Document
                               WHEN 'OwnerName' THEN p.OwnerName
                           END, CHAR(2)) +
                    CHAR(1) + KeyValues
    FROM PetKey pk
    INNER JOIN temp_pets p ON p.PetID = pk.PetID
    INNER JOIN temp_builder b ON b.PetID = pk.PetID
    WHERE b.Priority = @priority

    SET @priority = @priority + 1
END

-- Loop through all distinct key combinations.
DECLARE @maxKeyNames varchar(200), @namesToAdd varchar(200), @index int
SELECT @maxKeyNames = MAX(KeyNames) FROM PetKey
WHILE @maxKeyNames <> '' BEGIN
    -- Populate column KeyValues in table OwnerKey.
    -- The order of the values is determined by the column names listed in @maxKeyNames.
    UPDATE OwnerKey
    SET KeyValues = ''

    SET @namesToAdd = @maxKeyNames
    WHILE @namesToAdd <> '' BEGIN
        SET @index = CHARINDEX(CHAR(1), @namesToAdd)

        UPDATE ok
        SET KeyValues = KeyValues +
                        CASE LEFT(@namesToAdd, @index - 1)
                            WHEN 'Address' THEN o.Addr
                            WHEN 'Zip' THEN CAST(o.Zip AS varchar)
                            WHEN 'Country' THEN o.Country
                            WHEN 'Document' THEN o.Document
                            WHEN 'OwnerName' THEN o.Name
                        END +
                        CHAR(1)
        FROM OwnerKey ok
        INNER JOIN temp_owners o ON o.OwnerID = ok.OwnerID

        SET @namesToAdd = SUBSTRING(@namesToAdd, @index + 1, 200)
    END

    -- Match pets with owners, based on their KeyValues.
    UPDATE p
    SET OwnerID = (SELECT TOP 1 ok.OwnerID FROM OwnerKey ok WHERE ok.KeyValues = pk.KeyValues)
    FROM temp_pets p
    INNER JOIN PetKey pk ON pk.PetID = p.PetID
    WHERE pk.KeyNames = @maxKeyNames

    -- Pets that were successfully matched are removed from PetKey.
    DELETE FROM pk
    FROM PetKey pk
    INNER JOIN temp_pets p ON p.PetID = pk.PetID
    WHERE p.OwnerID IS NOT NULL

    -- For pets with no match, strip off the first (lowest priority) name and value.
    SET @namesToAdd = SUBSTRING(@maxKeyNames, CHARINDEX(CHAR(1), @maxKeyNames) + 1, 200)

    UPDATE pk
    SET KeyNames = @namesToAdd,
        KeyValues = SUBSTRING(KeyValues, CHARINDEX(CHAR(1), KeyValues) + 1, 900)
    FROM PetKey pk
    INNER JOIN temp_pets p ON p.PetID = pk.PetID
    WHERE pk.KeyNames = @maxKeyNames

    -- Next key combination.    
    SELECT @maxKeyNames = MAX(KeyNames) FROM PetKey
END
Ruud Helderman
  • 10,563
  • 1
  • 26
  • 45
1

I have written another version using UNPIVOT, but ranking rows and filtering them in a simpler way

;with
-- r: rules table
r as (select * from temp_builder),
-- o0: owners table with all fields unpivotable (varchar)
o0 as (SELECT [OwnerID], [Addr], CAST([Zip] AS VARCHAR(100)) AS [Zip], [Country], [Document], [Name] FROM temp_owners ),
-- o: owners table unpivoted
o as (
    SELECT * FROM o0 
    UNPIVOT (FieldValue FOR Field IN ([Addr], [Zip], [Country], [Document], [Name])) AS p
),
-- p0: pets table with all fields unpivotable (varchar)
p0 as (SELECT [PetID], [Address], CAST([Zip] AS VARCHAR(100)) AS [Zip], [Country], [Document], [OwnerName] FROM temp_pets),
-- p: petstable unpivoted
p as (
    SELECT * FROM p0
    UNPIVOT (FieldValue FOR Field IN ([Address], [Zip], [Country], [Document], [OwnerName])) AS p
),
-- t: join up all data and keep only matching priority
d as (
    select petid, ownerid, priority 
    from (
        select r.*, o.ownerid, ROW_NUMBER() over (partition by r.petid, o.ownerid order by r.petid, o.ownerid, priority) calc_priority
        from r
        join p on (r.field = p.field) and (p.petid = r.petid)
        join o on (r.matchto = o.field) and (p.fieldvalue=o.fieldvalue) 
    ) x
    where calc_priority=priority
),
-- g: group by the matching rows to know the best priority reached for each pet
g as (
    select petid, max(priority) max_priority
    from d
    group by petid
)
-- output only the rows with best priority
select d.*
from d
join g on d.petid = g.petid and d.priority = g.max_priority
order by petid, ownerid, priority

This version is not more performant than @EdmondQuinton one, (I voted for him), mine is 5% slower but i think it is more simple to understand and maintain for non pro users.

MtwStark
  • 3,866
  • 1
  • 18
  • 32
  • Thanks! Unfortunately, static pivot won't work for me. Fields can change. – user194076 Mar 17 '18 at 02:14
  • do you mean table structure of owners and pets can change? are them "temp" tables built by you during the process? you could name columns Col1..Col10 (up to the max number of columns you need) and leave nulls in unused columns, in this way you will have static column names for UNPIVOT – MtwStark Mar 19 '18 at 09:25
0

I would take a slightly different approach, rather than store the columns to match, you could store the query to be executed:

create table builder
(
    PetID int not null,
    Query varchar(max)
)

INSERT INTO builder
VALUES (1, 'SELECT TOP 1 *
FROM pets
INNER JOIN Owners
    ON Owners.Name = pets.OwnerName 
WHERE petId = 1
ORDER BY 
    CASE WHEN Owners.Country = pets.Country THEN 0 ELSE 1 END,
    CASE WHEN Owners.Zip = pets.Zip THEN 0 ELSE 1 END,
    CASE WHEN Owners.Addr = pets.Address THEN 0 ELSE 1 END'),
(2, 'SELECT TOP 1 *
FROM pets
INNER JOIN Owners
    ON Owners.Name = pets.OwnerName 
WHERE petId = 2
ORDER BY 
    CASE WHEN Owners.Document = pets.Document THEN 0 ELSE 1 END,
    CASE WHEN Owners.Name = pets.OwnerName THEN 0 ELSE 1 END,
    CASE WHEN Owners.Zip = pets.Zip THEN 0 ELSE 1 END'),
(3, 'SELECT TOP 1 *
FROM pets
INNER JOIN Owners
    ON Owners.Name = pets.OwnerName 
WHERE petId = 3
ORDER BY 
    CASE WHEN Owners.Country = pets.Country THEN 0 ELSE 1 END
')

create table pets
(
    PetID int null,
    Address varchar(100) null,
    Zip int null,
    Country varchar(100) null,
    Document varchar(100) null,
    OwnerName varchar(100) null,
    OwnerID int null,
    Field1 bit null,
    Field2 bit null
)

insert into pets values
(1, '123 5th st', 12345, 'US', 'test.csv', 'John', NULL, NULL, NULL),
(2, '234 6th st', 23456, 'US', 'a.csv', 'Alex', NULL, NULL, NULL),
(3, '345 7th st', 34567, 'US', 'b.csv', 'Mike', NULL, NULL, NULL)

create table owners
(
    OwnerID int null,
    Addr varchar(100) null,
    Zip int null,
    Country varchar(100) null,
    Document varchar(100) null,
    Name varchar(100) null,
    OtherField bit null,
    OtherField2 bit null,
)

insert into owners values
(1, '456 8th st',  45678, 'US', 'c.csv', 'Mike',  NULL, NULL),
(2, '678 9th st',  45678, 'US', 'b.csv', 'John',  NULL, NULL),
(3, '890 10th st', 45678, 'US', 'b.csv', 'Alex',  NULL, NULL),
(4, '901 11th st', 23456, 'US', 'b.csv', 'Alex',  NULL, NULL),
(5, '234 5th st',  12345, 'US', 'b.csv', 'John',  NULL, NULL),
(6, '123 5th st',  45678, 'US', 'a.csv', 'John',  NULL, NULL)

Now to find the matching owner for a particular pet you just find the query out of the table and execute it:

DECLARE @query varchar(max)
SELECT TOP 1 @query = query
FROM builder
WHERE petId =1

EXEC (@query)
Steve Ford
  • 7,433
  • 19
  • 40
0

This is an answer that strictly solves your problem considering this

  • Following the rules you proposed no loops, no cursors, no dynamic sql
  • And also considering strictly your problem so this is not a general solution it is very specific for your problem and the columns you have

TestData

declare @Pets table 
(
    PetID int null,
    Address varchar(100) null,
    Zip int null,
    Country varchar(100) null,
    Document varchar(100) null,
    OwnerName varchar(100) null,
    OwnerID int null,
    Field1 bit null,
    Field2 bit null
)

insert into @Pets values
(1, '123 5th st', 12345, 'US', 'test.csv', 'John', NULL, NULL, NULL),
(2, '234 6th st', 23456, 'US', 'a.csv', 'Alex', NULL, NULL, NULL),
(3, '345 7th st', 34567, 'US', 'b.csv', 'Mike', NULL, NULL, NULL)

declare @owners table
(
    OwnerID int null,
    Addr varchar(100) null,
    Zip int null,
    Country varchar(100) null,
    Document varchar(100) null,
    Name varchar(100) null,
    OtherField bit null,
    OtherField2 bit null
)

insert into @owners values
(1, '456 8th st',  45678, 'US', 'c.csv', 'Mike',  NULL, NULL),
(2, '678 9th st',  45678, 'US', 'b.csv', 'John',  NULL, NULL),
(3, '890 10th st', 45678, 'US', 'b.csv', 'Alex',  NULL, NULL),
(4, '901 11th st', 23456, 'US', 'b.csv', 'Alex',  NULL, NULL),
(5, '234 5th st',  12345, 'US', 'b.csv', 'John',  NULL, NULL),
(6, '123 5th st',  45678, 'US', 'a.csv', 'John',  NULL, NULL)

declare @builder table  
(
    PetID int not null,
    Field varchar(30) not null,
    MatchTo varchar(30) not null,
    Priority int not null
)

insert into @builder values
(1,'Address', 'Addr',4),
(1,'Zip', 'Zip', 3),
(1,'Country', 'Country', 2),
(1,'OwnerName', 'Name',1),
(2,'Zip', 'Zip',3),
(2,'OwnerName','Name', 2),
(2,'Document', 'Document', 1),
(3,'Country', 'Country', 1)

Code that solves the problem

select distinct p.PetID, min(o.OwnerID) as ownerID from @pets p
inner join @builder b on p.PetID = b.PetID
inner join @owners o on 
( 
   (case when b.Field = 'Address' and b.Priority = 1 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 1 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 1 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 1 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 1 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 1 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 1 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 1 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 1 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 1 then o.Document else '-1' end)                    
)
AND
( 
   (case when b.Field = 'Address' and b.Priority = 2 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 2 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 2 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 2 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 2 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 2 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 2 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 2 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 2 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 2 then o.Document else '-1' end)                    
)
AND
( 
   (case when b.Field = 'Address' and b.Priority = 3 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 3 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 3 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 3 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 3 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 3 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 3 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 3 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 3 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 3 then o.Document else '-1' end)                    
)
AND
( 
   (case when b.Field = 'Address' and b.Priority = 4 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 4 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 4 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 4 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 4 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 4 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 4 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 4 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 4 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 4 then o.Document else '-1' end)                    
)
AND
( 
   (case when b.Field = 'Address' and b.Priority = 5 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 5 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 5 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 5 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 5 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 5 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 5 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 5 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 5 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 5 then o.Document else '-1' end)                    
)
group by p.PetID

union
--------------------------

select distinct p.PetID, min(o.OwnerID) as ownerID from @pets p
inner join @builder b on p.PetID = b.PetID
inner join @owners o on 
( 
   (case when b.Field = 'Address' and b.Priority = 1 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 1 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 1 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 1 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 1 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 1 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 1 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 1 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 1 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 1 then o.Document else '-1' end)                    
)
AND
( 
   (case when b.Field = 'Address' and b.Priority = 2 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 2 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 2 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 2 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 2 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 2 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 2 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 2 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 2 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 2 then o.Document else '-1' end)                    
)
AND
( 
   (case when b.Field = 'Address' and b.Priority = 3 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 3 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 3 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 3 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 3 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 3 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 3 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 3 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 3 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 3 then o.Document else '-1' end)                    
)
AND
( 
   (case when b.Field = 'Address' and b.Priority = 4 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 4 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 4 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 4 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 4 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 4 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 4 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 4 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 4 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 4 then o.Document else '-1' end)                    
)
group by p.PetID

union
--------------------------

select distinct p.PetID, min(o.OwnerID) as ownerID from @pets p
inner join @builder b on p.PetID = b.PetID
inner join @owners o on 
( 
   (case when b.Field = 'Address' and b.Priority = 1 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 1 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 1 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 1 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 1 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 1 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 1 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 1 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 1 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 1 then o.Document else '-1' end)                    
)
AND
( 
   (case when b.Field = 'Address' and b.Priority = 2 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 2 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 2 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 2 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 2 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 2 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 2 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 2 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 2 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 2 then o.Document else '-1' end)                    
)
AND
( 
   (case when b.Field = 'Address' and b.Priority = 3 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 3 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 3 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 3 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 3 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 3 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 3 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 3 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 3 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 3 then o.Document else '-1' end)                    
)
group by p.PetID

union
------------------------

select distinct p.PetID, min(o.OwnerID) as ownerID from @pets p
inner join @builder b on p.PetID = b.PetID
inner join @owners o on 
( 
   (case when b.Field = 'Address' and b.Priority = 1 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 1 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 1 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 1 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 1 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 1 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 1 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 1 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 1 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 1 then o.Document else '-1' end)                    
)
AND
( 
   (case when b.Field = 'Address' and b.Priority = 2 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 2 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 2 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 2 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 2 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 2 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 2 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 2 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 2 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 2 then o.Document else '-1' end)                    
)
group by p.PetID

union
------------------------

select distinct p.PetID, min(o.OwnerID) as ownerID from @pets p
inner join @builder b on p.PetID = b.PetID
inner join @owners o on 
( 
   (case when b.Field = 'Address' and b.Priority = 1 then p.Address else '0' end) = (case when b.MatchTo = 'Addr' and b.Priority = 1 then o.Addr else '-1' end)                  
or (case when b.Field = 'Zip' and b.Priority = 1 then p.Zip else '0' end) = (case when b.MatchTo = 'Zip' and b.Priority = 1 then o.Zip else '-1' end)                    
or (case when b.Field = 'Country' and b.Priority = 1 then p.Country else '0' end) = (case when b.MatchTo = 'Country' and b.Priority = 1 then o.Country else '-1' end)                    
or (case when b.Field = 'OwnerName' and b.Priority = 1 then p.OwnerName else '0' end) = (case when b.MatchTo = 'Name' and b.Priority = 1 then o.Name else '-1' end)                  
or (case when b.Field = 'Document' and b.Priority = 1 then p.Document else '0' end) = (case when b.MatchTo = 'Document' and b.Priority = 1 then o.Document else '-1' end)                    
)
group by p.PetID

Result

PetID   OwnerID
1       2
2       6
3       1
Victor Hugo Terceros
  • 2,969
  • 3
  • 18
  • 31
0

If you are looking for a simple solution without union, loop or a cursor or dynamic SQL below query works fine.

SQL Fiddle : http://sqlfiddle.com/#!18/10982/41

select PetID ,COALESCE(
 (select  top 1 OwnerID from temp_owners
     where Zip = pets.Zip 
     and Name = pets.OwnerName
     and Document = pets.Document) ,
     (select top 1 OwnerID from temp_owners where
         Name = pets.OwnerName 
         and Document = pets.Document)  ,
         (select top 1 OwnerID from temp_owners where
          Document = pets.Document)  ) OwnerId
       from 
temp_pets pets

Result:

PetID   OwnerId
1       (null)
2       6
3       2
Pelin
  • 936
  • 5
  • 12
  • This doesn't need Dynamic SQL because it's hard-coded and completely ignores the content of the "special matching table"... – MatBailie Mar 16 '18 at 19:40