SQL consolidating data best Strategy

Question

I am using SQL Server 2005.

I have 8 databases in the same SQL Server. There is a table containing thousand of Customers in each database (Property).

To make it simple

CustomerID      numeric(18,0)
PropertyID      int
CustomerSurname varchar(100)
CustomerName    varchar(50)
CustomerEmail   varchar(100)

Until now each Property populated its Customers individually. Now there is a need to consolidate the Customers for reporting purposes.

I want to find all the common Customers in all databases

(Criteria= CustomerSurname + CustomerEmail + first letter of Customer Name)

and populate a new table (Consolidation) that contains the PropertyID and the CustomerID of the Property databases for eachcommon Customer.

ConsolidationID numeric(18,0)
PropertyID      int
CustomerID      numeric(18,0)

Imagine: Customers on Property 1

1000 1 Smith Adrian smith@jj.com

Customers on Property 2

9876 2 Smith A smith@jj.com

Consolidation Table

1 1 1000
1 2 9876

So in the Consolidation table we have ID=1 for Smith which in Database1 (property) has local ID 1000 and in Database2 (property) has localID 9876

I am puzzled as to how I can find the common Customers using the criteria between 8 databases.The strategy to achieve it.

If you don't have some unique identifying value (between the databases) that you could use to "consolidate" this data in a single record (a SSN or FiscalCode or other unique key) you are out of luck — Steve, Jul 20 '18 at 20:09
I believe Steve is correct here, we don't have any way of discerning between two customers on different tables with the same name in your current setup. — Edward, Jul 20 '18 at 20:38
I was stupidly put wrong order in the final table. PropertyID is the uniqueID of each database. — PanosPlat, Jul 21 '18 at 16:21

critical_error · Answer 1 · 2018-07-21T05:06:15.313

Consolidating your data is a pretty straight-forward process in this scenario.

Here's an example you can run in SSMS to get you started. Note that I am using TABLE variables instead of separate databases, but the concept remains the same.

Declare the tables ( representing databases ):

DECLARE @database1 TABLE ( CustomerID NUMERIC(18,0), PropertyID INT, CustomerSurname VARCHAR(100), CustomerName VARCHAR(50), CustomerEmail VARCHAR(100) );
DECLARE @database2 TABLE ( CustomerID NUMERIC(18,0), PropertyID INT, CustomerSurname VARCHAR(100), CustomerName VARCHAR(50), CustomerEmail VARCHAR(100) );

Insert the sample data you provided:

INSERT INTO @database1 ( CustomerID, PropertyID, CustomerSurname, CustomerEmail, CustomerName ) 
VALUES ( 1, 1000, 'Smith', 'Adrian', 'smith@jj.com' );

INSERT INTO @database2 ( CustomerID, PropertyID, CustomerSurname, CustomerEmail, CustomerName ) 
VALUES ( 2, 9876, 'Smith', 'A', 'smith@jj.com' );

Sprinkle in some SQL Server magic:

SELECT
    ROW_NUMBER() OVER ( PARTITION BY CustomerSurname, CustomerEmail, FirstInitial ORDER BY CustomerSurname, CustomerEmail, FirstInitial ) AS ConsolidationID
    , Consolidated.CustomerID
    , Consolidated.PropertyID
FROM (

    SELECT CustomerID, PropertyID, CustomerSurname, CustomerName, CustomerEmail, LEFT( CustomerName, 1 ) AS FirstInitial FROM @database1
    UNION
    SELECT CustomerID, PropertyID, CustomerSurname, CustomerName, CustomerEmail, LEFT( CustomerName, 1 ) AS FirstInitial FROM @database2

) AS Consolidated
ORDER BY
    CustomerID, CustomerSurname, CustomerEmail, FirstInitial;

Returns consolidated resultset:

+-----------------+------------+------------+
| ConsolidationID | CustomerID | PropertyID |
+-----------------+------------+------------+
|               1 |          1 |       1000 |
|               1 |          2 |       9876 |
+-----------------+------------+------------+

Putting it to use:

To use this with your eight databases, you would simply replace the table variables ( @database1, @database2, etc... ) with the fully-qualified name to the database and table to be referenced.

SELECT {column-list} FROM MyDatabase1.dbo.TableName...
UNION
SELECT {column-list} FROM MyDatabase2.dbo.TableName...

Etc...

ROW_NUMBER() is the real "magic" here. By using its PARTION BY and ORDER BY we can get a single "ConsolidationID" for each row matching the partition criteria, in this case CustomerSurname, CustomerEmail and FirstInitial. ORDER BY is required to ensure the data is ordered correctly so the partion works as expected.

A few important things to note:

The column names must be exact between all tables and in the same order. You can alias column names if needed.
Using UNION will exclude exact duplicates when all compared columns are the same. I expect this is the behavior you want in this case, but if not, replace UNION with UNION ALL to return all rows, including exact matches.
SQL Server's ROW_NUMBER() is a pretty slick feature. You can read more about it here.

I hope this helps get you on your way.