Consolidating your data is a pretty straight-forward process in this scenario.
Here's an example you can run in SSMS to get you started. Note that I am using TABLE variables instead of separate databases, but the concept remains the same.
Declare the tables ( representing databases ):
DECLARE @database1 TABLE ( CustomerID NUMERIC(18,0), PropertyID INT, CustomerSurname VARCHAR(100), CustomerName VARCHAR(50), CustomerEmail VARCHAR(100) );
DECLARE @database2 TABLE ( CustomerID NUMERIC(18,0), PropertyID INT, CustomerSurname VARCHAR(100), CustomerName VARCHAR(50), CustomerEmail VARCHAR(100) );
Insert the sample data you provided:
INSERT INTO @database1 ( CustomerID, PropertyID, CustomerSurname, CustomerEmail, CustomerName )
VALUES ( 1, 1000, 'Smith', 'Adrian', 'smith@jj.com' );
INSERT INTO @database2 ( CustomerID, PropertyID, CustomerSurname, CustomerEmail, CustomerName )
VALUES ( 2, 9876, 'Smith', 'A', 'smith@jj.com' );
Sprinkle in some SQL Server magic:
SELECT
ROW_NUMBER() OVER ( PARTITION BY CustomerSurname, CustomerEmail, FirstInitial ORDER BY CustomerSurname, CustomerEmail, FirstInitial ) AS ConsolidationID
, Consolidated.CustomerID
, Consolidated.PropertyID
FROM (
SELECT CustomerID, PropertyID, CustomerSurname, CustomerName, CustomerEmail, LEFT( CustomerName, 1 ) AS FirstInitial FROM @database1
UNION
SELECT CustomerID, PropertyID, CustomerSurname, CustomerName, CustomerEmail, LEFT( CustomerName, 1 ) AS FirstInitial FROM @database2
) AS Consolidated
ORDER BY
CustomerID, CustomerSurname, CustomerEmail, FirstInitial;
Returns consolidated resultset:
+-----------------+------------+------------+
| ConsolidationID | CustomerID | PropertyID |
+-----------------+------------+------------+
| 1 | 1 | 1000 |
| 1 | 2 | 9876 |
+-----------------+------------+------------+
Putting it to use:
To use this with your eight databases, you would simply replace the table variables ( @database1, @database2, etc... ) with the fully-qualified name to the database and table to be referenced.
SELECT {column-list} FROM MyDatabase1.dbo.TableName...
UNION
SELECT {column-list} FROM MyDatabase2.dbo.TableName...
Etc...
ROW_NUMBER() is the real "magic" here. By using its PARTION BY and ORDER BY we can get a single "ConsolidationID" for each row matching the partition criteria, in this case CustomerSurname, CustomerEmail and FirstInitial. ORDER BY is required to ensure the data is ordered correctly so the partion works as expected.
A few important things to note:
- The column names must be exact between all tables and in the same order. You can
alias column names if needed.
- Using UNION will exclude exact duplicates when all compared columns
are the same. I expect this is the behavior you want in this case,
but if not, replace UNION with UNION ALL to return all rows,
including exact matches.
- SQL Server's ROW_NUMBER() is a pretty slick feature. You can read
more about it here.
I hope this helps get you on your way.