0

I have 2 big tables(About 100-150k rows in each).

The structure of these tables is the same. Ids of entities are also the same in each table.

I need a very fast way to compare these tables and answer the following questions:

  1. Which row's fields are different from another table's row?
  2. Which ids exists in first table and doesn't exists in second table?
  3. Which ids exists in second table and doesn't exists in first table?

Thank you!

Edit: I need to do this comparison using C# or maybe stored procedures(and then to select results by c#)

melnynet
  • 69
  • 1
  • 8

3 Answers3

2

If you have two tables Table1 and Table2 and they have the same structure and primary key named ID you can use this SQL:

--Find rows that exist in both Table1 and Table2
SELECT *
FROM Table1
WHERE EXISTS (SELECT 0 FROM Table2 WHERE Table1.ID = Table2.ID)

--Find rows that exist in Table1 but not Table2
SELECT *
FROM Table1
WHERE NOT EXISTS (SELECT 0 FROM Table2 WHERE Table1.ID = Table2.ID)

If you are trying to compare and find rows that differ in one column or another, that is a little trickier. You can write SQL to check each and every column yourself, but it may be simpler to add a temporary CHECKSUM column to both tables and compare those. If the checksums are different then one or more columns are different.

John Wu
  • 50,556
  • 8
  • 44
  • 80
1

SQL Data Compare is a great tool for doing this. Also Microsoft Visual Studio SQL Server Data Tools has a Data Compare function.

Peter Mourfield
  • 1,885
  • 1
  • 19
  • 38
1

I found the following method to perform very well when comparing large data sets.

http://weblogs.sqlteam.com/jeffs/archive/2004/11/10/2737.aspx

Basically UNION ALL of the two data sources then aggregate them and return only rows which don't have an identical matching row in the other table.

With unionCTE As (
    Select 'TableA' As TableName, col1, col2
      From TableA
    Union All
    Select 'TableB', col1, col2
      From TableB)
Select Max(TableName), col1, col2
  From unionCTE
  Group By col1, col2
  Having Count(*) = 1
  Order By col1, col2, TableName;

This will show the results in a single resultset, and if there are any rows that have the same key but different values the rows will be one above the other so that you can easily compare which values have changed between the tables.

This can easily be put into a stored procedure, if you want.

mendosi
  • 2,001
  • 1
  • 12
  • 18