0

I have 3 tables:

  1. Staging: that gets employee records inserted every month.
  2. Master: table has contains all previously entered records from staging, unique records.
  3. Changes: keeps track of all changes - has no primary key.

The tables have 10 columns. In the staging table, every month we have about 2,500,000 records. Using a cursor I am able to insert new records from staging into the master table.

When it comes to update, I am using an inner join to get the records from staging that already exist in the master table.

To find out if any of the employee info has changed, do I have to query something line this:

WHERE Staging.FirstName <> Master.FirstName
   OR Staging.LastName <> Master.LastName
   OR ...

And so on for 10 columns, or is there an easier way?

CodeCaster
  • 147,647
  • 23
  • 218
  • 272
user2726975
  • 1,285
  • 3
  • 17
  • 26

2 Answers2

1

If the two tables really are identical, you could create a persisted computed column in each table that represents a checksum of the entire row (see http://technet.microsoft.com/en-us/library/ms189788.aspx), create an index on that, and then use that for your joins.

Ken Smith
  • 20,305
  • 15
  • 100
  • 147
  • Well, there is an extra column to check status of record(whether processed or not) in the staging table – user2726975 Oct 16 '13 at 19:34
  • 1
    You could probably just compute the checksum on the specific columns, e.g., `alter table Staging add cs CHECKSUM(FirstName, LastName, ...)` – Ken Smith Oct 16 '13 at 19:36
  • I was reading about this and many say there will be collisions and hence not a good way of detecting changes – user2726975 Oct 16 '13 at 20:13
  • Good point, in which case you might want to use a CLR-based UDF to do something similar. A good old MD5 hash would probably give you all the collision-free hashing you need. – Ken Smith Oct 16 '13 at 23:33
1

Using a Cursor for millions of rows does not sound like fun.

Maybe you should look at EXCEPT/MERGE

WITH NewAndChanged AS (
    SELECT Stage.Id
          ,Stage.Col1
          ,Stage.Col2
    FROM Stage
  EXCEPT
    SELECT Master.Id
          ,Master.Col1
          ,Master.Col2
    FROM Master
)
MERGE Master
USING NewAndChanged
      ON Master.Id = NewAndChanged.Id
WHEN MATCHED
     THEN UPDATE SET
         Col1 = NewAndChanged.Col1
        ,Col2 = NewAndChanged.Col2
WHEN NOT MATCHED
     THEN INSERT (
              Id
             ,Col1
             ,Col2
          )
          VALUES (
              NewAndChanged.Id
             ,NewAndChanged.Col1
             ,NewAndChanged.Col2
          )
adrianm
  • 14,468
  • 5
  • 55
  • 102