4

Is there a way to know if the data in a SQL Server 2008 R2 table has changed since the last time you used it? I would like to know of any type of change -- whether a new record has been inserted or an existing one has been modified or deleted.

I am not interested in what the particular change might have been. I am only interested in a boolean value that indicates whether or not the table data has been changed.

Finally, I want a simple solution that does not involve writing a trigger for each CRUD operation and then have that trigger update some other log table.

I have C# program that is meant to insert a large amount of initial data into some database tables. This is a one off operation that is supposed to happen only once, or rarely ever again if ever, in the life of the application. However, during development and testing, though, we use this program a lot.

Currently, with about the 10 tables that it inserts data into, each having about 21,000 rows per table, the program takes about 45 seconds to run. This isn't really a huge problem as this is a one-off operation that is anyway going to be done internally before shipping the product to the customer.

Still, I would like to minimize this time. So, I want to not insert data into a table if there has been no change in the table data since my program last used it.

My colleague told me that I could use the CHECKSUM_AGG function in T-SQL. My question(s) are:

1) If I compute the CHECKSUM_AGG(Cast(NumericPrimaryKeyIdColumn AS int)), then the checksum only changes if a new row has been added or an existing one deleted, right? If someone has only modified values of other columns of an existing row in the table, that will have no impact on the checksum aggregate of the ID column, right? Or will it?

2) Is there another way I can solve the problem of knowing whether table data has changed since the last time my program used it?

Water Cooler v2
  • 32,724
  • 54
  • 166
  • 336
  • 2
    Can you add a column to indicate when you update a record? And then query against that column? – Elliott Frisch Nov 18 '13 at 01:19
  • That's definitely an option to think about. Thanks. Even though I am not sure if my client will allow me to do that. – Water Cooler v2 Nov 18 '13 at 01:22
  • how bout add a rowversion in your columns? row version automatically increments after an update edit, delete , and add in your DB just make sure to have another column or table that records the last values of your rowversions so you can track the edits/files that has been edited. – Albert Laure Nov 18 '13 at 02:26

6 Answers6

4

This is very close to what I already had in mind and what @user3003007 mentioned.

One way I am thinking of is to take a CHECKSUM(*) or CHECKSUM(Columns, I, Am, Interested, In) for each such table and then do an aggregate checksum on the checksum of each row, like so:

SELECT CHECKSUM_AGG(CAST(CHECKSUM(*) as int)) FROM TableName;

This is still not a reliable method as CHECKSUM does not work on some data types. So, if I have my column of type text or ntext, the CHECKSUM will fail.

Fortunately for me, I do not have such data types in the list of columns I am interested in, so this works for me.

Water Cooler v2
  • 32,724
  • 54
  • 166
  • 336
1

Have you investigated Change Data Capture?

Edmund Schweppe
  • 4,992
  • 1
  • 20
  • 26
  • Thank you. Interesting. Though it looks like a thing that a DBA would do to a whole database. I am working with a client's database that I do not have much leverage on except at the application level. – Water Cooler v2 Nov 18 '13 at 01:24
1

You can use a combination of hashing and checksum_agg. The below will work as long as the the string values do not overflow the HASHBYTES function. It works by converting all of the columns to strings, concatenating those, hashing the concatenated string, turning the hash into an integer, placing all of those values into a temp table, and then running checksum_agg on the temp table. Could easily be adapted to iterate across all real tables

Edit: Combining MD5 and checksum_agg looks like it works at least for somewhat narrow tables:

declare @tablename sysname
set @tablename  = 'MyTableName'

declare @sql varchar(max) 

set @sql = 'select convert(int,HASHBYTES(''MD5'','''''


declare c cursor for
select column_name
from INFORMATION_SCHEMA.COLUMNS
where table_name = @tablename


open c

declare @cname sysname

fetch next from c into @cname

while @@FETCH_STATUS = 0 
begin
    set @sql = @sql + '+ coalesce(convert(varchar,' + @cname + '),'')'
    fetch next from c into @cname
end


close c
deallocate c

set @sql =  @sql + ')) as CheckSumVal
into ##myresults from ' + @tablename

print @sql

exec(@sql)



select CHECKSUM_AGG(CheckSumVal) from ##myresults
drop table ##myresults
  • Thank you. My knowledge of T-SQL is fairly limited and I haven't been keeping up much. So, I don't understand the second option you mention. Also, for the first option -- how do I get an integer expression for all the columns in the table? – Water Cooler v2 Nov 18 '13 at 01:43
0

How do you know that the change was made by you or that the change is relevant to your needs? If you're not going to do it properly (delete & re-insert or merge) then the whole thing sounds futile to me.

In any case, if you spend only an hour researching, implementing and testing your change, you'd have to run it 80 times (and sit and watch it) before you've broken even on your time. So why bother?

LoztInSpace
  • 5,584
  • 1
  • 15
  • 27
0
  • Add Extra column like last_updated default getdate()
  • Add Extra column int type .
  • Declare enum (enable Flag attribute option to perform bit wise operation ).
  • Then you can apply checksum on this column.

No datatype problem .

CRUSADER
  • 5,486
  • 3
  • 28
  • 64
KumarHarsh
  • 5,046
  • 1
  • 18
  • 22
0

An easy way to check this is to use the system DMVs to check the index usage stats, the first index on the table (id 1) is either the heap or the clustered index of the table itself and so can be used for checking when the last update occurred:

SELECT  DB_NAME(database_id) AS [database_name] ,
        OBJECT_NAME([object_id], [database_id]) AS [index_name] ,
        [user_seeks] ,
        [user_scans] ,
        [user_lookups] ,
        [user_updates] ,
        [last_user_seek] ,
        [last_user_scan] ,
        [last_user_lookup] ,
        [last_user_update]
FROM    sys.dm_db_index_usage_stats
WHERE   [index_id] = 1

From this, you can see the last time that the table was updated as well as how many updates there have been (I have left in the seeks and scans etc just in case you're interested).

It's worth taking note that this data does not persist after a reboot, but it's pretty simple to load it into a permanent table every now and then in order to make the data permanent.

steoleary
  • 8,968
  • 2
  • 33
  • 47