28

I'm trying to compute a checksum or a hash for an entire table in SQL Server 2008. The problem I'm running into is that the table contains an XML column datatype, which cannot be used by checksum and has to be converted to nvarchar first. So I need to break it down into two problems:

  1. calculate a checksum for a row, schema is unknown before runtime.
  2. calculate the checksum for all of the rows to get the full table checksum.
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Gabe Brown
  • 1,418
  • 3
  • 17
  • 22

3 Answers3

25

You can use CHECKSUM_AGG. It only takes a single argument, so you could do CHECKSUM_AGG(CHECKSUM(*)) - but this doesn't work for your XML datatype, so you'll have to resort to dynamic SQL.

You could generate dynamically the column list from INFORMATION_SCHEMA.COLUMNS and then insert int into a template:

DECLARE @schema_name NVARCHAR(MAX) = 'mySchemaName';
DECLARE @table_name NVARCHAR(MAX) = 'myTableName';
DECLARE @column_list NVARCHAR(MAX);

SELECT @column_list = COALESCE(@column_list + ', ', '')
        + /* Put your casting here from XML, text, etc columns */ QUOTENAME(COLUMN_NAME)
FROM    INFORMATION_SCHEMA.COLUMNS
WHERE   TABLE_NAME = @table_name
    AND TABLE_SCHEMA = @schema_name

DECLARE @template AS varchar(MAX)
SET @template = 'SELECT CHECKSUM_AGG(CHECKSUM({@column_list})) FROM {@schema_name}.{@table_name}'

DECLARE @sql AS varchar(MAX)
SET @sql = REPLACE(REPLACE(REPLACE(@template,
    '{@column_list}', @column_list),
    '{@schema_name}', @schema_name),
    '{@table_name}', @table_name)

EXEC ( @sql )
e_i_pi
  • 4,590
  • 4
  • 27
  • 45
Cade Roux
  • 88,164
  • 40
  • 182
  • 265
  • Thanks! I had to tweak it a bit to special case the datatypes, but I was able to come up with something that was fast and based on this solution. Excellent! – Gabe Brown Oct 20 '09 at 18:36
  • Just be careful with identity columns, also I would use BINARY_CHECKSUM as this is case sensitive. –  Dec 16 '14 at 09:21
  • REPLACE requires 3 arguments. Did you mean: ```sql SET @sql = REPLACE( REPLACE( REPLACE( @template, '{@column_list}', @column_list ), '{@schema_name}', @schema_name ), '{@table_name}', @table_name); ``` – Paul-Sebastian Manole Oct 10 '19 at 14:58
  • Why does this produce a different checksum on different systems with the same table? I even copied the table over to one system as a different name using select into with Linked Server, and on the same machine the checksums are the same but not with the other machine, even though the table is identical. – Paul-Sebastian Manole Oct 11 '19 at 07:45
  • @Paul-SebastianManole yes, there should be a `@template` on the end of the first line of replaces As far as the different checksum, do you have an identity column or something like that? – Cade Roux Oct 11 '19 at 15:54
  • Yes but the tables have the same data and keys (manual keys). – Paul-Sebastian Manole Oct 11 '19 at 16:06
  • Just to make it clearer. Same table duplicated on the same machine produces the same checksums. The same table duplicated on another machine produced the same checksums albeit different checksums from the ones on the first machine. Logically, the checksums between the two machines are therefore different. Odd thing is that the tables look identical on first analysis. They were also copied with select into both times (local duplicate and remote duplicates). – Paul-Sebastian Manole Oct 11 '19 at 16:10
  • @Paul-SebastianManole Not sure what could cause that, it's deterministic, certainly no random element, but perhaps something possible with collation? Any way to make a minimally reproducible case? – Cade Roux Oct 15 '19 at 20:16
  • Tested with restored backup in different machine as well. Produced different hashes. So it's not collation. Might be a system dependant hash seed... – Paul-Sebastian Manole Oct 15 '19 at 20:33
  • It's very interesting. I just tested this dbfiddle locally and on all versions at dbfiddle and they matched. I suggest trying to make a minimal reproducible case https://dbfiddle.uk/?rdbms=sqlserver_2019l&fiddle=cb55ed92ddbaeba8f4317bb1c8dec582 – Cade Roux Oct 15 '19 at 21:27
  • Note that in my specific example, there is no column order - column order will matter in CHECKSUM, but row order will not matter in CHECKSUM_AGG. – Cade Roux Oct 15 '19 at 21:28
3

I modified the script to generate a query for all relevant tables in a database.

USE myDatabase
GO
DECLARE @table_name sysname
DECLARE @schema_name sysname
SET @schema_name = 'dbo'

DECLARE myCursor cursor
FOR SELECT TABLE_NAME
      FROM INFORMATION_SCHEMA.TABLES T
     WHERE T.TABLE_SCHEMA = @schema_name
       AND T.TABLE_TYPE = 'BASE TABLE'
       AND T.TABLE_NAME NOT LIKE 'MSmerge%'
       AND T.TABLE_NAME NOT LIKE 'sysmerge%'
       AND T.TABLE_NAME NOT LIKE 'tmp%'
     ORDER BY T.TABLE_NAME

OPEN myCursor

FETCH NEXT 
FROM myCursor
INTO @table_name 

WHILE @@FETCH_STATUS = 0
BEGIN
    DECLARE @column_list nvarchar(MAX)
    SET @column_list=''
SELECT @column_list = @column_list + CASE WHEN DATA_TYPE IN ('xml','text','ntext','image sql_variant') THEN 'CONVERT(nvarchar(MAX),'
                                          ELSE ''
                                     END
                                   + QUOTENAME(COLUMN_NAME)
                                   + CASE WHEN DATA_TYPE IN ('xml','text','ntext','image sql_variant') THEN ' /* ' + DATA_TYPE + ' */)'
                                          ELSE ''
                                     END + ', '
  FROM INFORMATION_SCHEMA.COLUMNS
     WHERE TABLE_NAME = @Table_name
     ORDER BY ORDINAL_POSITION

    SET @column_list = LEFT(@column_list, LEN(@column_list)-1) -- remove trailing comma

    DECLARE @sql AS nvarchar(MAX)
    SET @sql = 'SELECT ''' + QUOTENAME(@schema_name) + '.' + QUOTENAME(@table_name) + ''' table_name,
       CHECKSUM_AGG(CHECKSUM(' + @column_list + ')) CHECKSUM
  FROM ' + QUOTENAME(@schema_name) + '.' + QUOTENAME(@Table_name) + ' WITH (NOLOCK)'


    PRINT  @sql

    FETCH NEXT 
    FROM myCursor
    INTO @table_name 

    IF @@FETCH_STATUS = 0
        PRINT  'UNION ALL'

END

CLOSE myCursor
DEALLOCATE myCursor
GO
-2

//Quick hash sum of SQL and C # mirror Ukraine //HASH_ZKCRC64 ///-------------------------------------------------------------------------------------------------------------- private Int64 HASH_ZKCRC64(byte[] Data) { Int64 Result = 0x5555555555555555; if (Data == null || Data.Length <= 0) return 0; int SizeGlobalBufer = 8000; int Ost = Data.Length % SizeGlobalBufer; int LeftLimit = (Data.Length / SizeGlobalBufer) * SizeGlobalBufer;

        for (int i = 0; i < LeftLimit; i += 64)
        {
            Result = Result
            ^ BitConverter.ToInt64(Data, i)
            ^ BitConverter.ToInt64(Data, i + 8)
            ^ BitConverter.ToInt64(Data, i + 16)
            ^ BitConverter.ToInt64(Data, i + 24)
            ^ BitConverter.ToInt64(Data, i + 32)
            ^ BitConverter.ToInt64(Data, i + 40)
            ^ BitConverter.ToInt64(Data, i + 48)
            ^ BitConverter.ToInt64(Data, i + 56);
             if ((Result & 0x0000000000000080) != 0)
             Result = Result ^ BitConverter.ToInt64(Data, i + 28); 
        }

        if (Ost > 0)
        {
           byte[] Bufer = new byte[SizeGlobalBufer];
           Array.Copy(Data, LeftLimit, Bufer, 0, Ost);
           for (int i = 0; i < SizeGlobalBufer; i += 64)
           {
               Result = Result
               ^ BitConverter.ToInt64(Bufer, i)
               ^ BitConverter.ToInt64(Bufer, i + 8)
               ^ BitConverter.ToInt64(Bufer, i + 16)
               ^ BitConverter.ToInt64(Bufer, i + 24)
               ^ BitConverter.ToInt64(Bufer, i + 32)
               ^ BitConverter.ToInt64(Bufer, i + 40)
               ^ BitConverter.ToInt64(Bufer, i + 48)
               ^ BitConverter.ToInt64(Bufer, i + 56);
               if ((Result & 0x0000000000000080)!=0)
               Result = Result ^ BitConverter.ToInt64(Bufer, i + 28); 
           }
        }

        byte[] MiniBufer = BitConverter.GetBytes(Result);
        Array.Reverse(MiniBufer);
        return BitConverter.ToInt64(MiniBufer, 0);

        #region SQL_FUNCTION
        /*  CREATE FUNCTION [dbo].[HASH_ZKCRC64] (@data as varbinary(MAX)) Returns bigint
            AS
            BEGIN
            Declare @I64 as bigint Set @I64=0x5555555555555555
            Declare @Bufer as binary(8000)
            Declare @i as int Set @i=1
            Declare @j as int 
            Declare @Len as int Set @Len=Len(@data)     

            if ((@data is null) Or (@Len<=0)) Return 0

              While @i<=@Len
              Begin
               Set @Bufer=Substring(@data,@i,8000)
               Set @j=1
                   While @j<=8000
                   Begin
                    Set @I64=@I64 
                    ^ CAST(Substring(@Bufer,@j,   8) as bigint) 
                    ^ CAST(Substring(@Bufer,@j+8, 8) as bigint) 
                    ^ CAST(Substring(@Bufer,@j+16,8) as bigint)
                    ^ CAST(Substring(@Bufer,@j+24,8) as bigint)
                    ^ CAST(Substring(@Bufer,@j+32,8) as bigint)
                    ^ CAST(Substring(@Bufer,@j+40,8) as bigint)
                    ^ CAST(Substring(@Bufer,@j+48,8) as bigint)
                    ^ CAST(Substring(@Bufer,@j+56,8) as bigint)
                    if @I64<0 Set @I64=@I64 ^ CAST(Substring(@Bufer,@j+28,8) as bigint)      
                    Set @j=@j+64    
                   End;  
               Set @i=@i+8000
              End
            Return @I64
            END
         */
        #endregion

   }