0

I'm attempting to quickly compare large data in two separate applications (one on MySQL, the other on SQL SERVER).

the myData field is a TEXT field in both databases, and I want to see if the value of this field has changed between them (myData can be hundreds of thousands or millions of characters long)

in MySQL:

select sha1(myData) from myTable where mypk=1;

in SQL SERVER:

select right(convert([varchar](45), hashbytes('SHA1', cast(myData as nvarchar(max))), 1),40) from myTable where mypk=1;

My hash output will not match for the same TEXT field value in both DBs. How can I get it to do so?

Here are my assumptions to date:

  1. MySQL will hash TEXT, but SQL-SERVER will not (hence the cast to nvarchar).
  2. MySQL will not allow casting to nvarchar.
  3. If the input types to the hash are different, the output will be different (in my case where one is TEXT and the other is nvarchar).

Additionally, what if instead of SQL Server it was DB2 or Oracle? Is there some simple approach to this (sorry if this part of the q is too vague)?

NEW2WEB
  • 503
  • 2
  • 8
  • 22
  • I'm also not opposed to switching the algorithm (SHA1) to anything else - I just want to ensure my output is short compare to the original TEXT field – NEW2WEB Aug 12 '21 at 15:31
  • At a minimum you should always hash binary values to ensure that differences in text encoding don't change the hash. I'm not sure if you'll get identical hashes from different platforms even so. You might need to just compare length+prefix, and if those match perform a full comparison on the client. – David Browne - Microsoft Aug 12 '21 at 15:34
  • Thanks for the tip @DavidBrowne-Microsoft. I've compared the SHA1 values on both for a fixed string (ie. SHA1("My Sample String"), etc) and found them to produce the same output - so I hoped that the the issue for me just lies with the different data types – NEW2WEB Aug 12 '21 at 15:40
  • It may, especially if the string is all ASCII characters. You might try `varchar(max)` instead of `nvarchar(max)` as your column is `text` not `ntext`. Also you shouldn't be using `text` in SQL Server any more. – David Browne - Microsoft Aug 12 '21 at 15:43
  • the problem with that is that varchar(max) = 8000 chars. Not nearly enough for my text. – NEW2WEB Aug 12 '21 at 15:47
  • 1
    Nope. `max` means `varchar(max)` has a maximum length of 2^31-1 bytes. https://learn.microsoft.com/en-us/sql/t-sql/data-types/char-and-varchar-transact-sql?view=sql-server-ver15 – David Browne - Microsoft Aug 12 '21 at 15:53
  • @DavidBrowne-Microsoft Thanks for your help that worked. I was able to cast it as a varchar(max) and it gave me the result I was looking for! Much Appreciated. If you write the answer in I'll mark it as the correct response – NEW2WEB Aug 12 '21 at 18:19

1 Answers1

1

SQL Server text is a deprecated type that has been replaced by varchar(max) for storing long non-unicode strings encoded. So 'varchar(max)' is more likely to be binary-compatible with MySql text than `nvarchar(max)' which uses two bytes per character.

David Browne - Microsoft
  • 80,331
  • 6
  • 39
  • 67