
I am wondering why anyone would use `hashbytes` when the data below contains no passwords or other sensitive values. Is there really a need for it if you are not dealing with passwords, credit card numbers, etc.?

    select distinct
        @reportIdLeiDuplicates,
        wp.CrmPartyId,
        wp.GtId,
        convert(varchar(20),
                hashbytes('SHA1', isnull(wp.CrmPartyId, '') + isnull(wp.GtId, '')),
                2),
K1205
  • It depends on what you mean by sensitive data. Under European regulations, unless you need to know my name *(and have asked, and have permission, etc)*, you shouldn't store my name. So, instead "tokens" are stored. These tokens can quite reasonably be hashes. *(Be very careful that **your** definition of "sensitive" may not match the **legal** definition. Search for Client Confidential Information, or Personally Identifiable Information, etc. Then ISO 27001 and then GDPR or whatever legislation applies to you.)* – MatBailie Nov 20 '18 at 17:00
  • @MatBailie The IDs don't look like they are PII though. Something like a social security number could be, but this just looks like a database ID. – TJB Nov 20 '18 at 17:11
  • I'm going to suggest that the author is trying to create a new unique ID of the two. It doesn't look very effective to me because you cannot reverse a hashbyte back to the original value using SQL. It would have been better to convert them to string then a (var)binary value that can be converted back to source. – TJB Nov 20 '18 at 17:13
  • @TJB - There is nothing in the actual question to suggest that it is limited only to the code snippet added below it. Thus... `It depends on what you mean by sensitive data.` and my extrapolation of that. – MatBailie Nov 20 '18 at 17:14
  • Another possible reason... It can be used as a "CHECKSUM" that has a reduced risk of collisions. – Jason A. Long Nov 20 '18 at 18:04
  • @K1205 . . . Obviously for this example, `hashbytes()` is *not* serving a function to obfuscate the data -- because the original ids are in the result set. This looks like an attempt to combine two ids or to create a "check sum" like column (as explained in other comments). – Gordon Linoff Nov 20 '18 at 18:16
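
The "combined ID" or "check sum" reading suggested in the comments can be sketched in T-SQL as follows. This is only an illustration under stated assumptions: the table variable `@wp`, its `varchar` column types, and the sample rows are hypothetical and do not come from the original query. It also shows one caveat of hashing a plain concatenation: different ID pairs can produce the same input string, which a separator (here `'|'`, an arbitrary choice) avoids as long as the separator cannot occur in the IDs.

```sql
-- Hypothetical sample data; names, types and values are illustrative only.
DECLARE @wp TABLE (CrmPartyId varchar(20) NULL, GtId varchar(20) NULL);

INSERT INTO @wp (CrmPartyId, GtId)
VALUES ('AB', 'C'),
       ('A',  'BC'),   -- concatenates to the same string as the row above
       ('AB', NULL);

SELECT
    CrmPartyId,
    GtId,
    -- Same expression shape as in the question: plain concatenation.
    -- ('AB','C') and ('A','BC') both hash the string 'ABC'.
    CONVERT(varchar(40),
            HASHBYTES('SHA1', ISNULL(CrmPartyId, '') + ISNULL(GtId, '')),
            2) AS HashPlain,
    -- Variant with a separator, so the two pairs hash different strings.
    CONVERT(varchar(40),
            HASHBYTES('SHA1', ISNULL(CrmPartyId, '') + '|' + ISNULL(GtId, '')),
            2) AS HashDelimited
FROM @wp;
```

SHA-1 produces 20 bytes, which is 40 hexadecimal characters with style `2`. The sketch therefore uses `varchar(40)`; the `varchar(20)` in the question would keep only the first 20 of those characters.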

0 Answers