1

I am going to create:

  • a table for storing IDs and unique text values (which are expected to be large)
  • a stored procedure which will have a text value as input parameter (it will check if the value exists in the above table and return the corresponding ID if it exists, or inserted a new record if not and return the new ID as well)

I want to optimize the search of text values using hash value of the text and created index on it. So, during the search I expect a non-clustered index to be used (not the clustered index).

I decided to use the HASHBYTES with SHA2_256 and I am wondering are there any differences/benefits if I am storing the hash value as BINARY(32) or NVARCHAR(16)?

gotqn
  • 42,737
  • 46
  • 157
  • 243
  • How is searching a the hash of a text faster than searching the text? If you want hash-buckets them use memory optimized tables. – paparazzo Mar 30 '15 at 13:08
  • I can create index on hash value, while creating index on `NVARCHAR(MAX)` is going to give me warning. The index will be ordered and it should be faster to compare `binary(32)` values then `nvarchar(max)` values. Also, creating index using the hash and including the `ID` (in my case) will let me work with the index itself (while searching) - otherwise, I need to use the clustered index which means more IO operations. – gotqn Mar 30 '15 at 13:18

1 Answers1

3

You can't reasonably store a hash value as chars because binary data is not text. Various text processing and comparison functions interpret those chars. For example trailing whitespace is sometimes ignored leading to incorrect results.

Since you've got 32 totally random unstructured bytes to store a binary(32) is the most natural format and it is the fastest one.

usr
  • 168,620
  • 35
  • 240
  • 369