0

I must store few million 128-bit hash values to the database and I am struggling with my table design.

Should I store hash in two BIGINT fields or just use BINARY(16)? My typical use case is to find duplicate records based on hash.

I could just use (with two bigints)

select * from hash where hash1=@hash1 and hash2=@hash2

or (with binary(16) column)

select * from hash where hash=@hash

Does it matter performance wise?

I am using MS SQL Server but I don't think it is important here. However, I am accessing MS SQL from C# program and easy read/write of values is preferred but not a deal breaker. Currently I have an array of ulong.

itix
  • 59
  • 11
  • 2
    If a single value has 128 bits, why would you split it across multiple columns? – Gordon Linoff Jun 28 '17 at 20:26
  • Good question. Since my program is written in C# and there 128 bits is stored on ulong array it feels natural to use same in SQL. – itix Jun 28 '17 at 21:06
  • I'm especially curious about how comparisons are handled. On 64-bit architecture, two BIGINTs can be compared in two operations. Would the BINARY(16) be compared in two operations as well, or byte-for-byte? – Timo Mar 13 '19 at 15:10

2 Answers2

1

A string(255), 2 bigints, or a binary(16) all query about the same speed, enough that using any one of the three won't change performance enough to make a difference.

The two big integers are the fastest, but not by much. A test of 10 million records using two tables, one with two bigint and one with binary(16), and performing 1 million random searches was timed. The bigint won out by 3%. Each table performed each query in under 30 microseconds. The cost to covert the binary(16) to 2 bigint values would probably eat some of that that 3%, so stick with the binary(16) as the natural key.

I needed this answer myself as I had a table of string(255) that needed queried. I looked at the 2 bigint and binary(16) approaches for searching. Searching string(255) also performed within 2% of the bigint approach.

Eric Aya
  • 69,473
  • 35
  • 181
  • 253
0

You will likely be doing a full table scan either way, to me it seems to make more sense to use binary or varbinary, and have an index on only one column.

Bryan Newman
  • 621
  • 3
  • 9