1

If my input length is less than the hash output length, are there any hashing algorithms that can guarantee no collisions?

I know that a one-way hash can, by nature, have collisions across multiple inputs due to the lossy nature of hashing, especially since the input size is often greater than the output size, but does that still apply with smaller input sizes?

Jeremy
  • 44,950
  • 68
  • 206
  • 332
  • I'd look at these posts [link 1](http://stackoverflow.com/questions/4676828/when-generating-a-sha256-512-hash-is-there-a-minimum-safe-amount-of-data-to) [link 2](http://stackoverflow.com/questions/4676828/when-generating-a-sha256-512-hash-is-there-a-minimum-safe-amount-of-data-to). – Xedni Jan 09 '15 at 21:59
  • @Xedni, your links appear to target the same post – Dave.Gugg Jan 09 '15 at 22:07
  • @Dave.Gugg One for each eye? – AaronLS Jan 09 '15 at 22:41
  • Must have been a copy paste fail. Regardless, it looks like a suitable answer was given :) – Xedni Jan 12 '15 at 21:48

3 Answers

1

Use a symmetric block cipher with a randomly chosen static key. Encrypting two distinct inputs can never produce the same output, because that would make unambiguous decryption impossible.

This scheme forces the output length to be a multiple of the cipher block size. If a variable-length output works for you, you can use a stream cipher as well.
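Here's a minimal sketch of the idea in Python, assuming the third-party `cryptography` package is available; the function name and the length-prefix padding are illustrative choices, not part of any standard:

```python
# Sketch: AES on a single block is a bijection on 16-byte values under a
# fixed key, so distinct (properly padded) inputs can never collide.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

KEY = os.urandom(32)  # randomly chosen static key; store it and reuse it

def collision_free_digest(data: bytes) -> bytes:
    # Limit to 15 bytes so a 1-byte length prefix plus the data fits in one block.
    if len(data) > 15:
        raise ValueError("input must be at most 15 bytes")
    # Length-prefix padding keeps the mapping injective: b"a" and b"a\x00"
    # pad to different blocks.
    block = bytes([len(data)]) + data.ljust(15, b"\x00")
    encryptor = Cipher(algorithms.AES(KEY), modes.ECB()).encryptor()
    return encryptor.update(block) + encryptor.finalize()
```

Because the key stays fixed, the same input always yields the same 16-byte output, and the mapping can only be reversed by someone holding the key.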

usr
  • 168,620
  • 35
  • 240
  • 369
0

Your question sounds like you're looking for a perfect hash function. The problem with perfect hash functions is that they tend to be tailored to a specific set of data.

The following assumes you're not trying to hide, secure or encrypt the data...

To think of it another way, the easiest way to "generate" a perfect hash function that will accept your inputs is to map the data you want to store into a table and associate those inputs with a surrogate primary key. You then create a unique constraint on the column (or columns) to ensure each input maps to only a single surrogate value.

The surrogate key could be an int, bigint, or a guid. It all depends on how many rows you're looking to store.
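A quick sketch of that mapping using Python's built-in sqlite3 (the table and column names here are just placeholders):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE value_map (
        id    INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        value BLOB NOT NULL UNIQUE                -- original input, constrained to be unique
    )
""")

def surrogate_key(value: bytes) -> int:
    # Insert the value if it's new, then look up its surrogate key.
    conn.execute("INSERT OR IGNORE INTO value_map (value) VALUES (?)", (value,))
    return conn.execute(
        "SELECT id FROM value_map WHERE value = ?", (value,)
    ).fetchone()[0]

print(surrogate_key(b"hello"))  # 1
print(surrogate_key(b"world"))  # 2
print(surrogate_key(b"hello"))  # 1 again: the same input always maps to the same key
```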

Cryptopone
  • 31
  • 4
0

If your input lengths are known to be small, such as 32 bits, then you could actually enumerate all possible inputs and check the resulting hashes for collisions. That's only 4,294,967,296 (2^32) possible inputs, and it shouldn't take too terribly long to enumerate all of them. Essentially you'd be building a lookup table, much like a rainbow table, to test for collisions.

If any security relies on this, though, one issue is that an attacker who knows your input lengths are constrained can perform the same enumeration to build a map/table from hashes back to the original values. "Attacker" may be the wrong term here, though, because I have no context for how you are using these hashes or whether you are concerned about them being reversible.
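For illustration, here's a sketch of that enumeration in Python; it hashes every 16-bit input rather than all 2^32 so it finishes quickly, but the same loop works for 32-bit inputs given more time:

```python
import hashlib

seen = {}        # digest -> the input that produced it
collisions = 0
for i in range(2**16):                       # every possible 16-bit input
    data = i.to_bytes(2, "big")
    digest = hashlib.sha256(data).digest()
    if digest in seen:
        collisions += 1
        print("collision:", seen[digest], "and", data)
    else:
        seen[digest] = data

print("collisions found:", collisions)       # expect 0 at this scale
```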

AaronLS
  • 37,329
  • 20
  • 143
  • 202