I need to store hashes of data in a database so that I can later tell whether a given piece of data is already present. The hashes are computed with a standard digest algorithm plus a salt, so that they hold up against brute-force attacks.

The problem is that the hashes are now secure, but I can no longer tell whether some data is already in the database, since the same data produces a different hash every time.

How can I detect that two pieces of data are the same (only that they match, not what the content is) when using a salted digest?

bozo

1 Answer

Perform the same data + salt hash operation, using the salt that was stored for the record, and compare the result with the hash stored in the database. If you don't know the salt, you're SOL.
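A minimal sketch of that comparison, assuming the salt is stored next to each digest. SHA-256 and the function names `salted_digest`, `store` and `matches` are illustrative choices, not something taken from the question:

```python
import hashlib
import hmac
import os
from typing import Tuple


def salted_digest(data: bytes, salt: bytes) -> bytes:
    # Assumption: the digest is SHA-256 over salt || data.
    return hashlib.sha256(salt + data).digest()


def store(data: bytes) -> Tuple[bytes, bytes]:
    # Generate a fresh random salt and return (salt, digest);
    # both values would be saved in the database row.
    salt = os.urandom(16)
    return salt, salted_digest(data, salt)


def matches(data: bytes, salt: bytes, stored_digest: bytes) -> bool:
    # Recompute the digest with the stored salt and compare
    # in constant time.
    return hmac.compare_digest(salted_digest(data, salt), stored_digest)
```

Because `store` draws a new salt on every call, two records holding the same data get different digests, which is why a plain equality lookup on the digest column no longer works.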

datenwolf
  • I must have drunk too much beer today. Thanks :) – bozo Mar 06 '14 at 15:22
  • @bozo: Ah, yes. One easily misses the [Ballmer Peak](https://xkcd.com/323/) if there's no careful calibration data on one's own physiology and alcohol metabolism ;) – datenwolf Mar 06 '14 at 15:40
  • One question: what if you have 10,000 records in the DB (so not a simple 1:1 username-password situation, and you don't know where your data is in the list)? Does that mean you have to iterate 10,000 times (taking each random salt out of the DB) and do the hashing that many times to find the match? – bozo Mar 06 '14 at 15:40
  • @bozo: Essentially yes. That's the whole purpose of using a salt: Creating additional workload to make brute force or dictionary attacks hard(er) and thwart the use of rainbow tables altogether. – datenwolf Mar 06 '14 at 15:42
  • OK, got it. Will try to reduce the workload by narrowing the data with some helper field, but now I got it all. Many thanks wolf man. – bozo Mar 06 '14 at 15:43
  • @bozo: Gladly. Oh, and of course if a true PBKDF is used, then generating a single password hash may take several ms or even hundreds of ms, so if you have 10k PBKDF entries this may take hours; but that's the whole idea behind PBKDFs. – datenwolf Mar 06 '14 at 15:45
  • Yes, PBKDF2WithHmacSHA1 is used. But I have figured out how to narrow the data significantly so it should work out. – bozo Mar 06 '14 at 15:50
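The scan discussed in the comments above can be sketched roughly as follows, using PBKDF2 with HMAC-SHA1 through Python's `hashlib.pbkdf2_hmac`. The thread does not say which helper field was used, so the `bucket` field (here simply the data length), the `Record` type, and the low iteration count are all assumptions made for illustration:

```python
import hashlib
import hmac
import os
from typing import List, NamedTuple, Optional

# Assumption: kept low so the sketch runs quickly; real deployments
# use far higher iteration counts.
ITERATIONS = 1000


class Record(NamedTuple):
    bucket: int    # hypothetical helper field used to narrow the scan
    salt: bytes    # per-record random salt
    digest: bytes  # PBKDF2-HMAC-SHA1 output


def bucket_of(data: bytes) -> int:
    # Hypothetical helper value: the data length. Any cheap,
    # non-secret property of the data could play this role.
    return len(data)


def derive(data: bytes, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha1", data, salt, ITERATIONS)


def make_record(data: bytes) -> Record:
    salt = os.urandom(16)
    return Record(bucket_of(data), salt, derive(data, salt))


def find(data: bytes, records: List[Record]) -> Optional[int]:
    # Only records in the same bucket are hashed; each candidate
    # costs one full PBKDF2 computation with that record's salt.
    wanted = bucket_of(data)
    for index, record in enumerate(records):
        if record.bucket != wanted:
            continue
        if hmac.compare_digest(derive(data, record.salt), record.digest):
            return index
    return None
```

The helper field trades some secrecy for speed: whatever it stores is readable by anyone who obtains the database, so it should reveal as little about the data as the use case allows.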