5

I am encrypting using PHP before saving the encrypted data to MySQL. I am convinced this is a better way in the long run vs using MySQL's AES_* functions.

My question now is, is there an efficient way to search encrypted data aside from storing a searchable hashed version of the data? E.g., Two columns per data: first_name_encrypted, first_name_hashed.

$hashed_search = myhash('John');    
$q = "SELECT * FROM table WHERE first_name_hashed = '$hashed_search'";

This is what I do now, is there a better way?

Scott Arciszewski
  • 33,610
  • 16
  • 89
  • 206
IMB
  • 15,163
  • 19
  • 82
  • 140
  • How are you going to search encrypted data? What would the search term be? – Quassnoi Apr 15 '16 at 19:16
  • 2
    What's the use case for encryption here? Once data's hashed you can only do exact matches, there's no ability to do a partial match. – tadman Apr 15 '16 at 19:21
  • What you're searching for is a string, and you need to quote that accordingly. Theoretically, that should be throwing you a syntax error. – Funk Forty Niner Apr 15 '16 at 19:22
  • 1
    Yea.. seems like hashing to a second column is probably your best bet. It's not going to be possible to search on encrypted data. Which is kind of the point of encrypting it in the first place... like if you could search on encrypted data, then a brute force would be dead simple, right? – JNevill Apr 15 '16 at 19:25
  • @tadman Personal data which is includes phone, address, birthday, etc. I understand this is exact match. It would be nice if there was a way to do partial match. – IMB Apr 15 '16 at 19:42
  • @Fred-ii- you are right, it's a typo. – IMB Apr 15 '16 at 19:42
  • @Quassnoi Right now, it's pretty much exact match of personal information name, phone, birthday. – IMB Apr 15 '16 at 19:44
  • @JNevill indeed. So is it safe to say that there really is no other way around? – IMB Apr 15 '16 at 19:47
  • @IMB: do I understand right that all matching strings (like names) are encrypted to the same string (and hashed too)? So all people named John get the same value in `first_name_encypted` and `first_name_hashed`? This kinda kills all the point of the encryption. – Quassnoi Apr 15 '16 at 19:48
  • @IMB Once encrypted the data is utterly opaque to MySQL. Unless you have a very compelling reason to encrypt, leave it plain and spend the effort instead on securing and hardening your database server. If it's not secured properly, no amount of encryption will help if you leak the encryption key as well. – tadman Apr 15 '16 at 19:53
  • @Quassnoi Yes `encrypt('data')` and `hash('data')` I need to hash so I can search it. I need to encrypt so I can hide/view it. I consider this data sensitive, it could be used for identify theft in case of database breach. – IMB Apr 15 '16 at 19:53
  • @IMB: All Smiths, Jones etc in your data can be quite reliably identified from frequency analysis alone should it ever leak. How do you hash your data? Are you using salt at least? – Quassnoi Apr 15 '16 at 19:57
  • @Quassnoi Yes, salt, rounds, pretty much best practice hash. – IMB Apr 15 '16 at 19:59
  • @IMB: did you device your own hashing algorithm? – Quassnoi Apr 15 '16 at 20:02
  • @Quassnoi No. I understand your concern about insecure hashes thanks for that but for the purpose of this question let's assume all bases are covered and the only way of data breach is a rogue employee with full database access. – IMB Apr 15 '16 at 20:04
  • @IMB: well the whole point of encryption is to protect yourself from such a scenario. Let's say you have a table with 1M records. Assuming you're an American, it would not be much of a stretch to say that the most common value of `last_name_hashed` would correspond to Smith, and the second one to Jones. You should give each of your encrypted fields a unique value (say encrypt it with nonce), which means you can only search by scanning and decrypting the whole table. – Quassnoi Apr 15 '16 at 20:14
  • 1
    @Quassnoi Nonce is a good idea, thanks. So I guess we can safely it's either whole table decryption at the cost of performance vs non-nonced exact match hash search. – IMB Apr 15 '16 at 20:25
  • @IMB: nonce is used encryption, when it's used with hash it's called salt (and it's visible to everyone). But yes, that's the idea – Quassnoi Apr 15 '16 at 20:28

2 Answers2

3

My question now is, is there an efficient way to search encrypted data aside from storing a searchable hashed version of the data? E.g., Two columns per data: first_name_encrypted, first_name_hashed.

Close, but no cigar. See: How to search encrypted information with a blind index.

One example, using an authenticated encryption library instead of just using MySQL's built-in AES_*() features:

$first_name_hash = hash_hmac('sha256', $firstName, $secretKey);
$stmt = $db->prepare('SELECT * FROM table WHERE first_name_idx = ?');
$result = $db->execute([$first_name_hash])
    ->fetch(PDO::FETCH_ASSOC);
if ($result) {
    $first_name = Crypto::decrypt($result['first_name_encrypted'], $otherSecretKey);
}

A blind index based on HMAC-SHA256 is preferable to a simple hash.

Also: Use authenticated encryption. This is not negotiable.

Scott Arciszewski
  • 33,610
  • 16
  • 89
  • 206
-1

Generally you shouldn't be encrypting data stored in a database that you need to search over.

In your example you give, it would be helpful to know the context of why you are grabbing a user by first name, and what your overall security concerns are...

$hashed_search = myhash('John');    
$q = 'SELECT * FROM table WHERE first_name_hashed = '.$hashed_search;

Is this a web app and your main concern is unencrypted transmission of a user's personal info over the network? Use an encrypted connection when sending data between the user's PC and the server (e.g. 'https').

Is your concern someone hacking the server and downloading a copy of the database? Consider limiting the amount of personally identifying info you are storing. Do you really need to store a user's real name?

Assuming you DO need to store personally identifying information about a user, consider using other methods to fetch their records from the database than using the personally identifying parts (i.e. don't grab them by 'first_name'). Consider grabbing a user by ID or by a Username that can be unrelated to their real names. This will allow you to make use of Indexing for fast retrieval of records and you can encrypt their personal info (first name, last name, email, phone #, etc) to your heart's content.

If this doesn't help you, maybe provide some more context about what you're trying to accomplish and why.

TLDR: Trying to search over encrypted data is a bad idea. Think of what problem you're trying to avoid and come up with an alternate solution.

Joe Bubna
  • 296
  • 1
  • 2
  • 7
  • You can index an encrypted column, encrypt the search value and then search with no further overhead. *However*, for all practical purposes this is only good for exact matches. Searching for names, for example, greater or less than a given value won't work since it's in the encrypted order, not the order of the underlying data. – Mike Apr 15 '16 at 19:56