I need to build an identity service that uses a customer-supplied key to encrypt sensitive ID values for storage in RDS, but that also allows us to look up a record later using the plaintext ID. We'd like to use a simple deterministic encryption algorithm for this, but it looks like the KMS API doesn't let you specify the IV, so identical plaintexts never encrypt to the same ciphertext twice.

We also have a requirement to look up the data using another, non-secret value, retrieve the encrypted secure value, and decrypt it - so one-way hashing unfortunately won't work on its own.

Taken together, this means we can't look up the secure ID without brute-force iterating through all records, decrypting each one, and comparing it to the plaintext value - instead of simply encrypting the plaintext search value with a known IV and using that ciphertext as an index to find the matching record in the database.

I'm guessing this is a pretty common requirement for things like SSNs and such, so how do people solve it?

Thanks in advance.

Mike B

3 Answers


look up a record later using the plaintext ID

Then you are losing quite a bit of security. You could store a hash (e.g. SHA-256) of the ID alongside the encrypted data, which would make it easier to look up the record while still not being reversible to the original value.

This approach assumes that the ID comes from a reasonably large message space (there are many possible IDs), so it is not feasible to precompute a map of every possible value.
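A minimal sketch of this hash-alongside-ciphertext idea in Python (the function name is illustrative, not from any AWS API):

```python
import hashlib

def lookup_hash(plaintext_id: str) -> str:
    """Deterministic SHA-256 digest of the plaintext ID. Stored in a
    separate column next to the (non-deterministic) KMS ciphertext and
    used as the index for lookups; it cannot be reversed to the ID."""
    return hashlib.sha256(plaintext_id.encode("utf-8")).hexdigest()

# The same input always yields the same digest, so equality lookups work:
digest = lookup_hash("123-45-6789")
```

Note that for low-entropy identifiers this only holds up if the message-space assumption above is met; otherwise a keyed HMAC (as the final answer in this thread uses) resists brute-force table building.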

KMS API doesn't allow you to specify the IV so you can never get identical plaintext to encrypt to the same value twice.

Yes - KMS provides its own IV for the ciphertext, enforcing good security practice.

gusto2
  • Essentially this is a data transformation pipeline where we have input files with sensitive IDs in plaintext that we want to tokenize/replace, then perform a number of different processing steps using the tokenized ID (so that logging and datasets sent between the steps do not contain the sensitive IDs at all), and then on an output step re-insert the sensitive IDs if needed. Hashing in conjunction with encrypting would be a solution, but I don't even see a way to do a generic SHA hash on data using a KMS key. The functions seem quite limited overall. – Mike B Feb 24 '20 at 22:06
  • @MikeB you can create a normal cryptographic hash (KMS is not needed for that). KMS could be used to encrypt the stored original IDs until the rest of the data is processed. – gusto2 Feb 25 '20 at 08:26
  • Yep - thanks - that's exactly what we ended up doing, using the pgcrypto extension to make it easy – Mike B Feb 25 '20 at 17:07

If I understand your use case correctly, your flow is like this:

  1. The customer provides a key K and you use this key to encrypt a secret S, which is stored in RDS with an associated ID.
  2. Given a non-secret lookup value, you want to be able to find S and decrypt it.

If the customer is reusing the key, this is actually not all that hard to accomplish.

  1. Create a KMS key for the customer.

  2. Use this KMS key to encrypt the customer's IV and the key the customer has specified, and store them in AWS Secrets Manager - preferably namespaced in some way by customer. A JSON structure like this:

    { "iv": "somerandomivvalue", "key": "somerandomkey" }

    would allow you to easily parse the values out. ASM also allows you to seamlessly perform key rotation - which is really nifty.

  3. If you're paranoid, you could take a cryptographic hash of the customer name (or whatever) and namespace by that.

  4. RDS now stores the numeric ID of the customer, the insecure values, and a namespace value (or some method of deriving the location) in ASM.

  5. It goes without saying that you need to limit access to the secrets manager vault.
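The store-and-parse step above can be sketched as follows. `load_customer_secret` and the injected `fetch_secret` callable are hypothetical names; in a real deployment `fetch_secret` would wrap a Secrets Manager `GetSecretValue` call through an AWS SDK:

```python
import json
from typing import Callable, Tuple

def load_customer_secret(fetch_secret: Callable[[str], str],
                         namespace: str) -> Tuple[str, str]:
    """Fetch the per-customer secret JSON (as stored in Secrets Manager,
    keyed by namespace) and return the (iv, key) pair needed to
    initialise the cipher scheme for that customer."""
    raw = fetch_secret(namespace)  # e.g. a thin GetSecretValue wrapper
    doc = json.loads(raw)
    return doc["iv"], doc["key"]

# With a stand-in fetcher returning the JSON structure shown above:
fake_fetch = lambda ns: '{"iv": "somerandomivvalue", "key": "somerandomkey"}'
iv, key = load_customer_secret(fake_fetch, "customer-a")
```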

To employ the solution:

  1. Customer issues request to read secure value.
  2. Service accesses ASM and decrypts the secret for customer.
  3. Service extracts the IV and key.
  4. Service initialises cipher scheme with IV and key and decrypts customer data.

Benefits: You encrypt and decrypt the secret values in ASM with a KMS key under your full control, and you can store and recover whatever state you need to decrypt the customer values in a secure manner.

Others will probably have cryptographically better solutions, but this should do for a first attempt.

mcfinnigan
  • We thought about doing this very thing, though you've fleshed it out more thoroughly with the namespace idea. The downside is that it exposes the client key to our client lambda code - we were hoping to keep all crypto operations securely within the confines of KMS, but I guess that simply isn't possible, given there is no way to specify an IV for KMS, nor are there even any data hashing operations, which would have been our alternative - have KMS both encrypt for one column and hash for the lookup column. I just keep thinking this must be solved in a secure way given how mature AWS is. – Mike B Feb 24 '20 at 23:13

In the end we decided to continue to use KMS for the customer-supplied-key encrypt/decrypt of the sensitive ID column, but also enabled the PostgreSQL pgcrypto extension to provide secure hashes for lookups. So in addition to our encrypted column we added an id_hash column, and we operate on the table something like this:

    INSERT INTO employee (..., id_hash)
    VALUES (..., ENCODE(HMAC('SENSITIVE_ID+SECRET_SALT', 'SECRET_PASSPHRASE', 'sha256'), 'hex'));

    SELECT * FROM employee
    WHERE division_id = ???
      AND id_hash = ENCODE(HMAC('SENSITIVE_ID+SECRET_SALT', 'SECRET_PASSPHRASE', 'sha256'), 'hex');

We could have done the hashing client-side but since the algorithm is key to later lookups we liked the simplicity of having the DB do the hashing for us.
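For anyone who does want to hash client-side, the equivalent of pgcrypto's HMAC(data, key, 'sha256') is a standard HMAC-SHA-256. A sketch with placeholder values (the function name and the salt/passphrase arguments are illustrative):

```python
import hashlib
import hmac

def id_hash(sensitive_id: str, salt: str, passphrase: str) -> str:
    """Mirror of ENCODE(HMAC(<salted id>, <passphrase>, 'sha256'), 'hex'):
    HMAC-SHA-256 over the salted ID, keyed by the passphrase, hex-encoded.
    Must use the exact same salt/passphrase as the SQL side, or lookups
    computed in the database will not match."""
    data = (sensitive_id + salt).encode("utf-8")
    return hmac.new(passphrase.encode("utf-8"), data, hashlib.sha256).hexdigest()
```

Note the argument order: pgcrypto's hmac(data, key, type) takes the data first, while Python's hmac.new(key, msg, digestmod) takes the key first.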

Hope this is of use to anyone else looking for a solution.

Mike B
  • Yes, that was the intention in my answer. However, please note: a cryptographic hash is designed to be collision resistant, but it cannot be mathematically guaranteed to be unique. If you are working with regulated and audited data (e.g. payments, health records) you may want to check for and handle duplicates. – gusto2 Mar 02 '20 at 21:59