
I have successfully trained my first network with the Google Cloud ML Engine, and now I am trying to make the setup a bit more secure by providing my own encryption key for encrypting the data. As explained in the manual, I have now copied my data to Cloud Storage with my own customer-supplied encryption key, instead of storing it there unencrypted.

However, now my setup (obviously!) broke, as the Python code I submit to the ML Engine cannot decrypt the files. I was expecting an option like --decrypt-key on gcloud ml-engine jobs submit training, but I cannot find such an option. How can I provide this key so that my code can decrypt the data?

jthread

1 Answer


Short answer: You should not pass the decryption key into the training job. Instead, see https://cloud.google.com/kms/docs/store-secrets

Long answer: While you could technically make the decryption key a flag that gets passed through the Training Job definition, this would expose it to anyone with access to List Training Jobs. You should instead place the key in the Google Cloud Key Management Service and give the service account running the ML training job permission to fetch the key from there.
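As a rough sketch of that approach (the key ring, key, and file names here are hypothetical, and the service-account member is a placeholder you need to fill in with the account identified below):

```shell
# Create a KMS key ring and key to wrap the customer-supplied
# encryption key (CSEK) used for the Cloud Storage objects.
gcloud kms keyrings create ml-secrets --location global
gcloud kms keys create csek-wrapper --location global \
    --keyring ml-secrets --purpose encryption

# Encrypt the CSEK with the KMS key; store only the wrapped
# ciphertext (csek.key.enc) where the training job can reach it.
gcloud kms encrypt --location global --keyring ml-secrets \
    --key csek-wrapper \
    --plaintext-file csek.key --ciphertext-file csek.key.enc

# Grant the ML Engine service account (placeholder below) permission
# to unwrap the key at runtime.
gcloud kms keys add-iam-policy-binding csek-wrapper \
    --location global --keyring ml-secrets \
    --member "serviceAccount:YOUR_ML_ENGINE_SERVICE_ACCOUNT" \
    --role roles/cloudkms.cryptoKeyDecrypter
```

Your training code then calls `gcloud kms decrypt` (or the KMS client library) at startup to recover the CSEK in memory, without the key ever appearing in the job definition.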

You can determine the service account that runs the training job by following the procedure listed at https://cloud.google.com/ml-engine/docs/how-tos/working-with-data#using_a_cloud_storage_bucket_from_a_different_project

Edit: Also note what Alexey says in the comment below; TensorFlow currently cannot read and decrypt the files directly from GCS, so you'll need to copy them to local disk on every worker, supplying the decryption key to gsutil cp.
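That copy step might look like this (bucket and paths are hypothetical; gsutil accepts the base64-encoded AES-256 key via a boto-config override):

```shell
# In practice, fetch the CSEK from Cloud KMS at job startup rather
# than generating it here; this line only shows the expected format:
# a base64-encoded 256-bit key.
CSEK_KEY="$(openssl rand -base64 32)"

# Pass the customer-supplied decryption key to gsutil and copy the
# encrypted training data to the worker's local disk.
gsutil -o "GSUtil:decryption_key1=${CSEK_KEY}" \
    cp -r gs://my-training-bucket/data/ /tmp/training-data/
```

Your TensorFlow input pipeline then reads from /tmp/training-data/ instead of the gs:// paths.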

Chris Meyers
  • In addition to what Chris said, I'd like to point out that you won't be able to read encrypted files from GCS via TensorFlow, because the GCS client library in TensorFlow doesn't support them yet. Instead, you'll need to copy the training data from GCS to the local disk of every training worker using `gsutil cp` (supplying the encryption keys) and then point your TensorFlow training code at the local disk. – Alexey Surkov Sep 01 '17 at 17:24
  • Storing the key in Cloud KMS defeats the purpose, since I might as well use Google's default server-side encryption. If data leaks from Google Cloud Storage, I don't want anyone to be able to read the files. Clearly the key must be available while the code runs, but as I understand from [this page](https://cloud.google.com/storage/docs/encryption) the key can then be held in memory and purged after the operation succeeds. – jthread Sep 01 '17 at 19:01
  • TL;DR: The only safe solution would be to pass the key when submitting a new job, but apparently that is not possible. – jthread Sep 01 '17 at 19:02