I'm running custom training jobs in Google's Vertex AI. A simple gcloud
command to execute a custom job uses something like the following syntax (complete documentation for the command can be seen here):
gcloud beta ai custom-jobs create --region=us-central1 \
--display-name=test \
--config=config.yaml
In the config.yaml file, it is possible to specify the machine and accelerator (GPU) types, etc., and in my case, point to a custom container living in the Google Artifact Registry that executes the training code (specified in the imageUri part of the containerSpec). An example config file may look like this:
# config.yaml
workerPoolSpecs:
  machineSpec:
    machineType: n1-highmem-2
    acceleratorType: NVIDIA_TESLA_P100
    acceleratorCount: 2
  replicaCount: 1
  containerSpec:
    imageUri: {URI_FOR_CUSTOM_CONTAINER}
    args:
    - {ARGS TO PASS TO CONTAINER ENTRYPOINT COMMAND}
The code we're running needs some runtime environment variables (which must be kept secure) passed to the container. The API documentation for the containerSpec says it is possible to set environment variables as follows:
# config.yaml
workerPoolSpecs:
  machineSpec:
    machineType: n1-highmem-2
    acceleratorType: NVIDIA_TESLA_P100
    acceleratorCount: 2
  replicaCount: 1
  containerSpec:
    imageUri: {URI_FOR_CUSTOM_CONTAINER}
    args:
    - {ARGS TO PASS TO CONTAINER ENTRYPOINT COMMAND}
    env:
    - name: SECRET_ONE
      value: $SECRET_ONE
    - name: SECRET_TWO
      value: $SECRET_TWO
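A side note on the $SECRET_ONE and $SECRET_TWO values above: as far as I know, YAML itself does no shell expansion, so if gcloud reads the file as-is the container would receive the literal strings rather than the secrets. Below is a minimal sketch of rendering the config from the current shell environment before submitting it. The function name render_config, the IMAGE_URI variable, and the output file name are hypothetical, and this only covers the substitution step; it does not make the env field itself acceptable to the API. The rendered file holds the secrets in plain text, so it would need to be treated as sensitive.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a job config to the path given as $1, filling in
# IMAGE_URI, SECRET_ONE and SECRET_TWO from the current environment.
# The heredoc delimiter is unquoted, so the shell expands the ${...} references.
# With `set -u`, a missing variable aborts instead of writing an empty value.
render_config() {
  local out="$1"
  cat > "$out" <<EOF
workerPoolSpecs:
  machineSpec:
    machineType: n1-highmem-2
    acceleratorType: NVIDIA_TESLA_P100
    acceleratorCount: 2
  replicaCount: 1
  containerSpec:
    imageUri: ${IMAGE_URI}
    env:
    - name: SECRET_ONE
      value: "${SECRET_ONE}"
    - name: SECRET_TWO
      value: "${SECRET_TWO}"
EOF
}

# Example usage (not executed here):
#   render_config rendered.yaml
#   gcloud beta ai custom-jobs create --region=us-central1 \
#     --display-name=test \
#     --config=rendered.yaml
```

Values containing double quotes or backslashes would break the YAML quoting in this sketch and would need escaping first.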
When I try to add the env field to the containerSpec, I get an error saying it's not part of the container spec:
ERROR: (gcloud.beta.ai.custom-jobs.create) INVALID_ARGUMENT: Invalid JSON payload received. Unknown name "env" at 'custom_job.job_spec.worker_pool_specs[0].container_spec': Cannot find field.
- '@type': type.googleapis.com/google.rpc.BadRequest
  fieldViolations:
  - description: "Invalid JSON payload received. Unknown name \"env\" at 'custom_job.job_spec.worker_pool_specs[0].container_spec':\
      \ Cannot find field."
    field: custom_job.job_spec.worker_pool_specs[0].container_spec
Any idea how to securely set runtime environment variables in Vertex AI custom jobs using custom containers?