I'm attempting to run a Vertex AI custom training job using the Python SDK, following the general instructions laid out in this README. My code is as follows (sensitive data removed):
job = aiplatform.CustomContainerTrainingJob(
    display_name='python_api_test',
    container_uri='{URI FOR CUSTOM CONTAINER IN GOOGLE ARTIFACT REGISTRY}',
    staging_bucket='{GCS BUCKET PATH IN "gs://" FORMAT}',
    model_serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-4:latest',
)

job.run(
    model_display_name='python_api_model',
    args='{ARG PASSED TO CONTAINER ENTRYPOINT}',
    replica_count=1,
    machine_type='n1-standard-4',
    accelerator_type='NVIDIA_TESLA_T4',
    accelerator_count=2,
    environment_variables={A COUPLE OF SECRETS PASSED TO CONTAINER IN DICTIONARY FORMAT},
)
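For reference, the SDK's type hints describe `run()`'s `args` parameter as an optional list (one element per flag or value), and `environment_variables` as a plain string-to-string dict. A minimal sketch of what I believe those two inputs are expected to look like — all names and values here are placeholders of my own, not taken from my real job:

```python
# Hypothetical container arguments, expressed as a list of strings:
# each flag and each value is its own element, rather than one
# space-separated string.
container_args = ["--epochs", "10", "--learning-rate", "0.01"]

# Hypothetical secrets passed to the container as environment
# variables: a flat dict of str -> str.
env_vars = {
    "SECRET_ONE": "value-one",
    "SECRET_TWO": "value-two",
}

# Sanity checks on the shapes described above.
assert all(isinstance(a, str) for a in container_args)
assert all(isinstance(k, str) and isinstance(v, str) for k, v in env_vars.items())
```

These would then be passed as `args=container_args, environment_variables=env_vars` in the `job.run(...)` call above.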
When I execute job.run(), I get the following error:
InvalidArgument: 400 Unable to parse `training_pipeline.training_task_inputs` into custom task `inputs` defined in the file: gs://google-cloud-aiplatform/schema/trainingjob/definition/custom_task_1.0.0.yaml
The full traceback does not indicate which specific input it is unhappy with. I've successfully run jobs in the same container using the Vertex CLI, so I'm confident there is nothing wrong with my aiplatform.init() (I'm running the job from a Vertex Workbench machine in the same project).