I'm attempting to run a Vertex AI custom training job using the Python SDK, following the general instructions laid out in this readme. My code is as follows (sensitive data removed):

job = aiplatform.CustomContainerTrainingJob(
    display_name='python_api_test',
    container_uri='{URI FOR CUSTOM CONTAINER IN GOOGLE ARTIFACT REGISTRY}',
    staging_bucket="{GCS BUCKET PATH IN 'gs://' FORMAT}",
    model_serving_container_image_uri='us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-4:latest',
)

job.run(
    model_display_name='python_api_model',
    args='{ARG PASSED TO CONTAINER ENTRYPOINT}',
    replica_count=1,
    machine_type='n1-standard-4',
    accelerator_type='NVIDIA_TESLA_T4',
    accelerator_count=2,
    environment_variables={
        {A COUPLE OF SECRETS PASSED TO CONTAINER IN DICTIONARY FORMAT}
    }
)

When I execute job.run(), I get the following error:

InvalidArgument: 400 Unable to parse `training_pipeline.training_task_inputs` into custom task `inputs` defined in the file: gs://google-cloud-aiplatform/schema/trainingjob/definition/custom_task_1.0.0.yaml

The full traceback does not show which specific input it is unhappy with. I've successfully run jobs in the same container using the Vertex CLI. I'm confident that there is nothing wrong with my aiplatform.init() (I'm running the job from a Vertex workbench machine in the same project).
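
Since the error does not name the offending field, one thing that may be worth ruling out is the type of each value passed to job.run(). As far as I can tell from the SDK reference, `args` is documented as a list and `environment_variables` as a dict of string keys to string values, whereas the snippet above passes `args` as a single string. The following is a minimal local sketch of such a check. `check_run_kwargs` is a hypothetical helper, not part of the SDK, and a clean result does not guarantee that the service will accept the request:

```python
from typing import Any, Dict, List


def check_run_kwargs(kwargs: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in job.run() keyword arguments.

    Hypothetical helper: the expected types are assumptions taken from the
    SDK reference (args as a list, environment_variables as Dict[str, str]).
    """
    problems = []

    # A bare string is not a list or tuple, so it is reported here.
    args = kwargs.get("args")
    if args is not None and not isinstance(args, (list, tuple)):
        problems.append(
            f"args should be a list such as ['--flag=value'], "
            f"got {type(args).__name__}"
        )

    # Environment variables are expected to map string names to string values.
    env = kwargs.get("environment_variables")
    if env is not None:
        if not isinstance(env, dict):
            problems.append(
                f"environment_variables should be a dict, got {type(env).__name__}"
            )
        else:
            for key, value in env.items():
                if not isinstance(key, str) or not isinstance(value, str):
                    problems.append(
                        f"environment_variables[{key!r}] should map str to str"
                    )

    return problems


# Example with placeholder values shaped like the call in the question.
example_kwargs = {
    "args": "--some-flag=value",
    "environment_variables": {"SOME_SECRET": "placeholder"},
}
for problem in check_run_kwargs(example_kwargs):
    print(problem)
```

If the check reports `args`, wrapping the value in a list, for example `args=['{ARG PASSED TO CONTAINER ENTRYPOINT}']`, would be the first change to try.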

JmeCS
  • One possibility is that you are passing the `Project Name` where the `Project Id` or the `Project Number` is required. This error occurs when an argument is invalid and therefore cannot be mapped. – Jose Gutierrez Paliza Jan 06 '22 at 16:53
  • Thanks - I've checked, and aiplatform is initialized with the correct project ID (not project Name). – JmeCS Jan 06 '22 at 21:33
  • Perhaps you could also share the most similar example that did work. Or otherwise start from a working example and keep changing 1 thing at a time till something breaks. – Dennis Jaheruddin Apr 07 '23 at 21:59
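
The "change one thing at a time" suggestion in the last comment can be sketched as a small generator that walks from a known-good set of keyword arguments to the failing one. `incremental_configs` is a hypothetical helper and the values are placeholders; submitting each configuration to Vertex AI is left to the caller:

```python
from typing import Any, Dict, Iterator, Tuple

_MISSING = object()  # sentinel so that an explicit None still counts as a value


def incremental_configs(
    baseline: Dict[str, Any], target: Dict[str, Any]
) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (changed_key, config) pairs, applying one difference per step.

    Keys that exist only in the baseline are left untouched in this sketch.
    """
    current = dict(baseline)
    for key, value in target.items():
        if current.get(key, _MISSING) == value:
            continue  # already identical, nothing to test for this key
        current[key] = value
        yield key, dict(current)


# Placeholder values: a configuration assumed to work, and the failing one.
working = {"replica_count": 1, "machine_type": "n1-standard-4"}
failing = {
    "replica_count": 1,
    "machine_type": "n1-standard-4",
    "accelerator_type": "NVIDIA_TESLA_T4",
    "accelerator_count": 2,
}
for changed_key, config in incremental_configs(working, failing):
    print(changed_key, config)
```

The first step at which a submitted job fails points at the keyword argument that was just changed.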

0 Answers