
I'm trying to write .NET code that starts a custom job in Vertex AI.

I have no problem starting a custom job using gcloud:

gcloud ai custom-jobs --project my_project_id create --region=europe-west1 --display-name="train model based on custom container" --worker-pool-spec=machine-type=n1-standard-4,replica-count=1,container-image-uri=europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest

I haven't been able to find an official code sample for .NET, so I tried to mimic a Python example written by someone else; ChatGPT also produced a similar code sample:

var projectId = "my_project_id";
var locationId = "europe-west1";
var client = await JobServiceClient.CreateAsync();

var createCustomJobRequest = new CreateCustomJobRequest
{
  ParentAsLocationName = new LocationName(projectId, locationId),
  CustomJob = new CustomJob
  {
    DisplayName = "train model based on custom container",
    JobSpec = new CustomJobSpec()
    {
      WorkerPoolSpecs =
      {
        new WorkerPoolSpec
        {
          MachineSpec = new MachineSpec
          {
            MachineType = "n1-standard-4"
          },
          ReplicaCount = 1,
          ContainerSpec = new ContainerSpec()
          {
            ImageUri = "europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest"
          }
        }
      }
    }
  }
};

var result3 = await client.CreateCustomJobAsync(createCustomJobRequest); // exception thrown here

Unfortunately, I get an exception back:

Grpc.Core.RpcException: 'Status(StatusCode="Unimplemented", Detail="Bad gRPC response. HTTP status code: 404")'

Things I've tried without success:

  1. Used the overload of CreateCustomJobAsync() that takes a CustomJob and a Parent instead of a CreateCustomJobRequest object.
  2. Used JobServiceClientBuilder instead of JobServiceClient.CreateAsync() and set the Endpoint argument as europe-west1-aiplatform.googleapis.com.

What am I missing to get a custom job started in Vertex AI?

OnionJack
  • I found VB.Net code : https://cloud.google.com/vertex-ai/docs/samples/aiplatform-create-custom-job-sample?force_isolation=true – jdweng May 25 '23 at 12:19
  • @jdweng: Where's the VB code on that page? It's got Java, Node, and Python. – Jon Skeet May 25 '23 at 15:43
  • I'll try to reproduce this, and work out what's going on. (It's unlikely to be the client library, but I can't see anything obvious wrong with the code here.) It probably won't be for another ~16 hours though. – Jon Skeet May 25 '23 at 15:44
  • @JonSkeet : It looks to me like VB.Net. – jdweng May 25 '23 at 16:03
  • @jdweng: *What* does? There's a tab at the top of the sample letting you choose which language you want to use. What does that tab show? What's the first non-comment line of the sample you're looking at? – Jon Skeet May 25 '23 at 16:17
  • Jon is right, none of them are C# samples. Anyway, found a C# code sample here @ https://github.com/googleapis/google-cloud-dotnet/blob/main/apis/Google.Cloud.AIPlatform.V1/Google.Cloud.AIPlatform.V1.Snippets/JobServiceClientSnippets.g.cs#L48 in case anyone's curious. – OnionJack May 25 '23 at 20:04
  • Apologies for not remembering that Vertex AI needs regionalized endpoints to be specified - see my new answer for a change other than permissions. – Jon Skeet May 31 '23 at 06:59

2 Answers


I should have dug a bit more into JobServiceClientBuilder. Specifically, when I used the client created by the builder to start a job, I actually got a different message back:

Grpc.Core.RpcException
  HResult=0x80131500
  Message=Status(StatusCode="PermissionDenied", Detail="Permission 'aiplatform.customJobs.create' denied on resource '//aiplatform.googleapis.com/projects/my_project_id/locations/europe-west1' (or it may not exist).")
  Source=Google.Api.Gax.Grpc

The message was fairly clear, but I wasn't sure it pointed at the real cause: the earlier Unimplemented error hadn't made sense, so I dismissed this one too.

Anyway, since writing the question it occurred to me that gcloud and the SDK might authenticate differently. It turns out that the active account on the command line (the one marked with * in gcloud auth list) is my own user credential, while the environment variable GOOGLE_APPLICATION_CREDENTIALS references a service account. Once I granted the Vertex AI Administrator role to that service account, I was finally able to start a job.

So, use JobServiceClient.CreateAsync() if the service account behind GOOGLE_APPLICATION_CREDENTIALS has the right permissions. If you need to use a different service account, instantiate a JobServiceClient like so:

var client = await new JobServiceClientBuilder
{
    Endpoint = "europe-west1-aiplatform.googleapis.com",
    GoogleCredential = GoogleCredential.FromFile(@"your-service-account.json")
}.BuildAsync();

I know the latter is "standard GCP authentication" knowledge; it just didn't come to mind immediately.
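
Putting the pieces together, here is a minimal end-to-end sketch rather than a definitive implementation. It assumes the Google.Cloud.AIPlatform.V1 package, a C# version that supports top-level statements, and the placeholder project, region, image, and key file names used above. It uses the CredentialsPath property on the builder as an alternative to GoogleCredential.FromFile, and the CreateCustomJobAsync overload that takes a parent and a CustomJob.

```csharp
using System;
using Google.Api.Gax.ResourceNames;
using Google.Cloud.AIPlatform.V1;

// Placeholder values; replace with your own project and region.
var projectId = "my_project_id";
var locationId = "europe-west1";

// Build a client bound to the regional endpoint, authenticating with an
// explicit service account key file (placeholder path).
var client = await new JobServiceClientBuilder
{
    Endpoint = $"{locationId}-aiplatform.googleapis.com",
    CredentialsPath = @"your-service-account.json"
}.BuildAsync();

// Same job definition as in the question.
var customJob = new CustomJob
{
    DisplayName = "train model based on custom container",
    JobSpec = new CustomJobSpec
    {
        WorkerPoolSpecs =
        {
            new WorkerPoolSpec
            {
                MachineSpec = new MachineSpec { MachineType = "n1-standard-4" },
                ReplicaCount = 1,
                ContainerSpec = new ContainerSpec
                {
                    ImageUri = "europe-west1-docker.pkg.dev/my_project_id/my-repo/my-custom-prototype:latest"
                }
            }
        }
    }
};

// The service account needs the aiplatform.customJobs.create permission
// on the project for this call to succeed.
var job = await client.CreateCustomJobAsync(new LocationName(projectId, locationId), customJob);
Console.WriteLine($"Created {job.Name} in state {job.State}");
```

If the call still fails with PermissionDenied, the account in the key file is probably missing a role that includes aiplatform.customJobs.create.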

OnionJack
  • Thanks for this - I've just been trying to diagnose the same thing. Note that you can also use the CredentialsPath property instead of `GoogleCredential.FromFile`. I'm going to file an issue with the VertexAI team, as that error message is really unhelpful. – Jon Skeet May 26 '23 at 07:37

I'm not sure why giving appropriate permissions fixed the problem for the OP, but there's a different issue with the code which is causing the obscure error message. As per the client library documentation, Vertex AI requires regionalized endpoints, so the client should be constructed to reflect the resources that will be accessed. In this case, the code would be:

var client = await new JobServiceClientBuilder
{
    Endpoint = "europe-west1-aiplatform.googleapis.com"
}.BuildAsync();

... obviously you can make that dynamic by region using an interpolated string literal, if you're using resources in multiple regions. But you do need different clients for different regions.
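
As a rough sketch of that idea, assuming the Google.Cloud.AIPlatform.V1 package: the helper class and method names below are hypothetical and not part of the client library. The only thing it does is derive the regional endpoint from the region name with an interpolated string and hand it to the builder.

```csharp
using System.Threading.Tasks;
using Google.Cloud.AIPlatform.V1;

public static class RegionalClients
{
    // Hypothetical helper: builds a JobServiceClient bound to the
    // regional Vertex AI endpoint for the given region.
    public static Task<JobServiceClient> CreateJobServiceClientAsync(string region) =>
        new JobServiceClientBuilder
        {
            // e.g. "europe-west1" -> "europe-west1-aiplatform.googleapis.com"
            Endpoint = $"{region}-aiplatform.googleapis.com"
        }.BuildAsync();
}
```

A caller would then write something like `var client = await RegionalClients.CreateJobServiceClientAsync("europe-west1");` and keep one client per region, since each client is tied to the endpoint it was built with.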

I'm hoping we can make the error less obscure in the future.

Jon Skeet