
When I initiate a batch prediction job on Vertex AI in Google Cloud, I have to specify a Cloud Storage bucket location. Suppose I provide the bucket location 'my_bucket/prediction/'; the prediction files are then stored in something like gs://my_bucket/prediction/prediction-test_model-2022_01_17T01_46_39_898Z, which is a subdirectory within the bucket location I provided. The prediction files are stored within that subdirectory and are named:

prediction.results-00000-of-00002
prediction.results-00001-of-00002

Is there any way to programmatically get the final export location from the batch prediction name, ID, or any other parameter shown in the details of the batch prediction job? (Screenshot of the job details omitted.)

Tarique

2 Answers


Not from those parameters alone: since you can run the same job multiple times, a new folder based on the execution date is created on each run. But you can get the location from the API using your job ID (don't forget to set the credentials via GOOGLE_APPLICATION_CREDENTIALS if you are not running on the Cloud SDK):

Get the output directory from the Vertex AI Batch Prediction API using the job ID:

curl -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" "https://us-central1-aiplatform.googleapis.com/v1/projects/[PROJECT_NAME]/locations/us-central1/batchPredictionJobs/[JOB_ID]"

Output (read the value of gcsOutputDirectory, nested under outputInfo):

{
...
   "outputInfo": {
      "gcsOutputDirectory": "gs://my_bucket/prediction/prediction-test_model-2022_01_17T01_46_39_898Z"
   }
...
}

EDIT: Getting batchPredictionJobs via the Python API:

from google.cloud import aiplatform


def get_batch_prediction_job_sample(
    project: str,
    batch_prediction_job_id: str,
    location: str = "us-central1",
    api_endpoint: str = "us-central1-aiplatform.googleapis.com",
):
    # Point the client at the regional API endpoint.
    client_options = {"api_endpoint": api_endpoint}
    client = aiplatform.gapic.JobServiceClient(client_options=client_options)

    # Build the fully qualified resource name:
    # projects/{project}/locations/{location}/batchPredictionJobs/{job_id}
    name = client.batch_prediction_job_path(
        project=project, location=location, batch_prediction_job=batch_prediction_job_id
    )

    # Fetch the BatchPredictionJob resource, which includes the output location.
    response = client.get_batch_prediction_job(name=name)
    print("response:", response)


get_batch_prediction_job_sample("[PROJECT_NAME]", "[JOB_ID]", "us-central1", "us-central1-aiplatform.googleapis.com")

Check the details in Google's documentation for the get_batch_prediction_job sample, and check the API repository (googleapis/python-aiplatform) for more.

ewertonvsilva
  • A couple of questions: How do I input the job ID in the curl command and is there a way to do this using a python API? – Tarique Jan 21 '22 at 14:55
  • Hi @Tarique, I have edited how to get it via curl (I was using POST instead of GET) and also added the Python snippet for getting the same info. You can send the ID via curl or Python by replacing `[JOB_ID]` with its value. – ewertonvsilva Jan 21 '22 at 16:40
  • Thanks for this. How do I use a service key as a JSON in the client instantiation? – Tarique Jan 22 '22 at 14:29
  • Hi! You will have to create the service account and set the environment variable pointing to the JSON file. The procedure is described in the [link](https://cloud.google.com/docs/authentication/getting-started); a sketch of passing the key file directly to the client is shown after these comments. Please consider accepting and upvoting the answer if it helps you, to improve the community. – ewertonvsilva Jan 24 '22 at 11:44
  • I don't see where this function returns the output location. It returns a BatchPredictionJob and, when printed, it doesn't show gcs_output_directory. – havryliuk Mar 21 '23 at 16:10
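Following up on the credentials question above, here is a minimal sketch of passing a service-account key file directly to the client instead of setting the environment variable; the key.json path is a placeholder, and the use of google.oauth2.service_account is my addition, not part of the original answer:

from google.cloud import aiplatform
from google.oauth2 import service_account

# Build credentials from a downloaded service-account key file.
credentials = service_account.Credentials.from_service_account_file("key.json")

# Pass the credentials explicitly instead of relying on
# the GOOGLE_APPLICATION_CREDENTIALS environment variable.
client = aiplatform.gapic.JobServiceClient(
    client_options={"api_endpoint": "us-central1-aiplatform.googleapis.com"},
    credentials=credentials,
)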

Just adding a cherry on top of @ewertonvsilva's answer...

If you are following Google's example on programmatically getting the batch prediction, the response object from response = client.get_batch_prediction_job(name=name) has the output_info attribute that you need. All you need to do is read response.output_info.gcs_output_directory once the prediction job is complete.
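For instance, a minimal sketch reusing the client and name variables from the sample above; the state check reflects that output_info is only populated once the job has finished (checking via state.name is my assumption, not from the original answer):

# Re-fetch the job and read the resolved output directory once it has succeeded.
response = client.get_batch_prediction_job(name=name)
if response.state.name == "JOB_STATE_SUCCEEDED":
    print(response.output_info.gcs_output_directory)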

Donny
  • For this to be possible, I would need the Python script running till the batch prediction is complete, right? That might take minutes or even hours. Ideally, I would want to stop the Python script once I have initialized the batch prediction job. Then, after a certain predetermined time (based on how long the job takes to complete), I would like to use the Python API to get the final directory. Although, I found an alternative solution to the problem based on getting the list of blobs from the Google Cloud Storage bucket and filtering through that. – Tarique Jan 31 '22 at 06:33
  • @Tarique you specify the output directory when you run the batch prediction job, right? You can still implement some logic to find the latest folder (the one with the most recent timestamp) in this location; see the sketch after these comments. Doesn't sound like anything impossible. – havryliuk Mar 21 '23 at 16:06
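A rough sketch of that blob-listing workaround, using the bucket and prefix from the question (the exact names are placeholders). Because the timestamp in the folder name is zero-padded, lexicographic order matches chronological order:

from google.cloud import storage

# List everything under the prefix given to the batch prediction job.
client = storage.Client()
blobs = client.list_blobs("my_bucket", prefix="prediction/prediction-test_model-")

# Blob names look like prediction/prediction-<model>-<timestamp>/prediction.results-...,
# so the second path segment is the per-run output folder.
folders = {blob.name.split("/")[1] for blob in blobs}
latest = max(folders)
print(f"gs://my_bucket/prediction/{latest}")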