
I have a custom model in Vertex AI and a table that stores the model's features along with a record_id.
I am building a pipeline component for batch prediction and have hit a critical issue. When I submit the batch prediction job, I have to exclude record_id from the input, but how can I map each prediction back to its record if record_id is not in the result?

from google.cloud import bigquery
from google.cloud import aiplatform

project_id = 'my-project'  # placeholder: replace with your GCP project ID

aiplatform.init(project=project_id)
client = bigquery.Client(project=project_id)

# Read every column except record_id
query = '''
SELECT * EXCEPT(record_id) FROM `table`
'''
df = client.query(query).to_dataframe()  # drop the record_id and load it to another table
job = client.load_table_from_dataframe(
    df, "table_wo_id",
)
job.result()  # wait for the load job to finish before predicting

# aiplatform.Model takes the model ID or resource name as model_name
clf = aiplatform.Model(model_name = 'custom_model')
clf.batch_predict(job_display_name = 'custom model batch prediction',
                 bigquery_source = 'bq://table_wo_id',
                 instances_format = 'bigquery',
                 bigquery_destination_prefix = 'bq://prediction_result_table',
                 predictions_format = 'bigquery',
                 machine_type = 'n1-standard-4',
                 max_replica_count = 1
                 )

As in the example above, prediction_result_table has no record_id column, so there is no way to map the results back to each record.

CHOCOLEO
  • Could you elaborate on your use case and clarify your requirements? What do you mean when you say `I should exclude the record_id for the job...`? Why are you excluding record_id? – kiran mathew Jan 12 '23 at 15:06
  • Let's say each record is one customer, and I want to select the top N customers with the highest predicted score. But when I run the prediction, I have to exclude the ID from the input to the batch prediction function in Vertex AI (https://cloud.google.com/vertex-ai/docs/predictions/get-predictions#bigquery), so the prediction output doesn't have the record ID and I won't know which customer belongs to which score. I found a solution using REST (https://cloud.google.com/vertex-ai/docs/predictions/get-predictions#filter_and_transform_input_data_preview). I am wondering whether the Python SDK has the same function. – CHOCOLEO Jan 13 '23 at 03:19
  • Do you want to include record_id during batch prediction? If you want to include `record_id`, add it to the `included_fields` of `instanceConfig`. – kiran mathew Jan 17 '23 at 10:57
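
For reference, the `instanceConfig` idea from the comments could look roughly like the sketch below. It only builds the JSON body for the REST `batchPredictionJobs` request described in the linked "filter and transform input data" docs; it does not send anything. The field names follow my reading of the REST reference and should be checked against it. The helper name `build_batch_prediction_body`, the project, location, model, and table identifiers are all hypothetical placeholders. The assumption here is that listing `record_id` under `excludedFields` keeps it out of the instance sent to the model while it is still attached to the output when no `keyField` is set.

```python
import json


def build_batch_prediction_body(display_name, model, input_uri, output_uri,
                                excluded_fields,
                                machine_type="n1-standard-4",
                                max_replica_count=1):
    """Build a REST request body for a Vertex AI batch prediction job.

    Hypothetical helper: it only assembles a dict and makes no API call.
    """
    return {
        "displayName": display_name,
        "model": model,
        "inputConfig": {
            "instancesFormat": "bigquery",
            "bigquerySource": {"inputUri": input_uri},
        },
        "outputConfig": {
            "predictionsFormat": "bigquery",
            "bigqueryDestination": {"outputUri": output_uri},
        },
        "dedicatedResources": {
            "machineSpec": {"machineType": machine_type},
            "maxReplicaCount": max_replica_count,
        },
        # Assumption: columns listed here are not sent to the model.
        "instanceConfig": {"excludedFields": list(excluded_fields)},
    }


# Placeholder identifiers; replace with real resource names.
body = build_batch_prediction_body(
    display_name="custom model batch prediction",
    model="projects/PROJECT/locations/LOCATION/models/MODEL_ID",
    input_uri="bq://PROJECT.DATASET.table",
    output_uri="bq://PROJECT.DATASET",
    excluded_fields=["record_id"],
)
print(json.dumps(body, indent=2))
```

With a body like this, the source table can keep its `record_id` column, so the separate `table_wo_id` copy would not be needed. Whether the Python SDK's `batch_predict` exposes the same option depends on the installed version, so that part remains open.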

0 Answers