I have a trained Google AutoML text classification model that I want to run on 500 rows of data stored in a CSV file. The CSV file is stored in a Google Cloud Storage bucket, and I want each row classified as "true" or "false", depending on what the model returns. Currently, it seems the sample code only supports predicting a single piece of text at a time. How can I go about doing batch classification with the trained model?
1 Answer
The solution below works for me. It loops over the rows of the CSV and calls the online prediction endpoint once per row.
import pandas as pd
from google.cloud import automl_v1beta1 as automl
from google.oauth2 import service_account
# Load the csv
# For my case, I am predicting either 'Include' or 'Exclude' classes
data = pd.read_csv('../df_pred.csv', encoding='utf-8')
# assign project id and model id
project_id = 'xxxxxx'
compute_region = 'us-central1'
model_id = 'xxxxx'
# Create client for prediction service.
credentials = service_account.Credentials.from_service_account_file("xxxxx.json")
automl_client = automl.AutoMlClient(credentials=credentials)
prediction_client = automl.PredictionServiceClient(credentials=credentials)
# Get the full path of the model.
model_full_id = automl_client.model_path(
    project_id, compute_region, model_id
)
# Loop over the csv lines for the sentences you want to predict
# Collect one dict of scores per row, then build the dataframe once
rows = []
# sentence = column of interest
for sentence in data.sentence.values:
    # Set the payload by giving the content and type of the text.
    payload = {"text_snippet": {"content": sentence, "mime_type": "text/plain"}}
    # params is additional domain-specific parameters.
    # currently there are no additional parameters supported.
    params = {}
    # Note: this positional call style is from the 1.x client library; newer
    # releases may require keyword arguments (name=, payload=, params=).
    response = prediction_client.predict(model_full_id, payload, params)
    # Look the scores up by class name rather than by position,
    # since the order of response.payload is not guaranteed.
    scores = {p.display_name: p.classification.score for p in response.payload}
    rows.append({'p_exclude': scores.get('Exclude'),
                 'p_include': scores.get('Include')})
# Temp dataframe to store the prediction scores
df = pd.DataFrame(rows)
# Add the predicted scores to the original Dataframe
df_automl = pd.concat([data, df], axis=1)
# Export the new Dataframe
df_automl.to_csv("df_automl.csv", index=False)
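
The question asks for a true/false label per row, while the code above only stores the raw class scores. Here is a minimal sketch of that last step. It assumes the 'Include' class should map to true and that 0.5 is an acceptable cut-off; both are assumptions, not something the model dictates. `scores_by_class` and `to_label` are hypothetical helper names, and `FakeAnnotation` only stands in for the entries of `response.payload` so the snippet runs without credentials.

```python
from collections import namedtuple

# Stand-ins for the entries of response.payload (hypothetical, illustration only).
# The real entries expose display_name and classification.score the same way.
Classification = namedtuple("Classification", ["score"])
FakeAnnotation = namedtuple("FakeAnnotation", ["display_name", "classification"])


def scores_by_class(payload):
    """Map each class name to its score, independent of the payload order."""
    return {item.display_name: item.classification.score for item in payload}


def to_label(scores, positive_class="Include", threshold=0.5):
    """Return True when the positive class reaches the (assumed) threshold."""
    return scores.get(positive_class, 0.0) >= threshold


# Made-up payload for illustration; with the real client use response.payload.
example_payload = [
    FakeAnnotation("Include", Classification(0.91)),
    FakeAnnotation("Exclude", Classification(0.09)),
]
example_scores = scores_by_class(example_payload)
example_label = to_label(example_scores)
print(example_scores, example_label)
```

Inside the prediction loop, `to_label(scores_by_class(response.payload))` could be stored in an extra column next to the scores, and the threshold tuned to whatever precision/recall trade-off suits the data.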

iEvidently