I have a trained Google AutoML text classification model that I want to run on 500 rows of data stored in a CSV file. The CSV file is in a Google Cloud Storage bucket, and each row should be labeled "true" or "false" depending on what the model returns. Currently, the sample code only seems to support predicting a single line of text at a time. How can I do batch classification with the trained model?

1 Answer

The solution below works for me. It reads a local copy of the CSV, loops over the rows, and sends one prediction request per row.

import pandas as pd
from google.cloud import automl_v1beta1 as automl
from google.oauth2 import service_account

# Load the csv
# For my case, I am predicting either 'Include' or 'Exclude' classes
data = pd.read_csv('../df_pred.csv', encoding='utf-8')

# assign project id and model id
project_id = 'xxxxxx'
compute_region = 'us-central1'
model_id = 'xxxxx'

# Create client for prediction service.
credentials = service_account.Credentials.from_service_account_file("xxxxx.json")
automl_client = automl.AutoMlClient(credentials=credentials)
prediction_client = automl.PredictionServiceClient(credentials=credentials)


# Get the full path of the model.
model_full_id = automl_client.model_path(
    project_id, compute_region, model_id
)

# Loop over the csv lines for the sentences you want to predict

# Collect one dict of scores per sentence, then build the dataframe once
# (faster than calling pd.concat inside the loop)
rows = []

# sentence = column of interest
for sentence in data.sentence.values:
    # Set the payload by giving the content and type of the file.
    payload = {"text_snippet": {"content": sentence, "mime_type": "text/plain"}}

    # params holds additional domain-specific parameters.
    # Currently no additional parameters are supported.
    params = {}
    response = prediction_client.predict(
        name=model_full_id, payload=payload, params=params
    )

    # The entries in response.payload may not come back in a fixed class
    # order, so look up each score by its label instead of by position.
    scores = {p.display_name: p.classification.score for p in response.payload}
    rows.append({'p_exclude': scores.get('Exclude'),
                 'p_include': scores.get('Include')})

# Dataframe of prediction scores, one row per sentence
df = pd.DataFrame(rows)

# Add the predicted scores to the original dataframe
df_automl = pd.concat([data, df], axis=1)
# Export the new dataframe
df_automl.to_csv("df_automl.csv", index=False)
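
The code above stores two scores per row, while the question asks for a single label per row. A minimal sketch of that last step, assuming you simply want the class with the highest score; `pick_label` is a hypothetical helper name, not part of the AutoML client:

```python
def pick_label(scores, default=None):
    """Return the class name with the highest score.

    `scores` maps class names to scores, e.g. the dict built from
    response.payload. Returns `default` when the dict is empty.
    """
    if not scores:
        return default
    return max(scores, key=scores.get)


# Example with made-up scores for the two classes used in this answer
example = {"Include": 0.91, "Exclude": 0.09}
print(pick_label(example))  # prints "Include"
```

If you would rather work on the exported dataframe, `df_automl[['p_exclude', 'p_include']].idxmax(axis=1)` should give the name of the higher-scoring column for each row, which you can then map to "true" or "false" as needed.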


iEvidently