I am using SageMaker to train and deploy my machine learning model. As for prediction, it will be executed by a Lambda function as a scheduled job (every hour). The process is as follows (a rough sketch in code follows the list):
- pull new data from S3 since the last prediction
- preprocess, aggregate, and create the prediction data set
- call the SageMaker endpoint and make the prediction
- either save the result to S3 or insert it into a database table
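In skeleton form, the scheduled job would look something like this (the bucket, key, and endpoint names are placeholders I made up, and the preprocessing step is stubbed out):

    import boto3

    s3 = boto3.client('s3')
    runtime = boto3.client('sagemaker-runtime')

    BUCKET = 'pred_data'           # placeholder bucket name
    ENDPOINT_NAME = 'my-endpoint'  # placeholder endpoint name

    def lambda_handler(event, context):
        # 1. pull new data from S3 since the last prediction
        obj = s3.get_object(Bucket=BUCKET, Key='raw/foo.csv')
        raw = obj['Body'].read().decode('utf-8')

        # 2. preprocess/aggregate into a CSV payload (real logic goes here)
        payload = raw

        # 3. call the SageMaker endpoint
        response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                           ContentType='text/csv',
                                           Body=payload)
        result = response['Body'].read().decode('utf-8')

        # 4. save the result to S3 (or insert into a database instead)
        s3.put_object(Bucket=BUCKET, Key='predictions/foo.csv',
                      Body=result.encode('utf-8'))
        return {'statusCode': 200}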
Based on what I have found, typically the input will either come from the Lambda payload:

    import json
    import boto3

    runtime = boto3.client('sagemaker-runtime')

    # inside the handler, event is the dict Lambda passes in
    data = json.loads(json.dumps(event))  # round-trip; effectively a no-op on a plain dict
    payload = data['data']
    print(payload)
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT_NAME,
                                       ContentType='text/csv',
                                       Body=payload)
or be read from an S3 file:

    import csv
    import io
    import boto3

    client = boto3.client('s3')
    runtime = boto3.client('sagemaker-runtime')

    bucket = 'pred_data'  # substitute your S3 bucket name
    obj = client.get_object(Bucket=bucket, Key='foo.csv')  # get_object takes the name, not a Bucket resource
    body = obj['Body'].read().decode('utf-8')
    lines = body.splitlines()
    reader = csv.reader(lines)
    file = io.StringIO(body)  # StringIO takes the raw string, not the list of lines
    response = runtime.invoke_endpoint(EndpointName=ENDPOINT,
                                       ContentType='*/*',
                                       Body=file.getvalue())
    output = response['Body'].read().decode('utf-8')
Since I will be pulling raw data from S3 and preprocessing it, a pandas DataFrame will be generated. Is it possible to feed this directly as the input to invoke_endpoint? I could upload the aggregated dataset to another S3 bucket, but does it have to go through the decoding, csv.reader, StringIO and all that, just like the example I found, or is there an easier way to do it? Is the decode step really necessary to get the output?
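Ideally I would do something like the following; this is just a sketch of what I have in mind, with df standing in for my preprocessed DataFrame and the endpoint name made up:

    import boto3
    import pandas as pd

    runtime = boto3.client('sagemaker-runtime')

    df = pd.DataFrame([[1.0, 2.0, 3.0]])  # stand-in for the aggregated data set

    # serialize the DataFrame straight to CSV text, skipping the S3 round trip
    payload = df.to_csv(index=False, header=False)

    response = runtime.invoke_endpoint(EndpointName='my-endpoint',  # made-up name
                                       ContentType='text/csv',
                                       Body=payload)
    # Body is a StreamingBody: .read() returns bytes, hence the decode to get a str
    output = response['Body'].read().decode('utf-8')

If to_csv() straight into Body works like this, the csv.reader/StringIO steps in the example would seem unnecessary.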