Preprocess input data before making predictions inside Amazon SageMaker

Question

I have a Keras/TensorFlow model that we trained ourselves for image-related predictions. I followed this tutorial on deploying a trained Keras model to SageMaker, and I can invoke the endpoint for predictions.

Currently, my client-side code downloads the image and does some preprocessing before calling the SageMaker endpoint for a prediction. Instead of doing this on the client side, I want the entire process to happen in SageMaker. How do I do that?

It seems I need to update the entry-point Python script, train.py, as mentioned here:

sagemaker_model = TensorFlowModel(model_data = 's3://' + sagemaker_session.default_bucket() + '/model/model.tar.gz',
                                  role = role,
                                  entry_point = 'train.py')

Other articles indicate that I need to override the input_fn function to do the preprocessing. However, those articles describe the steps for the MXNet framework, and my model is based on Keras/TensorFlow.

So I am not sure how to override input_fn. Can anyone suggest how?
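One way to move the download step into the endpoint, sketched here under assumptions: the client sends a small JSON body such as {"url": "..."} and the handler fetches the image itself. The JSON shape and the helper name `load_image_bytes` are hypothetical; only the `(request_body, content_type)` argument order mirrors the input_fn hook discussed in this thread. The returned bytes would then go through the same preprocessing as a directly uploaded JPEG.

```python
import json
import urllib.request

JPEG_CONTENT_TYPE = 'image/jpeg'
JSON_CONTENT_TYPE = 'application/json'


def load_image_bytes(request_body, content_type):
    """Return raw image bytes from either an uploaded JPEG or a JSON URL.

    Hypothetical helper: the {"url": ...} request format is an assumption,
    not something SageMaker defines.
    """
    if content_type == JPEG_CONTENT_TYPE:
        # The client uploaded the image itself; nothing to download
        return request_body
    if content_type == JSON_CONTENT_TYPE:
        # The client sent a reference; fetch the image on the server side
        if isinstance(request_body, bytes):
            request_body = request_body.decode('utf-8')
        url = json.loads(request_body)['url']
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.read()
    raise ValueError('Unsupported content type: {}'.format(content_type))
```

Note that the endpoint then needs network access to wherever the image lives; if it runs in a VPC without internet access, the download will fail.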

What about this link https://github.com/aws/sagemaker-python-sdk/blob/master/src/sagemaker/tensorflow/deploying_python.rst#overriding-input-preprocessing-with-an-input_fn — Antoine Trouve, Mar 01 '19 at 12:04
This question has a related answer as well: https://stackoverflow.com/questions/49775557/how-can-i-invoke-a-sagemaker-model-trained-with-tensorflow-using-a-csv-file-in — Antoine Trouve, Mar 01 '19 at 12:21

score 6 · Answer 1 · answered May 24 '19 at 22:27

I had the same problem and finally figured out how to do it.

Once you have your model_data ready, you can deploy it with the following lines.

from sagemaker.tensorflow.model import TensorFlowModel
sagemaker_model = TensorFlowModel(
            model_data = 's3://path/to/model/model.tar.gz',
            role = role,
            framework_version = '1.12',
            entry_point = 'train.py',
            source_dir='my_src',
            env={'SAGEMAKER_REQUIREMENTS': 'requirements.txt'}
)

predictor = sagemaker_model.deploy(
    initial_instance_count=1,
    instance_type='ml.m4.xlarge', 
    endpoint_name='resnet-tensorflow-classifier'
)
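Once the endpoint's input_fn does the preprocessing, the client only needs to send the raw JPEG bytes with a matching ContentType. A minimal sketch, assuming boto3's `sagemaker-runtime` client and the endpoint name used in the deploy call; the function takes the client as a parameter so it can be exercised without AWS credentials. `predict_jpeg` is a hypothetical name, and the response is returned as raw bytes because its format depends on the container's output serialization.

```python
def predict_jpeg(runtime_client, endpoint_name, jpeg_path):
    """Send a local JPEG file to a SageMaker endpoint and return the raw response body."""
    with open(jpeg_path, 'rb') as f:
        payload = f.read()
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        # Must match the content type that input_fn checks for
        ContentType='image/jpeg',
        Body=payload,
    )
    # 'Body' is a streaming object; read it fully
    return response['Body'].read()


# Usage (requires AWS credentials and a deployed endpoint):
# import boto3
# runtime = boto3.client('sagemaker-runtime')
# raw = predict_jpeg(runtime, 'resnet-tensorflow-classifier', 'cat.jpg')
```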

Your notebook should have a my_src directory that contains train.py and a requirements.txt file. The train.py file must define an input_fn function. In my case, that function handled image/jpeg content:

import io

import numpy as np
from PIL import Image
from keras.applications.resnet50 import preprocess_input

JPEG_CONTENT_TYPE = 'image/jpeg'

# Deserialize the InvokeEndpoint request body into an array the model can predict on
def input_fn(request_body, content_type=JPEG_CONTENT_TYPE):
    # Process a JPEG image uploaded to the endpoint
    if content_type == JPEG_CONTENT_TYPE:
        # Convert to RGB so grayscale/RGBA uploads still give 3 channels,
        # then resize to the input size the model was trained on
        img = Image.open(io.BytesIO(request_body)).convert('RGB').resize((300, 300))
        img_array = np.array(img, dtype=np.float32)
        # Add a batch dimension: (300, 300, 3) -> (1, 300, 300, 3)
        expanded_img_array = np.expand_dims(img_array, axis=0)
        # Apply the same preprocessing that was used during training
        return preprocess_input(expanded_img_array)

    # Reject content types this handler does not understand
    raise ValueError('Unsupported content type: {}'.format(content_type))
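Because input_fn is an ordinary function, its decoding half can be checked locally before redeploying. A minimal sketch, assuming only Pillow and NumPy are installed: it builds a JPEG in memory and runs the same open/convert/resize/batch steps, leaving out the Keras `preprocess_input` call so it runs without Keras. The names `decode_jpeg` and `make_test_jpeg` are hypothetical.

```python
import io

import numpy as np
from PIL import Image


def decode_jpeg(request_body, size=(300, 300)):
    """Decode JPEG bytes into a float32 batch of shape (1, height, width, 3).

    `size` is (width, height), the order PIL's resize expects.
    """
    img = Image.open(io.BytesIO(request_body)).convert('RGB').resize(size)
    img_array = np.asarray(img, dtype=np.float32)  # (height, width, 3)
    return np.expand_dims(img_array, axis=0)       # add the batch axis


def make_test_jpeg(width=640, height=480):
    """Create an in-memory solid-color JPEG, standing in for a request body."""
    buf = io.BytesIO()
    Image.new('RGB', (width, height), color=(200, 100, 50)).save(buf, format='JPEG')
    return buf.getvalue()


batch = decode_jpeg(make_test_jpeg())
print(batch.shape, batch.dtype)  # (1, 300, 300, 3) float32
```

A non-square size such as (320, 240) yields shape (1, 240, 320, 3), since PIL takes (width, height) while NumPy arrays are (rows, columns).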

Your preprocessing code will depend on the model architecture you used. I was doing transfer learning from ResNet50, so I used preprocess_input from keras.applications.resnet50.
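The architecture-specific part is mostly which normalization `preprocess_input` applies. As far as I know, the Keras ImageNet models use one of three modes: 'caffe' (ResNet50, VGG16/19: RGB to BGR, then subtract per-channel ImageNet means, no scaling), 'tf' (InceptionV3, MobileNet, Xception: scale to [-1, 1]) and 'torch' (DenseNet: scale to [0, 1], then standardize with ImageNet mean and std). The NumPy sketch below mirrors those formulas to show the difference; the function name `preprocess` is hypothetical, and the real Keras function should stay the source of truth for a deployed model.

```python
import numpy as np

# Per-channel ImageNet statistics used by the Keras applications
CAFFE_MEAN_BGR = np.array([103.939, 116.779, 123.68], dtype=np.float32)
TORCH_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
TORCH_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def preprocess(x, mode):
    """Normalize an RGB array with values in [0, 255] (channels last)."""
    x = np.asarray(x, dtype=np.float32)
    if mode == 'caffe':
        # RGB -> BGR, then zero-center each channel; no scaling
        return x[..., ::-1] - CAFFE_MEAN_BGR
    if mode == 'tf':
        # Scale pixels to the range [-1, 1]
        return x / 127.5 - 1.0
    if mode == 'torch':
        # Scale to [0, 1], then standardize per channel
        return (x / 255.0 - TORCH_MEAN) / TORCH_STD
    raise ValueError('Unknown mode: {}'.format(mode))


pixel = np.array([[255.0, 0.0, 0.0]])  # one pure-red RGB pixel
for mode in ('caffe', 'tf', 'torch'):
    print(mode, preprocess(pixel, mode))
```

Feeding the model data normalized with the wrong mode usually does not raise an error; it just degrades predictions silently, which is why matching the training-time function matters.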

Note that because my train.py imports some modules, I had to supply a requirements.txt listing them (that was the part I had trouble finding in the docs).

Hope this helps someone in the future.

My requirements.txt:

absl-py==0.7.1
astor==0.8.0
backports.weakref==1.0.post1
enum34==1.1.6
funcsigs==1.0.2
futures==3.2.0
gast==0.2.2
grpcio==1.20.1
h5py==2.9.0
Keras==2.2.4
Keras-Applications==1.0.7
Keras-Preprocessing==1.0.9
Markdown==3.1.1
mock==3.0.5
numpy==1.16.3
Pillow==6.0.0
protobuf==3.7.1
PyYAML==5.1
scipy==1.2.1
six==1.12.0
tensorboard==1.13.1
tensorflow==1.13.1
tensorflow-estimator==1.13.0
termcolor==1.1.0
virtualenv==16.5.0
Werkzeug==0.15.4

I guess that to process the output I need to implement output_fn()? Do you have an example? — Stiefel, Sep 01 '20 at 11:00
Hey @Stiefel, I don't have an example of output_fn, unfortunately, but that seems like a good guess! — alex9311, Sep 02 '20 at 17:13
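For the output side, a minimal output_fn sketch, assuming the legacy TensorFlow container calls it as output_fn(prediction, accept) after prediction and sends the returned string back to the client. The conversion from whatever the container actually passes in (possibly a TensorFlow Serving response protobuf rather than an array) is not shown: here `prediction` is assumed to already be class scores with shape (1, num_classes), and the top-k JSON format and `top_k` parameter are hypothetical choices.

```python
import json

import numpy as np

JSON_CONTENT_TYPE = 'application/json'


def output_fn(prediction, accept=JSON_CONTENT_TYPE, top_k=3):
    """Serialize class scores as JSON listing the top-k classes.

    Assumes `prediction` is array-like with shape (1, num_classes).
    """
    if accept != JSON_CONTENT_TYPE:
        raise ValueError('Unsupported accept type: {}'.format(accept))
    scores = np.asarray(prediction, dtype=np.float64)[0]
    # Indices of the highest scores, best first
    best = np.argsort(scores)[::-1][:top_k]
    results = [{'class': int(i), 'score': float(scores[i])} for i in best]
    return json.dumps({'predictions': results})


print(output_fn([[0.1, 0.7, 0.2]], top_k=2))
# {"predictions": [{"class": 1, "score": 0.7}, {"class": 2, "score": 0.2}]}
```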
