I am calling a Sagemaker endpoint using java Sagemaker SDK. The data that I am sending needs little cleaning before the model can use it for prediction. How can I do that in Sagemaker.
I have a pre-processing function in the Jupyter notebook instance which is cleaning the training data before passing that data to train the model. Now I want to know if I can use that function while calling the endpoint or is that function already being used? I can show my code if anyone wants?
EDIT 1 Basically, in the pre-processing, I am doing label encoding. Here is my function for preprocessing
def preprocess_data(data):
print("entering preprocess fn")
# convert document id & type to labels
le1 = preprocessing.LabelEncoder()
le1.fit(data["documentId"])
data["documentId"]=le1.transform(data["documentId"])
le2 = preprocessing.LabelEncoder()
le2.fit(data["documentType"])
data["documentType"]=le2.transform(data["documentType"])
print("exiting preprocess fn")
return data,le1,le2
Here the 'data' is a pandas dataframe.
Now I want to use these le1,le2 at the time of calling endpoint. I want to do this preprocessing in sagemaker itself not in my java code.