I am trying to use the Amazon SageMaker Linear Learner algorithm, which supports the content type 'application/x-recordio-protobuf'. In the preprocessing phase, I used scikit-learn preprocessing to one-hot encode my features. I then pass the RecordIO-converted data to the Linear Learner estimator as input.
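I used the sagemaker.amazon.common package for the conversion (see the import at the top of the code), and the preprocessing step itself completed successfully.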
from io import BytesIO

from sagemaker.amazon.common import write_spmatrix_to_sparse_tensor
# Assumed source of `worker` and `encoders`; the original snippet did not show this import.
from sagemaker_containers.beta.framework import encoders, worker


def output_fn(prediction, accept):
    """Format prediction output

    The default accept/content-type between containers for serial inference is JSON.
    We also want to set the ContentType or mimetype as the same value as accept so the next
    container can read the response payload correctly.
    """
    if accept == 'text/csv':
        return worker.Response(encoders.encode(prediction.todense(), accept), mimetype=accept)
    elif accept == 'application/x-recordio-protobuf':
        # Serialize the sparse matrix into an in-memory RecordIO-protobuf buffer.
        buf = BytesIO()
        write_spmatrix_to_sparse_tensor(buf, prediction)
        buf.seek(0)  # rewind so the response is read from the start of the buffer
        return worker.Response(buf, accept, mimetype=accept)
    else:
        raise RuntimeError("{} accept type is not supported by this script.".format(accept))
But when Linear Learner reads the input records, it fails with the error below:
Caused by: [15:53:30] /opt/brazil-pkg-cache/packages/AIAlgorithmsCppLibs/AIAlgorithmsCppLibs-2.0.774.0/AL2012/generic-flavor/src/src/aialgs/io/iterator_base.cpp:100:
(Input Error) The header of the MXNet RecordIO record at position 810 in the dataset does not start with a valid magic number.
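For context, the error refers to the 4-byte magic number (0xCED7230A) that prefixes every MXNet RecordIO record. Below is a minimal, stdlib-only sketch of how a serialized payload could be checked locally for the first place where that framing breaks. It assumes little-endian framing (uint32 magic, uint32 length, payload zero-padded to a multiple of 4 bytes). `first_bad_record` and `frame` are hypothetical helper names used only for illustration and are not part of the SageMaker SDK. Whether "position 810" in the error is a byte offset or a record index is not something this sketch can confirm.

```python
import struct

RECORDIO_MAGIC = 0xCED7230A  # magic number that starts every MXNet RecordIO record


def first_bad_record(data: bytes):
    """Return the byte offset of the first malformed record, or None if all are valid.

    Assumed framing: little-endian uint32 magic, uint32 length field,
    then the payload padded with zero bytes to a multiple of 4.
    """
    offset = 0
    while offset < len(data):
        header = data[offset:offset + 8]
        if len(header) < 8:
            return offset  # truncated header
        magic, length_field = struct.unpack("<II", header)
        if magic != RECORDIO_MAGIC:
            return offset  # header does not start with the magic number
        length = length_field & ((1 << 29) - 1)  # lower 29 bits hold the payload length
        padded = (length + 3) & ~3  # round up to a multiple of 4
        if offset + 8 + padded > len(data):
            return offset  # truncated payload
        offset += 8 + padded
    return None


def frame(payload: bytes) -> bytes:
    """Hypothetical helper: wrap a payload in RecordIO framing for the demo below."""
    pad = (-len(payload)) % 4
    return struct.pack("<II", RECORDIO_MAGIC, len(payload)) + payload + b"\x00" * pad


good = frame(b"abc") + frame(b"defgh")
print(first_bad_record(good))                  # well-formed stream
print(first_bad_record(good + b"garbage!!"))   # trailing bytes that are not a record
```

In the output_fn above, the same check could be run on buf.getvalue() right after write_spmatrix_to_sparse_tensor, to see whether the bytes are already malformed at that point or only after they leave the container.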