I am writing a converter and shape calculator to convert my custom sklearn transformers into ONNX models. I need to calculate the median of my data points. Interestingly, ONNX has no operator for calculating the median (at least I couldn't find one).
So, I am using the TopK operator to calculate the median. Since it returns two outputs, it is a bit tricky to use.
I tried to use it like this:
def mt_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]
    n = operator.inputs[0].get_second_dimension()
    dtype = guess_numpy_type(X.type)
    # This is the line of focus
    Y = OnnxTopK(X, np.array([3]),op_version=opv, output_names=out[:1])
    Y.add_to(scope, container)
It threw this error:
ValueError: Unexpected index 1 in operator name 'TopK' with .output names ['variable']
This was to be expected, since TopK returns not one but two outputs: the top K values and their corresponding indices. So, the next obvious option was to change the output names as follows:
Y = OnnxTopK(X, np.array([3]),op_version=opv, output_names=['values','indices'])
It threw this error:
RuntimeError: After 2 iterations for 2 nodes, still unable to sort names {'variable'}. The graph may be
disconnected. List of operators:
Cast(variable) -> [Y]
--
--all-nodes--
--
TopK|To_TopK(X#0, To_TopKcst#0) -> [values, indices]
Cast|Cast(variable) -> [Y]
The code above disconnects the graph. If you check operator.outputs, you will see [Variable('variable', 'variable', type=FloatTensorType(shape=[None, 3]))]. The converter therefore expects an output named variable, but no node in the graph produces that name, hence the error.
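To make the "disconnected graph" message concrete, here is a toy sketch in plain Python of the two nodes listed in the error. It is only an illustration under stated assumptions: the helper `unresolved_inputs` is hypothetical, the node names are simplified from the error message, and this is not how skl2onnx sorts the graph internally.

```python
# Toy model of the nodes from the error message above.
# Each node is (operator name, input names, output names).
# This is an illustration only, not skl2onnx's real graph-sorting code.
broken_graph = [
    ("TopK", ["X", "To_TopKcst"], ["values", "indices"]),
    ("Cast", ["variable"], ["Y"]),
]

# Same graph with an Identity node that re-exposes "values" as "variable".
fixed_graph = [
    ("TopK", ["X", "To_TopKcst"], ["values", "indices"]),
    ("Identity", ["values"], ["variable"]),
    ("Cast", ["variable"], ["Y"]),
]


def unresolved_inputs(nodes, available):
    """Return input names that no graph input, constant, or node output provides."""
    produced = set(available)
    for _, _, outputs in nodes:
        produced.update(outputs)
    missing = set()
    for _, inputs, _ in nodes:
        missing.update(name for name in inputs if name not in produced)
    return missing


# "X" is the graph input and "To_TopKcst" is the constant holding k.
graph_inputs = {"X", "To_TopKcst"}
print(unresolved_inputs(broken_graph, graph_inputs))  # the name nothing produces
print(unresolved_inputs(fixed_graph, graph_inputs))   # nothing left unresolved
```

In the first graph, Cast consumes a name that no node produces, which mirrors the "unable to sort names {'variable'}" part of the error.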
Now, there are two things:
- I just need the top K "values" and not the "indices".
- I can make the code run somehow (although it doesn't serve my purpose) using a different operator, say ReduceSum.
Y = OnnxReduceSum(OnnxTopK(X, np.array([3]),op_version=opv)[0], op_version=opv, output_names=out[:1])
This makes the code run fine. On closer inspection, one notices that indexing with [0] takes care of the first point (keeping only the values), and, since TopK is no longer the last operator in the graph, passing output_names=out[:1] (where out = operator.outputs) to the final operator takes care of the second.
But we still haven't got the output of TopK itself. If only we could replace ReduceSum with some other operator that passes the TopK results through unchanged. Thankfully, ONNX has Identity!
So, we can finally modify the line to:
Y = OnnxIdentity(OnnxTopK(X, np.array([3]),op_version=opv)[0], op_version=opv, output_names=out[:1])
This gives us the desired result. Now, the question is: is there a cleaner, more straightforward way to do this?
PS: The complete MWE (minimal working example) is as follows:
import numpy as np
import pandas as pd
from onnxruntime import InferenceSession
from sklearn.base import BaseEstimator, TransformerMixin
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, DoubleTensorType, guess_numpy_type
from skl2onnx.algebra.onnx_ops import (
    OnnxReduceSum,
    OnnxTopK,
    OnnxIdentity
)
from skl2onnx import update_registered_converter
def mt_transformer_shape_calculator(operator):
    op = operator.raw_operator
    input_type = operator.inputs[0].type.__class__
    input_dim = operator.inputs[0].get_first_dimension()
    n = operator.inputs[0].get_second_dimension()
    # Output keeps the batch dimension and has 3 columns (k = 3 in TopK)
    output_type = input_type([input_dim, 3])
    operator.outputs[0].type = output_type


def mt_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs
    X = operator.inputs[0]
    n = operator.inputs[0].get_second_dimension()
    dtype = guess_numpy_type(X.type)
    # Take only the first TopK output (values) and pass it through Identity,
    # which carries the declared output name
    Y = OnnxIdentity(OnnxTopK(X, np.array([3]),op_version=opv)[0], op_version=opv, output_names=out[:1])
    Y.add_to(scope, container)


class MedianTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        pass
data = pd.DataFrame(
    [[1,2,3,4],[4,5,6,5]]
)
update_registered_converter(
    MedianTransformer, "MTTransformer",
    mt_transformer_shape_calculator,
    mt_transformer_converter)
mt = MedianTransformer()
onx = convert_sklearn(mt, name='test', initial_types=[("X", FloatTensorType([None,4]))],
                      final_types=[("Y", DoubleTensorType([None,3]))])
sess = InferenceSession(onx.SerializeToString())
sess.run(None, {'X': data.values.astype(np.float32)})[0]
Output:
array([[4., 3., 2.],
       [6., 5., 5.]])
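For context on why the top 3 values are enough here: with 4 columns, the median of each row is the mean of the two smallest of its top 3 values. Below is a minimal NumPy sketch of that arithmetic. It emulates the "values" output of TopK with a descending sort; the helper name `median_from_topk` and the choice k = n // 2 + 1 are my own assumptions for illustration, and how the last step maps onto ONNX operators is left open.

```python
import numpy as np


def median_from_topk(X):
    """Row-wise median computed only from the k largest values of each row."""
    n = X.shape[1]
    k = n // 2 + 1  # assumed choice of k: just enough values to reach the middle
    # Emulate TopK's "values" output: the k largest entries, in descending order.
    top_values = (-np.sort(-X, axis=1))[:, :k]
    if n % 2 == 1:
        # Odd number of columns: the smallest of the top k is the median.
        return top_values[:, -1]
    # Even number of columns: average the two smallest of the top k.
    return (top_values[:, -1] + top_values[:, -2]) / 2.0


X = np.array([[1, 2, 3, 4], [4, 5, 6, 5]], dtype=np.float32)
print(median_from_topk(X))   # computed from the top 3 values per row
print(np.median(X, axis=1))  # reference result from NumPy
```

For the sample data above, the top 3 values per row match the ONNX output shown, and both print statements should agree.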