How to use TopK operator in ONNX?

Question

I am writing a converter and shape calculator to convert my custom sklearn transformers into ONNX models. I need to calculate the median of my data points. Interestingly, ONNX has no operator for computing the median (at least, I didn't find one).

So I am using the TopK operator to calculate the median. Since it returns two outputs, it is a bit tricky to use.
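For context, here is a minimal pure-Python sketch of why a top-k selection is enough to get a median. It is not ONNX code: `heapq.nlargest` stands in for TopK, and the helper name `median_via_topk` is made up for illustration. The idea is to take the k = n // 2 + 1 largest values; the median is then the last of them (odd n) or the mean of the last two (even n).

```python
import heapq
import statistics


def median_via_topk(row):
    """Median of a non-empty sequence using only a top-k selection."""
    n = len(row)
    k = n // 2 + 1
    # Descending order, similar to what TopK returns by default.
    top = heapq.nlargest(k, row)
    if n % 2 == 1:
        # Odd length: the k-th largest value is the middle element.
        return float(top[-1])
    # Even length: average the two middle elements.
    return (top[-1] + top[-2]) / 2.0


rows = [[1, 2, 3, 4], [4, 5, 6, 5], [7, 1, 3]]
medians = [median_via_topk(r) for r in rows]
print(medians)
# Cross-check against the standard library.
print([statistics.median(r) for r in rows])
```

In an ONNX graph, the same idea would need TopK followed by a slice or gather on the values output, which is why getting at that single output matters here.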

I tried to use it like this:

def mt_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs

    X = operator.inputs[0]
    n = operator.inputs[0].get_second_dimension()
    dtype = guess_numpy_type(X.type)

    # This is the line of focus
    Y = OnnxTopK(X, np.array([3]),op_version=opv, output_names=out[:1])
    
    Y.add_to(scope, container)

It threw this error:

ValueError: Unexpected index 1 in operator name 'TopK' with .output names ['variable']

This was to be expected, since TopK returns not one but two outputs: the top K values and their corresponding indices. So the next obvious option was to change the output names as follows:
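To make the two outputs concrete, here is a small pure-Python sketch of what TopK computes along the last axis with its default settings (largest first, sorted). The function `topk_last_axis` is a hypothetical stand-in, not part of ONNX or skl2onnx; it assumes that ties are broken in favour of the lower index, which is how I read the operator specification.

```python
def topk_last_axis(rows, k):
    """Return (values, indices) of the k largest entries of each row."""
    values, indices = [], []
    for row in rows:
        # Order positions by descending value; ties go to the lower index.
        order = sorted(range(len(row)), key=lambda i: (-row[i], i))[:k]
        indices.append(order)
        values.append([row[i] for i in order])
    return values, indices


data = [[1, 2, 3, 4], [4, 5, 6, 5]]
values, indices = topk_last_axis(data, 3)
print(values)   # first output: the top-k values
print(indices)  # second output: their positions in each row
```

Only the first of the two results is needed for the median; the second has to be produced by the node but can be left unused.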

Y = OnnxTopK(X, np.array([3]),op_version=opv, output_names=['values','indices'])

It threw this error:

RuntimeError: After 2 iterations for 2 nodes, still unable to sort names {'variable'}. The graph may be 
disconnected. List of operators: 
Cast(variable) -> [Y]
--
--all-nodes--
--
TopK|To_TopK(X#0, To_TopKcst#0) -> [values, indices]
Cast|Cast(variable) -> [Y]

The code above disconnects the graph. If you check operator.outputs, you will see [Variable('variable', 'variable', type=FloatTensorType(shape=[None, 3]))]. The converter is therefore expected to produce an output named variable, which the final Cast node consumes. Since TopK now writes to values and indices instead, nothing in the graph produces variable, hence the error.
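The failure can be reproduced with a toy dependency check. This is only an illustration of the idea, not how skl2onnx implements its topological sort; the function `unresolved_inputs` and the op name `Bridge` are invented for the example, while the node and tensor names are taken from the error message above.

```python
def unresolved_inputs(nodes, available):
    """Return names that some node consumes but nothing produces.

    nodes: list of (op_type, input_names, output_names) tuples.
    available: names that exist up front (graph inputs, constants).
    """
    produced = set(available)
    for _, _, outputs in nodes:
        produced.update(outputs)
    missing = set()
    for _, inputs, _ in nodes:
        missing.update(name for name in inputs if name not in produced)
    return missing


# The graph from the error message: nothing writes to 'variable'.
nodes = [
    ("TopK", ["X", "To_TopKcst"], ["values", "indices"]),
    ("Cast", ["variable"], ["Y"]),
]
broken = unresolved_inputs(nodes, available=["X", "To_TopKcst"])
print(broken)

# Adding any node that turns 'values' into 'variable' reconnects it.
# 'Bridge' is a placeholder op name used only for this sketch.
bridged = nodes[:1] + [("Bridge", ["values"], ["variable"])] + nodes[1:]
fixed = unresolved_inputs(bridged, available=["X", "To_TopKcst"])
print(fixed)
```

The first call reports `variable` as the dangling name, matching the RuntimeError; the second reports nothing missing.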

Now, there are two things:

1. I just need the top K "values" and not the "indices".
2. I can make the code run somehow (although it doesn't serve my purpose) using a different operator, say ReduceSum.

Y =  OnnxReduceSum(OnnxTopK(X, np.array([3]),op_version=opv)[0], op_version=opv, output_names=out[:1])

This makes the code run fine. On closer inspection, two things happen here. Indexing the TopK node with [0] selects only the values, which takes care of the first point above. And since TopK is no longer the last operator in the graph, output_names=out[:1] (where out = operator.outputs) can go on the final operator instead, so the graph stays connected.

But this still doesn't give us the result of TopK itself. If only we could replace ReduceSum with an operator that passes the TopK values through unchanged. Thankfully, ONNX has Identity!

So, we can finally modify the line to:

Y = OnnxIdentity(OnnxTopK(X, np.array([3]),op_version=opv)[0], op_version=opv, output_names=out[:1])

This gives us the desired result. Now the question is: is there a cleaner, more straightforward way to do this?

PS: The complete MWE (minimal working example) is as follows:

import numpy as np
import pandas as pd

from onnxruntime import InferenceSession

from sklearn.base import BaseEstimator, TransformerMixin
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType, DoubleTensorType, guess_numpy_type
from skl2onnx.algebra.onnx_ops import (
    OnnxReduceSum,
    OnnxTopK,
    OnnxIdentity
)
from skl2onnx import update_registered_converter

def mt_transformer_shape_calculator(operator):
    op = operator.raw_operator
    input_type = operator.inputs[0].type.__class__
    input_dim = operator.inputs[0].get_first_dimension()
    n = operator.inputs[0].get_second_dimension()
    
    output_type = input_type([input_dim, 3])
    operator.outputs[0].type = output_type
    
def mt_transformer_converter(scope, operator, container):
    op = operator.raw_operator
    opv = container.target_opset
    out = operator.outputs

    X = operator.inputs[0]
    n = operator.inputs[0].get_second_dimension()
    dtype = guess_numpy_type(X.type)
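    # TopK expects K as an int64 tensor; np.array([3]) may default to int32 on some platforms.
    Y = OnnxIdentity(OnnxTopK(X, np.array([3], dtype=np.int64), op_version=opv)[0], op_version=opv, output_names=out[:1])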
    Y.add_to(scope, container)
    

class MedianTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        pass



data = pd.DataFrame(
    [[1,2,3,4],[4,5,6,5]]
)
    
update_registered_converter(
    MedianTransformer, "MTTransformer",
    mt_transformer_shape_calculator,
    mt_transformer_converter)

mt = MedianTransformer()
onx = convert_sklearn(mt, name='test', initial_types=[("X", FloatTensorType([None,4]))], 
                      final_types=[("Y", DoubleTensorType([None,3]))])


sess = InferenceSession(onx.SerializeToString())

sess.run(None, {'X': data.values.astype(np.float32)})[0]

Output:

array([[4., 3., 2.],
       [6., 5., 5.]])
