When converting your classification model to ONNX (I assume you use skl2onnx), disable `ZipMap`. I'm not sure about other options, but here is my working code:
```python
from skl2onnx import to_onnx
import onnx

model = to_onnx(my_rfc_model, x_train,
                options={'zipmap': False, 'output_class_labels': False,
                         'raw_scores': False})
onnx.save_model(model, "model.onnx")
```
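On the C++ side you can then load the saved model into an ONNX Runtime session. A minimal sketch (the env name is mine; note that on Windows the model path must be a wide string):

```cpp
#include <onnxruntime_cxx_api.h>

Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "rfc-inference");
Ort::SessionOptions session_options;
Ort::Session session(env, "model.onnx", session_options);
```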
In my case (I use RandomForestClassifier) the converted model has one input and two outputs by default. The first output provides the predicted class labels and the second provides the probabilities for each class. With `ZipMap` disabled, the probabilities come out as a plain tensor whose values are serialized sequentially. For example, if you have 3 possible classes and 2 samples with class probability distributions `[[0.1, 0.2, 0.7], [0.3, 0.5, 0.2]]`, then, when predicting with ONNX Runtime in C++, the probabilities will be stored in the output memory sequentially: `[0.1, 0.2, 0.7, 0.3, 0.5, 0.2]`.
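In other words, the buffer is row-major, so you can index it like this (a sketch; `proba`, `num_samples`, and `num_classes` are my names, not part of the API):

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Print the probability matrix stored row-major in a flat buffer
// of size num_samples * num_classes.
void print_probabilities(const std::vector<float>& proba,
                         std::size_t num_samples, std::size_t num_classes) {
    for (std::size_t s = 0; s < num_samples; ++s) {
        for (std::size_t c = 0; c < num_classes; ++c) {
            std::cout << proba[s * num_classes + c] << ' ';  // sample s, class c
        }
        std::cout << '\n';
    }
}
```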
To get the probabilities, use the correct output name (by default it is `probabilities`). You can list all output names using `GetOutputCount()` and `GetOutputName()`. See this example: https://github.com/leimao/ONNX-Runtime-Inference/blob/main/src/inference.cpp
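For instance, something along these lines (this uses the older C++ API; in ONNX Runtime 1.13+ `GetOutputName` is deprecated in favor of `GetOutputNameAllocated`):

```cpp
#include <onnxruntime_cxx_api.h>
#include <iostream>

// List every output name of a session.
void print_output_names(Ort::Session& session) {
    Ort::AllocatorWithDefaultOptions allocator;
    for (size_t i = 0; i < session.GetOutputCount(); ++i) {
        char* name = session.GetOutputName(i, allocator);  // allocated by ORT
        std::cout << "output " << i << ": " << name << '\n';
        allocator.Free(name);  // release through the same allocator
    }
}
```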
Create an output tensor with enough space to hold the probabilities for each class:
```cpp
std::vector<float> proba(3 * num_samples);
std::vector<Ort::Value> output_tensors;
output_tensors.push_back(Ort::Value::CreateTensor<float>(
    memoryInfo, proba.data(), 3 * num_samples,
    output_dims_.data(), output_dims_.size()));
```
Note that we provide room for `3 * num_samples` floats, i.e. one probability per class per sample.
Run the prediction:
```cpp
session_->Run(Ort::RunOptions{nullptr}, input_names.data(), input_tensors.data(), 1,
              output_names.data(), output_tensors.data(), 1);
```
In my case `output_names` is declared as follows:

```cpp
std::vector<const char*> output_names {"probabilities"};
```
Hope this helps.