I have an ENet model that performs image segmentation. I trained the model in TensorFlow, converted it to .onnx, and I'm running GPU inference with CUDA and ONNX Runtime in a C# .NET 6 Windows application. I would like to predict 16 images (512x512x3) at once. However, running sequential inference over all 16 images is much faster (1.5 seconds) than predicting one large batched tensor containing all the images (3.5 seconds). I'm out of ideas why this could be the case...
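For context, the session is created once and reused for every call. A minimal sketch of the setup, assuming the Microsoft.ML.OnnxRuntime.Gpu package (the model path is illustrative):

using Microsoft.ML.OnnxRuntime;

// Created once at startup and reused for all inference calls.
var options = new SessionOptions();
options.AppendExecutionProvider_CUDA(0); // run on GPU device 0
_session = new InferenceSession("enet.onnx", options);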
Snippets below. This single batched call is slower:
// Batch all 16 images into one NCHW tensor: [16, 3, 512, 512].
var tensor = new DenseTensor<float>(data, new[] { 16, 3, 512, 512 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor(INPUT_COLUMN_NAME, tensor)
};
// Dispose the result collection; ToArray() copies the output first.
using var results = _session.Run(inputs);
return results.First().AsTensor<float>().ToArray();
than 16 consecutive calls of this:
// One image per call: NCHW tensor [1, 3, 512, 512].
var tensor = new DenseTensor<float>(data, new[] { 1, 3, 512, 512 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor(INPUT_COLUMN_NAME, tensor)
};
using var results = _session.Run(inputs);
return results.First().AsTensor<float>().ToArray();
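For completeness, here is roughly how I compare the two variants. PredictBatch and PredictSingle are hypothetical wrappers around the two snippets above; a warm-up inference runs first so CUDA initialization doesn't skew either number:

using System;
using System.Diagnostics;

// allImages: float[16 * 3 * 512 * 512]; images: 16 arrays of float[3 * 512 * 512].
void Compare(float[] allImages, float[][] images)
{
    PredictSingle(images[0]); // warm-up so CUDA init isn't timed

    var sw = Stopwatch.StartNew();
    PredictBatch(allImages);  // one call, batch of 16
    sw.Stop();
    Console.WriteLine($"Batched: {sw.ElapsedMilliseconds} ms");    // ~3500 ms

    sw.Restart();
    foreach (var image in images)
        PredictSingle(image); // 16 calls, batch of 1
    sw.Stop();
    Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms"); // ~1500 ms
}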