I have an ENet model that performs image segmentation. I trained the model in TensorFlow, converted it to .onnx, and I'm running GPU inference with CUDA and ONNX Runtime in a C# .NET 6 Windows application. I would like to predict 16 images (512x512x3) at once. However, running sequential inference over all 16 images is much faster (1.5 seconds) than predicting one large batched tensor containing all the images (3.5 seconds). I'm out of ideas why this could be the case...
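For context, the session is created once and reused for every call. A minimal sketch of the setup, assuming the Microsoft.ML.OnnxRuntime.Gpu package (the model path is illustrative):

using Microsoft.ML.OnnxRuntime;

// Created once at startup and reused for all inference calls.
var options = new SessionOptions();
options.AppendExecutionProvider_CUDA(0); // run on GPU device 0
_session = new InferenceSession("enet.onnx", options);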
Snippets below. This single batched call is slower:
// Batch all 16 images into one NCHW tensor: [16, 3, 512, 512].
var tensor = new DenseTensor<float>(data, new[] { 16, 3, 512, 512 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor(INPUT_COLUMN_NAME, tensor)
};
// Dispose the result collection; ToArray() copies the output first.
using var results = _session.Run(inputs);
return results.First().AsTensor<float>().ToArray();
than 16 consecutive calls of this:
// One image per call: NCHW tensor [1, 3, 512, 512].
var tensor = new DenseTensor<float>(data, new[] { 1, 3, 512, 512 });
var inputs = new List<NamedOnnxValue>
{
    NamedOnnxValue.CreateFromTensor(INPUT_COLUMN_NAME, tensor)
};
using var results = _session.Run(inputs);
return results.First().AsTensor<float>().ToArray();
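For completeness, here is roughly how I compare the two variants. PredictBatch and PredictSingle are hypothetical wrappers around the two snippets above; a warm-up inference runs first so CUDA initialization doesn't skew either number:

using System;
using System.Diagnostics;

// allImages: float[16 * 3 * 512 * 512]; images: 16 arrays of float[3 * 512 * 512].
void Compare(float[] allImages, float[][] images)
{
    PredictSingle(images[0]); // warm-up so CUDA init isn't timed

    var sw = Stopwatch.StartNew();
    PredictBatch(allImages);  // one call, batch of 16
    sw.Stop();
    Console.WriteLine($"Batched: {sw.ElapsedMilliseconds} ms");    // ~3500 ms

    sw.Restart();
    foreach (var image in images)
        PredictSingle(image); // 16 calls, batch of 1
    sw.Stop();
    Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms"); // ~1500 ms
}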