I am trying to run an ONNX model in C# for image segmentation. The model was created with PyTorch in Python. Everything works fine on the CPU, but when I use the GPU my application crashes as soon as it runs the inference. (Inference on the GPU works fine in Python.)
The only information I have is an event in the Windows 10 Event Viewer:
Faulting application name: DeepLearningONNX.exe, version: 1.0.0.0, time stamp: 0x6331eb0e
Faulting module name: cudnn64_8.dll, version: 6.14.11.6050, time stamp: 0x62e9c226
Exception code: 0xc0000409
Fault offset: 0x000000000001420d
Faulting process id: 0x2cc0
Faulting application start time: 0x01d8f830aac6f0a2
Faulting application path: C:\R&D\DeepLearningONNX\DeepLearningONNX\bin\x64\Debug\net6.0-windows\DeepLearningONNX.exe
Faulting module path: C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin\cudnn64_8.dll
Report Id: 40803e1a-e84d-4645-bfb6-4ebbb6ba1b78
Faulting package full name:
Faulting package-relative application ID:
My Hardware :
NVIDIA Quadro P620 (4GB). Driver 31.0.15.1740
Intel Core i7-10850H
Windows 10 22H2 OS build 19045.2251
In my Environment system variables :
CUDA_PATH : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
CUDA_PATH_V11_6 : C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
PATH : C:\Program Files\NVIDIA\CUDNN\v8.5;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin;C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\libnvvp
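To see which PATH entries actually contain a given DLL (for example the `cudnn64_8.dll` named in the event above), a small check along these lines could be used. This is only a sketch: `find_on_path` is a hypothetical helper name, and the lookup is a simplification, since Windows also searches the application directory and system directories before PATH.

```python
import os
from pathlib import Path


def find_on_path(file_name, path_value=None, sep=os.pathsep):
    """Return every directory in a PATH-style string that contains file_name.

    Simplified on purpose: the real Windows DLL search order also checks
    the application directory and system directories before PATH.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for entry in path_value.split(sep):
        # PATH entries are sometimes quoted or padded with spaces.
        entry = entry.strip().strip('"')
        if entry and (Path(entry) / file_name).is_file():
            hits.append(entry)
    return hits


# Example: list the PATH entries that contain the cuDNN DLL from the event log.
for directory in find_on_path("cudnn64_8.dll"):
    print(directory)
```

If more than one directory is printed, the first one listed is the copy PATH would resolve to, which can then be compared with the faulting module path in the event.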
In my C# (.NET 6) solution, the installed NuGet package:
Microsoft.ML.OnnxRuntime.Gpu version 1.13.1
Software installed:
Visual Studio Community 2022 (64bit) version 17.3.6
cuda_11.6.2_511.65_windows.exe
cudnn-windows-x86_64-8.5.0.96_cuda11-archive extracted in C:\Program Files\NVIDIA\CUDNN\v8.5
My C# code:
    private void InferenceDebug(string modelPath, bool useGPU)
    {
        InferenceSession session;

        if (useGPU)
        {
            var cudaProviderOptions = new OrtCUDAProviderOptions();
            var providerOptionsDict = new Dictionary<string, string>();
            providerOptionsDict["device_id"] = "0";
            providerOptionsDict["gpu_mem_limit"] = "2147483648";
            providerOptionsDict["arena_extend_strategy"] = "kSameAsRequested";
            providerOptionsDict["cudnn_conv_algo_search"] = "DEFAULT";
            providerOptionsDict["do_copy_in_default_stream"] = "1";
            providerOptionsDict["cudnn_conv_use_max_workspace"] = "1";
            providerOptionsDict["cudnn_conv1d_pad_to_nc1d"] = "1";
            cudaProviderOptions.UpdateOptions(providerOptionsDict);

            SessionOptions options = SessionOptions.MakeSessionOptionWithCudaProvider(cudaProviderOptions);
            session = new InferenceSession(modelPath, options);
        }
        else
            session = new InferenceSession(modelPath);

        int w = 128;
        int h = 128;
        Tensor<float> input = new DenseTensor<float>(new int[] { 1, 3, h, w });
        Random random = new Random(42);

        for (int y = 0; y < h; y++)
        {
            for (int x = 0; x < w; x++)
            {
                input[0, 0, y, x] = (float)(random.NextDouble() / 255);
                input[0, 1, y, x] = (float)(random.NextDouble() / 255);
                input[0, 2, y, x] = (float)(random.NextDouble() / 255);
            }
        }

        var inputs = new List<NamedOnnxValue> { NamedOnnxValue.CreateFromTensor<float>("modelInput", input) };
        using IDisposableReadOnlyCollection<DisposableNamedOnnxValue> results = session.Run(inputs); // The crash is when executing this line
    }
My Python code (3.10, 64-bit):
    import torch  # version '1.12.1+cu116'
    from torch import nn
    import segmentation_models_pytorch as smp
    from segmentation_models_pytorch.losses import DiceLoss


    class SegmentationModel(nn.Module):
        def __init__(self):
            super(SegmentationModel, self).__init__()
            self.arc = smp.UnetPlusPlus(encoder_name='timm-efficientnet-b0',
                                        encoder_weights='imagenet',
                                        in_channels=3,
                                        classes=1,
                                        activation=None)

        def forward(self, images, masks=None):
            logits = self.arc(images)
            # Training path: return the combined Dice + BCE loss with the logits.
            if masks is not None:
                loss1 = DiceLoss(mode='binary')(logits, masks)
                loss2 = nn.BCEWithLogitsLoss()(logits, masks)
                return logits, loss1 + loss2
            return logits


    modelPath = "D:/model.pt"
    device = "cuda"  # input("Enter device (cpu or cuda) : ")

    model = SegmentationModel()
    model.to(device)
    model.load_state_dict(torch.load(modelPath, map_location=torch.device(device)))
    model.eval()

    dummy_input = torch.randn(1, 3, 128, 128, device=device)
    torch.onnx.export(model,                      # model being run
                      dummy_input,                # model input (or a tuple for multiple inputs)
                      "model.onnx",               # where to save the model
                      export_params=True,         # store the trained parameter weights inside the model file
                      do_constant_folding=True,   # whether to execute constant folding for optimization
                      input_names=['modelInput'],    # the model's input names
                      output_names=['modelOutput'],  # the model's output names
                      dynamic_axes={'modelInput': [0, 2, 3],    # variable length axes
                                    'modelOutput': [0, 2, 3]})
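One way to tell an export problem from a C# setup problem would be to load the exported `model.onnx` with ONNX Runtime's Python package, using the same CUDA provider options as the C# code. The following is a minimal sketch, assuming the `onnxruntime-gpu` and `numpy` packages are installed (neither is part of the setup listed above); `build_cuda_providers` and `run_once` are hypothetical helper names.

```python
def build_cuda_providers(device_id=0, gpu_mem_limit=2 * 1024**3):
    """Mirror the provider options from the C# code as a Python providers list."""
    cuda_options = {
        "device_id": str(device_id),
        "gpu_mem_limit": str(gpu_mem_limit),  # 2 GiB by default, as in the C# code
        "arena_extend_strategy": "kSameAsRequested",
        "cudnn_conv_algo_search": "DEFAULT",
        "do_copy_in_default_stream": "1",
        "cudnn_conv_use_max_workspace": "1",
        "cudnn_conv1d_pad_to_nc1d": "1",
    }
    return [("CUDAExecutionProvider", cuda_options)]


def run_once(model_path="model.onnx", height=128, width=128):
    """Run one inference on random input through the CUDA execution provider."""
    # Imported lazily so build_cuda_providers() works without these packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=build_cuda_providers())
    # Print the providers the session actually uses, to spot a CPU fallback.
    print(session.get_providers())
    x = np.random.rand(1, 3, height, width).astype(np.float32)
    return session.run(None, {"modelInput": x})
```

Calling `run_once()` next to `model.onnx` exercises the same model and options as the C# `session.Run(inputs)` call, so its result indicates whether the problem follows the model or the .NET environment.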
What is the cause of the crash and how can I fix it?