System information
- Have I written custom code: No
- OS Platform and Distribution: Windows 10 version 21H2 (Windows-10-10.0.19041)
- TensorFlow installed from (source or binary): source
- TensorFlow version: tag 2.7.1 - https://github.com/tensorflow/tensorflow/tree/v2.7.1
- Python version: 3.6.8
- Bazel version: 3.7.2
- GCC/Compiler version: MSVC 2019 (VC\Tools\MSVC 14.29.30133)
- CUDA/cuDNN version: CUDA 11.2
- GPU model and memory: NVIDIA GeForce RTX 2060
Additional information
- TensorFlow is used in a C# application through the NuGet package TensorFlow.NET as the C# API - https://github.com/SciSharp/TensorFlow.NET
- tensorflow.dll (built from source) is shipped as a redistributable
- The C# application loads 3 trained models for 3 different purposes
- Development with Microsoft Visual Studio 2019
Describe the problem
As described above, I'm developing a C# application that uses TensorFlow through the TensorFlow.NET NuGet package. The application does not train TensorFlow models; it only runs predictions.
The issue occurs randomly, in around 80% of runs. As mentioned, the application loads 3 models and performs these steps for each one:
- Load application settings
- Load the model as Graph
- Define a ConfigProto with:
  - AllowSoftPlacement = true
  - GPU options with 'AllowGrowth' set to true
  - DeviceCount with the number of GPUs set to 1
- Create a Session for the predictions with the graph and the configuration previously defined
- Perform a first run with an input (vector or image) set to zero
With this procedure, the application loads the first 2 models without any problem, but most of the time a crash occurs while loading the third one. I tried changing the order of the models, and it makes no difference. The application crashes without showing any exception window.
After the crash, the Microsoft Visual Studio output window shows this TensorFlow fatal error:
Check failed: buf_ null buf_ with non-zero shape size 1352238274176
The Windows Event Viewer logs an error in ucrtbase.dll with exception code 0xc0000409. This issue is not well documented; I only found a few relevant links:
- https://github.com/tensorflow/tensorflow/issues/22174
- https://answers.microsoft.com/en-us/windows/forum/all/mfs2020-ucrtbasedll-error-exception-code/dce72f84-e659-424b-a135-df2d7a8b5d5a?auth=1
- https://github.com/Chatterino/chatterino2/issues/2774
- https://forum.image.sc/t/gpu-crashing-when-running-training-video-analyzation/37577
As suggested on the Microsoft forum (second link), I tried disabling hardware acceleration, but nothing changed.
Source code / logs
Last lines of the Visual Studio output after the crash
2022-03-08 16:33:07.860893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3915 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5
2022-03-08 16:33:07.877858: F tensorflow/core/framework/tensor.cc:1030] Check failed: buf_ null buf_ with non-zero shape size 1352238274176
The program '[19948] MySoftware.exe' has exited with code -1 (0xffffffff).
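As a rough sanity check on the numbers in these log lines, the short Python sketch below compares the reported "shape size" with the GPU memory reported when the device was created. It assumes that the "shape size" is the element count of the tensor and that the elements are float32 (4 bytes each); neither is confirmed by the log. If those assumptions hold, the requested tensor is far larger than anything the GPU could hold, which would suggest a corrupted or uninitialized shape rather than an ordinary out-of-memory situation.

```python
# Figures copied from the log lines above.
num_elements = 1352238274176   # "shape size" from the Check failed message
gpu_memory_mb = 3915           # memory reported when the GPU device was created

# Assumption: float32 elements (4 bytes each); the real dtype is not in the log.
bytes_per_element = 4

gpu_memory_bytes = gpu_memory_mb * 1024 * 1024
tensor_bytes = num_elements * bytes_per_element

print(f"tensor would need about {tensor_bytes / 1024**4:.2f} TiB")
print(f"GPU offers about {gpu_memory_bytes / 1024**3:.2f} GiB")
print(f"ratio: about {tensor_bytes / gpu_memory_bytes:.0f}x")
```

Even with 1 byte per element, the tensor would not fit in the reported GPU memory.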
Exception in the Windows Event Viewer
Faulting application name: MySoftware.exe, version : 1.0.0.0, time stamp: 0xf9488276
Faulting module name: ucrtbase.dll, version : 10.0.19041.789, time stamp: 0x2bd748bf
Exception code: 0xc0000409
Fault offset: 0x000000000007286e
Faulting process id: 0x4dec
Faulting application start time: 0x01d83301c1644776
Faulting application path: C:\dev\MySoftware\bin\x64\RelWithDebInfo\MySoftware.exe
Faulting module path: C:\WINDOWS\System32\ucrtbase.dll
Report Id: 07e8d2a5-8a04-4c59-bb08-59e9753b7d32
Faulting package full name:
Faulting package-relative application ID:
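To cross-check that this Event Viewer entry belongs to the same run as the Visual Studio output, the hexadecimal fields can be decoded with the Python sketch below. It assumes the "Faulting application start time" is a Windows FILETIME value (100-nanosecond ticks since 1601-01-01 UTC), which is my reading of the field and not something the entry states. The decoded process id can be compared with the '[19948]' shown by Visual Studio. As far as I know, 0xc0000409 is STATUS_STACK_BUFFER_OVERRUN, which the C runtime also reports for fast-fail terminations such as abort(), so the entry may simply reflect the TensorFlow check failure aborting the process.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the Event Viewer entry above.
faulting_pid_hex = "0x4dec"
start_time_hex = "0x01d83301c1644776"

# Process id in decimal, for comparison with the Visual Studio output.
pid = int(faulting_pid_hex, 16)

# Assumption: the start time is a FILETIME (100 ns ticks since 1601-01-01 UTC).
ticks = int(start_time_hex, 16)
filetime_epoch = datetime(1601, 1, 1, tzinfo=timezone.utc)
start_utc = filetime_epoch + timedelta(microseconds=ticks // 10)

print("faulting process id:", pid)
print("application start (UTC):", start_utc.isoformat())
```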
I have no idea how to solve this issue. Could you please help me? Maybe it is a version compatibility problem or something related to CUDA/cuDNN. I really don't know, especially because the software sometimes works perfectly for all 3 models.
Thank you