System information

  • Have I written custom code: No
  • OS Platform and Distribution: Windows 10 version 21H2 (Windows-10-10.0.19041)
  • TensorFlow installed from (source or binary): source
  • TensorFlow version: tag 2.7.1 - https://github.com/tensorflow/tensorflow/tree/v2.7.1
  • Python version: 3.6.8
  • Bazel version: 3.7.2
  • GCC/Compiler version: MSVC 2019 (VC\Tools\MSVC 14.29.30133)
  • CUDA/cuDNN version: CUDA 11.2
  • GPU model and memory: NVIDIA GeForce RTX 2060

Additional information

  • TensorFlow is used in a C# application through the NuGet package TensorFlow.NET as the C# API - https://github.com/SciSharp/TensorFlow.NET
  • tensorflow.dll (built from source) is used as a redistributable
  • C# application loads 3 learned models for 3 different usages
  • Development with Microsoft Visual Studio 2019

Describe the problem

As described above, I'm developing a C# application that uses TensorFlow through the NuGet package TensorFlow.NET. The application does not train TensorFlow models; it only runs predictions.

The issue appears randomly, in around 80% of runs. As mentioned, my application loads 3 models and performs the following steps for each one:

  • Load application settings
  • Load the model as Graph
  • Define a ConfigProto with
    • AllowSoftPlacement = true
    • GPU options with 'AllowGrowth' set to true
    • DeviceCount with number of GPU set to 1
  • Create a Session for the predictions with the graph and the configuration previously defined
  • Perform a first (warm-up) run with an input (vector or image) set to zero
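
For readers more familiar with the Python API, the per-model steps above can be sketched roughly as follows. This is not my actual C# code: it is a minimal Python equivalent using `tf.compat.v1`, and it assumes the model is a frozen GraphDef `.pb` file. The function name, the file path, and the tensor names passed in are hypothetical.

```python
# Sketch only: Python equivalent of the per-model setup described above.
# The function name, path and tensor names are hypothetical.

# Session settings taken from the list above.
CONFIG_FIELDS = {
    "allow_soft_placement": True,   # AllowSoftPlacement = true
    "device_count": {"GPU": 1},     # DeviceCount with number of GPU set to 1
}
ALLOW_GROWTH = True                 # GPU option 'AllowGrowth'


def open_prediction_session(pb_path, input_name, input_shape, output_name):
    """Load a frozen graph, create a session, and do one zero-input run."""
    # Imported here so the settings above can be inspected without TensorFlow.
    import numpy as np
    import tensorflow as tf

    # Load the model as a graph (assumes a frozen GraphDef .pb file).
    graph_def = tf.compat.v1.GraphDef()
    with open(pb_path, "rb") as f:
        graph_def.ParseFromString(f.read())
    graph = tf.Graph()
    with graph.as_default():
        tf.import_graph_def(graph_def, name="")

    # Build the session configuration from the settings above.
    config = tf.compat.v1.ConfigProto(**CONFIG_FIELDS)
    config.gpu_options.allow_growth = ALLOW_GROWTH

    # Create the session and run once with an all-zero input.
    session = tf.compat.v1.Session(graph=graph, config=config)
    zeros = np.zeros(input_shape, dtype=np.float32)
    session.run(output_name, feed_dict={input_name: zeros})
    return session
```

A call would look like `open_prediction_session("model.pb", "input:0", (1, 224, 224, 3), "output:0")`, where all four arguments are placeholders, not the real values from my application.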

With this procedure, the application loads the first 2 models without problems, but most of the time an error occurs when loading the third model. I tried changing the order of the models, and nothing changed. The application crashes without any exception window.

After the software crashes, the Output window of Microsoft Visual Studio shows this TensorFlow fatal error:

Check failed: buf_ null buf_ with non-zero shape size 1352238274176

The Windows Event Viewer logs an error in ucrtbase.dll with the exception code 0xc0000409. This issue is not well documented; I only found a few useful links:

As suggested by Microsoft (third link), I tried to disable hardware acceleration but nothing changed.

Source code / logs

Last lines of output in Visual Studio after the software crashes

2022-03-08 16:33:07.860893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3915 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 2060, pci bus id: 0000:01:00.0, compute capability: 7.5
2022-03-08 16:33:07.877858: F tensorflow/core/framework/tensor.cc:1030] Check failed: buf_ null buf_ with non-zero shape size 1352238274176
The program '[19948] MySoftware.exe' has exited with code -1 (0xffffffff).
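
To put the number in the fatal message in perspective, here is a rough calculation from the two log lines above. It assumes the reported "shape size" is an element count and that each element takes 4 bytes (float32). Both are assumptions on my side, not something the log states.

```python
import re

# Fragments copied from the two log lines above.
DEVICE_LINE = "Created device /job:localhost/replica:0/task:0/device:GPU:0 with 3915 MB memory"
FATAL_LINE = "Check failed: buf_ null buf_ with non-zero shape size 1352238274176"

BYTES_PER_ELEMENT = 4  # assumption: float32 elements

# Extract the device memory (in MB) and the reported shape size.
device_mb = int(re.search(r"with (\d+) MB memory", DEVICE_LINE).group(1))
elements = int(re.search(r"shape size (\d+)", FATAL_LINE).group(1))

# Compare the memory such a tensor would need with the device memory.
device_bytes = device_mb * 1024 * 1024
tensor_bytes = elements * BYTES_PER_ELEMENT
ratio = tensor_bytes / device_bytes

print(f"tensor would need about {tensor_bytes / 1024**4:.1f} TiB")
print(f"that is roughly {ratio:.0f} times the device memory in the log")
```

Under these assumptions the tensor would be far larger than anything my models use, so the shape itself may be invalid, but that is only a guess.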

Exception in Event Viewer of Windows

Faulting application name: MySoftware.exe, version : 1.0.0.0, time stamp: 0xf9488276
Faulting module name: ucrtbase.dll, version : 10.0.19041.789, time stamp: 0x2bd748bf
Exception code: 0xc0000409
Fault offset: 0x000000000007286e
Faulting process id: 0x4dec
Faulting application start time: 0x01d83301c1644776
Faulting application path: C:\dev\MySoftware\bin\x64\RelWithDebInfo\MySoftware.exe
Faulting module path: C:\WINDOWS\System32\ucrtbase.dll
Report Id: 07e8d2a5-8a04-4c59-bb08-59e9753b7d32
Faulting package full name:
Faulting package-relative application ID:

I have no idea how to solve this issue. Could you please help me? Maybe it is a version compatibility issue or something related to CUDA/cuDNN. I really don't know, especially because the software sometimes works perfectly for all 3 models.

Thank you

quintakFR