My program works fine on my standard Ubuntu x64 box, but if I run under valgrind
I see the following error:
==22246== Conditional jump or move depends on uninitialised value(s)
==22246== at 0x9854DCBC: ??? (in /usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21)
==22246== by 0x98182940: cudnnGetConvolutionBackwardFilterWorkspaceSize (in /usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21)
==22246== by 0x982C3787: ??? (in /usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21)
==22246== by 0x9817FC0F: cudnnGetConvolutionBackwardFilterAlgorithm (in /usr/lib/x86_64-linux-gnu/libcudnn.so.6.0.21)
==22246== by 0x903F4908: caffe::CuDNNConvolutionLayer<float>::Reshape(std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&, std::vector<caffe::Blob<float>*, std::allocator<caffe::Blob<float>*> > const&) (cudnn_conv_layer.cpp:149)
==22246== by 0x904F35D8: SetUp (layer.hpp:72)
==22246== by 0x904F35D8: caffe::Net<float>::Init(caffe::NetParameter const&) (net.cpp:148)
==22246== by 0x904F523F: caffe::Net<float>::Net(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, caffe::Phase, int, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const*, caffe::Net<float> const*) (net.cpp:45)
--from here is my own code
Unfortunately, although my code is pretty much a bog-standard interface to caffe
, the model is a complex proprietary model I cannot share here. Furthermore, CuDNN is closed-source, so I cannot debug that to see if this is a problem worth bothering about.
Googling cudnnGetConvolutionBackwardFilterWorkspaceSize valgrind
and cudnnGetConvolutionBackwardFilterAlgorithm valgrind
turns up nothing useful except a hint to add --track-origins=yes
, but when I add that the error goes away...
The problem I am actually trying to solve is that the Deep Learning module crashes in the standard library on the target platform with a call to freeing already-freed memory. However, the target is an ARM-based device that I cannot get access to for further investigation.