I'm writing a convnet using Torch and cuDNN and am running into memory issues. I tried debugging the script with cuda-memcheck, only to notice that the script actually runs when launched under cuda-memcheck (albeit slower than it otherwise would).
It turns out that as long as cuda-memcheck is running in the background, a separate instance of the script, launched on its own, also runs fine.
Any idea what might be happening here?