cudaMemcpy2D error (cudaErrorInvalidValue) when running "Debug" configuration

Question

This is driving me crazy. I can't figure out for the life of me why this is happening. Basically, I have code that was working fine under Linux (Nsight Eclipse Edition). I tried making it compatible with Windows by creating and configuring a Visual Studio 2013 project.

At this point everything seems fine: the code compiles without any problems, and it even runs fine in the Release configuration. However, as soon as I switch to the Debug configuration, the portion below fails with a cudaErrorInvalidValue error.
I've tracked the problem down to the optimization flag. Disabling optimization causes the failure, while with /O2 or /O1 the code runs fine!

Again, this works fine under Linux with or without optimization, so I wonder what is different about optimization on Windows. If it helps, I'm using Visual Studio 2013 (Update 4) with CUDA 6.5 and static library linking (on Linux it was also CUDA 6.5, but with dynamic library linking).

The whole code is available here.

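```cpp
// Excerpt from a member function of the template class (T is the element type)
size_t hostPitch = (size_t)getHostPitch();               // bytes per row on the host
size_t devicePitch = (size_t)getDevicePitch();           // bytes per row on the device
size_t cal = (size_t)(width * numChannels * sizeof(T));  // bytes actually copied per row
size_t h = (size_t)height;                               // number of rows
cudaError_t eCUDAResult = cudaMemcpy2D((void*)this->hostData, hostPitch,
                                       (const void*)this->deviceData, devicePitch,
                                       cal, h, cudaMemcpyDeviceToHost);
```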
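The cudaMemcpy2D reference states that the row width being copied must not exceed either pitch, and a null pointer is another easy mistake to rule out. As a debugging aid, one could validate these arguments on the host right before the call. The sketch below uses a hypothetical helper, `checkMemcpy2DArgs`, which is not part of the CUDA runtime; with the snippet above it would be called as `checkMemcpy2DArgs(this->hostData, hostPitch, this->deviceData, devicePitch, cal, h)`. A check like this cannot tell whether a non-null pointer has already been freed, so it would not have caught the dangling-pointer problem described in the accepted answer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the CUDA runtime). It checks the argument
// constraints that the cudaMemcpy2D documentation states, so an obviously bad
// call can be reported on the host before it reaches the runtime.
// Returns an empty string if the arguments look consistent, otherwise a short
// description of the first problem found. It cannot detect freed pointers.
std::string checkMemcpy2DArgs(const void* dst, std::size_t dpitch,
                              const void* src, std::size_t spitch,
                              std::size_t widthBytes, std::size_t height)
{
    (void)height;  // any row count is accepted here
    if (dst == nullptr || src == nullptr)
        return "null source or destination pointer";
    if (widthBytes > dpitch)
        return "width exceeds destination pitch";
    if (widthBytes > spitch)
        return "width exceeds source pitch";
    return "";
}
```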

SO [expects](http://stackoverflow.com/help/on-topic) an [MCVE](http://stackoverflow.com/help/mcve) in the question itself. What you've linked to ("whole code") isn't even complete code. It's a header file. — Robert Crovella, Dec 17 '14 at 05:12
@RobertCrovella But with this project it is very difficult to create a minimal working example :( The error only occurs on one particular kind of data... Any suggestions? — Maghoumi, Dec 17 '14 at 05:34
@RobertCrovella The "whole" code is a minimal library that computes HOG features on the GPU. The repository is already available. Also, the header file is not really a header file; it's the implementation of a template class. — Maghoumi, Dec 17 '14 at 05:35
It's a very simple proposition. You provide a short, fully standalone demo case in your question and you will probably get a helpful answer. Don't and you probably won't. Your choice (and I don't buy the "it's too complex/time consuming/large" argument. If you make the effort, you'll probably find the problem yourself, or you'll refine the problem down to something you'll get an answer to here) — talonmies, Dec 17 '14 at 07:40
Usually, when optimizations seem to break your program, it means you're invoking undefined behavior somewhere. Try putting a breakpoint before the faulty call to see what's going on. — user703016, Dec 17 '14 at 09:03

score 3 · Accepted Answer · answered Dec 19 '14 at 03:02

The comment by @Park Young-Bae solved my problem (though it took more effort than a simple breakpoint!).
The undefined behavior was caused by my carelessness. In one of the classes, I had forgotten to define the copy constructor and copy-assignment operator. The compiler-generated versions copied only the raw device pointers, so when an object was returned by value, the destructor of one of the copies ran and freed all the CUDA memory the other copy still pointed to! As a result, subsequent CUDA API calls on that object were working on dangling pointers.
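A sketch of one way to apply this fix, assuming a class that owns a single device allocation. `DeviceBuffer`, `makeBuffer`, `mockDeviceAlloc` and `mockDeviceFree` are hypothetical names; the mock functions stand in for `cudaMalloc`/`cudaFree` so the sketch compiles and runs without CUDA. Rather than writing a deep-copying copy constructor, this version deletes the copy operations and adds move operations (the "rule of five"), so an accidental copy becomes a compile error and returning by value transfers ownership instead of creating a second owner.

```cpp
#include <cstddef>
#include <cstdlib>
#include <utility>

// Stand-ins for cudaMalloc/cudaFree so the sketch runs without a GPU.
// g_liveAllocations counts allocations that have not been freed yet.
static int g_liveAllocations = 0;

void* mockDeviceAlloc(std::size_t bytes) {
    void* p = std::malloc(bytes);
    if (p != nullptr) ++g_liveAllocations;
    return p;
}

void mockDeviceFree(void* p) {
    if (p != nullptr) {
        --g_liveAllocations;
        std::free(p);
    }
}

// Owns exactly one allocation. Copying is disabled; moving leaves the source
// empty (nullptr), so only one destructor ever frees the memory.
class DeviceBuffer {
public:
    explicit DeviceBuffer(std::size_t bytes)
        : ptr_(mockDeviceAlloc(bytes)), bytes_(bytes) {}

    ~DeviceBuffer() { mockDeviceFree(ptr_); }

    DeviceBuffer(const DeviceBuffer&) = delete;
    DeviceBuffer& operator=(const DeviceBuffer&) = delete;

    DeviceBuffer(DeviceBuffer&& other) noexcept
        : ptr_(std::exchange(other.ptr_, nullptr)),
          bytes_(std::exchange(other.bytes_, 0)) {}

    DeviceBuffer& operator=(DeviceBuffer&& other) noexcept {
        if (this != &other) {
            mockDeviceFree(ptr_);  // release what we currently own
            ptr_ = std::exchange(other.ptr_, nullptr);
            bytes_ = std::exchange(other.bytes_, 0);
        }
        return *this;
    }

    void* get() const { return ptr_; }
    std::size_t size() const { return bytes_; }

private:
    void* ptr_;
    std::size_t bytes_;
};

// Returning by value now either elides the move or moves; never a shallow copy.
DeviceBuffer makeBuffer(std::size_t bytes) {
    DeviceBuffer buf(bytes);
    return buf;
}
```

If the class genuinely needs to be copyable, the alternative is a copy constructor and copy-assignment operator that allocate new device memory and copy the contents (for example with `cudaMemcpy2D`), which is correct but much more expensive than a move.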

Can't believe how easy it is to miss something tiny in C++ and spend hours debugging.

Surprised that the destructor call was being optimized away by G++ and by VC++'s Release mode! — Maghoumi, Dec 19 '14 at 04:36
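A likely explanation for that observation is copy elision, specifically named return value optimization (NRVO): the compiler may construct a returned local directly in the caller's storage, so no temporary copy exists and no extra destructor runs. NRVO is permitted but not required by the language. GCC applies it by default even at -O0 (unless `-fno-elide-constructors` is given), which fits the code working on Linux with or without optimization, whereas MSVC of that era applied it only with optimizations enabled (such as /O2). The hypothetical `Tracer` type below counts constructions and destructions so you can see what a given compiler and flag combination does; every extra copy or move is an extra destructor call, which in the question's class meant an extra free of the CUDA memory.

```cpp
#include <cstdio>

// Counts constructions and destructions to reveal whether the compiler elided
// the copy/move when a named local is returned by value.
struct Tracer {
    static int defaults, copies, moves, destructions;
    Tracer() { ++defaults; }
    Tracer(const Tracer&) { ++copies; }
    Tracer(Tracer&&) noexcept { ++moves; }
    ~Tracer() { ++destructions; }  // in the question's class, the CUDA memory was freed here
};
int Tracer::defaults = 0;
int Tracer::copies = 0;
int Tracer::moves = 0;
int Tracer::destructions = 0;

// Returning a named local: NRVO may remove the move, but it is optional,
// so the counts depend on the compiler and its flags.
Tracer makeNamed() {
    Tracer local;
    return local;
}

void reportElision() {
    {
        Tracer t = makeNamed();
        (void)t;
    }
    std::printf("constructed: %d default, %d copy, %d move; destroyed: %d\n",
                Tracer::defaults, Tracer::copies, Tracer::moves,
                Tracer::destructions);
}
```

Whatever the compiler decides, every constructed object is eventually destroyed; only the number of intermediate objects (and therefore destructor calls) varies.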
