Edited:
I have written a CUDA application that computes an image descriptor. I tested it on a single image and everything went well. Next I wanted to run it on many images (e.g. 10000) to measure the time difference between the serial and the parallel version, so I put my code inside a for loop (leaving the allocations and memory frees outside the loop). That is where the problem appeared: cudaDeviceSynchronize() fails with error code 6. After some searching, I found that, on the one hand, this error may be caused by the kernel launched before cudaDeviceSynchronize(), since kernel launches are asynchronous and their errors only surface at the next synchronizing call. Indeed, it is usually one specific cudaDeviceSynchronize() call, the one that follows a specific kernel, that fails; once or twice it was the next call instead, which follows a different kernel. So can the problem be the kernel? I doubt it, because it runs perfectly on 10 or 100 images with the Nsight memory checker on.
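For reference, the loop is structured roughly as in the sketch below. This is not my actual code: `descriptorKernel`, `check`, `runBatch` and the image size are hypothetical placeholders, and only the CUDA runtime calls are real. It shows the allocations kept outside the loop and an error check after both the launch and the synchronization, so that the failing image index and the error string get printed.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real descriptor kernel.
__global__ void descriptorKernel(const unsigned char* image, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = image[i] / 255.0f;
}

// Prints a failed CUDA call together with the image index; returns true on success.
static bool check(cudaError_t err, const char* what, int imageIndex)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "image %d: %s failed: %s (code %d)\n",
            imageIndex, what, cudaGetErrorString(err), (int)err);
    return false;
}

// Runs the kernel numImages times on buffers allocated once.
// Returns cudaSuccess or the first error encountered.
cudaError_t runBatch(int numImages, int n)
{
    unsigned char* dImage = 0;
    float* dOut = 0;

    cudaError_t err = cudaMalloc((void**)&dImage, n);
    if (err != cudaSuccess) return err;
    err = cudaMalloc((void**)&dOut, n * sizeof(float));
    if (err != cudaSuccess) { cudaFree(dImage); return err; }
    err = cudaMemset(dImage, 0, n);

    const int threads = 128;
    const int blocks = (n + threads - 1) / threads;

    for (int img = 0; err == cudaSuccess && img < numImages; ++img) {
        descriptorKernel<<<blocks, threads>>>(dImage, dOut, n);

        // Errors detected at launch time (bad configuration, etc.).
        err = cudaGetLastError();
        if (!check(err, "kernel launch", img)) break;

        // Errors raised while the kernel was executing show up here.
        err = cudaDeviceSynchronize();
        if (!check(err, "cudaDeviceSynchronize", img)) break;
    }

    cudaFree(dOut);
    cudaFree(dImage);
    return err;
}

int main()
{
    cudaError_t err = runBatch(10000, 640 * 480);
    printf("result: %s\n", cudaGetErrorString(err));
    return err == cudaSuccess ? 0 : 1;
}
```

Printing cudaGetErrorString() gives the name behind the numeric code. As far as I can tell from the CUDA 4.x headers, code 6 is cudaErrorLaunchTimeout, which would mean the kernel ran for too long rather than accessed bad memory.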
On the other hand, I found out about the watchdog timer. When I disabled it through Nsight and ran my program again, the PC froze completely. So I imagine that, for some reason, some or all of the threads never finish, so cudaDeviceSynchronize() never returns, and after 2 seconds (the default) the watchdog timer stops the application. Any ideas why the kernel hangs? Could it be because of my device? I am using a GeForce G 103M with compute capability 1.1, CUDA 4.2, Windows 7 and Visual Studio 2010.
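To confirm whether the run-time limit actually applies to the card, the device properties can be queried. A minimal sketch, assuming device 0 is the GPU in question; `watchdogEnabled` is a hypothetical helper name, while `cudaGetDeviceProperties` and the `kernelExecTimeoutEnabled` field come from the CUDA runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns 1 if kernels on device `dev` are subject to a run-time limit,
// 0 if they are not, and -1 if the properties could not be read.
int watchdogEnabled(int dev)
{
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, dev);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                cudaGetErrorString(err));
        return -1;
    }
    printf("%s (compute capability %d.%d): run-time limit %s\n",
           prop.name, prop.major, prop.minor,
           prop.kernelExecTimeoutEnabled ? "enabled" : "disabled");
    return prop.kernelExecTimeoutEnabled ? 1 : 0;
}

int main()
{
    // Assumption: device 0 is the GPU that also drives the display.
    return watchdogEnabled(0) < 0 ? 1 : 0;
}
```

If the limit is reported as enabled, any single kernel launch that runs longer than the timeout will be killed, no matter how many images were processed correctly before it.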
Thanks a lot!