2

I am writing a program that retrieves images from a camera and processes them with CUDA. In order to gain the best performance, I'm passing a CUDA unified memory buffer to the image acquisition library, which writes to the buffer in another thread.

This causes all sorts of weird results where the program hangs in library code that I do not have access to. If I use a normal memory buffer and then copy to CUDA, the problem goes away. So I became suspicious that writing from another thread might not be allowed, but however much I googled, I could not find a definitive answer.

So is accessing the unified memory buffer from another CPU thread allowed or not?

Elektito
  • 3,863
  • 8
  • 42
  • 72

1 Answer

5

There should be no problem writing to a unified memory buffer from multiple threads.

However, keep in mind the restrictions imposed when the concurrentManagedAccess device property is not true. In that case, when you have a managed buffer and you launch a kernel, no CPU/host thread access of any kind is allowed to that buffer, or to any other managed buffer, until you perform a cudaDeviceSynchronize() after the kernel call.
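A quick way to check which regime you are on is to query the device attribute. This is just a minimal sketch (error checking omitted):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int dev = 0;
    int concurrent = 0;
    // Query whether the device supports concurrent CPU/GPU access
    // to managed memory while a kernel is running.
    cudaDeviceGetAttribute(&concurrent,
                           cudaDevAttrConcurrentManagedAccess, dev);
    printf("concurrentManagedAccess: %d\n", concurrent);
    // If this prints 0, no CPU thread may touch any managed buffer
    // between a kernel launch and the matching cudaDeviceSynchronize().
    return 0;
}
```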

In a multithreaded environment, this might take some explicit effort to enforce.
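One way to enforce it is a mutex shared between the GPU thread and the camera thread, so that the camera thread never touches a managed buffer while a kernel is in flight. A sketch (the function and variable names are mine, not from your program):

```cuda
#include <mutex>
#include <cuda_runtime.h>

std::mutex managed_mtx;  // shared by every thread that touches managed memory

__global__ void process(unsigned char *img, int n) { /* ... */ }

// GPU thread: hold the lock across the whole launch-to-sync window.
void gpu_work(unsigned char *managed_buf, int n) {
    std::lock_guard<std::mutex> lk(managed_mtx);
    process<<<256, 256>>>(managed_buf, n);
    cudaDeviceSynchronize();  // managed buffers become CPU-visible again here
}

// Camera thread: only write the managed buffer while holding the lock,
// i.e. while no kernel is in flight.
void camera_write(unsigned char *managed_buf,
                  const unsigned char *frame, int n) {
    std::lock_guard<std::mutex> lk(managed_mtx);
    for (int i = 0; i < n; i++) managed_buf[i] = frame[i];
}
```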

I think this is similar to this recent question, if that is also your posting. Note that the TX2 should have this property set to false.

Note that this general rule in the non-concurrent case can be modified through careful use of streams. However, the restrictions still apply to buffers attached to streams that have a kernel launched in them (and to buffers not explicitly attached to any stream): when the property mentioned above is false, access by any CPU thread is not possible.
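The stream attach mechanism looks roughly like this (a sketch; in your case the buffer the camera writes would be the one attached to the host):

```cuda
#include <cuda_runtime.h>

__global__ void process(unsigned char *img, int n) { /* ... */ }

int main() {
    const int N = 1 << 20;
    unsigned char *gpu_buf, *cam_buf;
    cudaMallocManaged(&gpu_buf, N);
    cudaMallocManaged(&cam_buf, N);

    cudaStream_t s;
    cudaStreamCreate(&s);
    // Attach cam_buf to the host: it stays CPU-accessible even while
    // kernels are running in other streams.
    cudaStreamAttachMemAsync(s, cam_buf, 0, cudaMemAttachHost);
    // Attach gpu_buf to stream s: only work launched in s migrates it.
    cudaStreamAttachMemAsync(s, gpu_buf, 0, cudaMemAttachSingle);
    cudaStreamSynchronize(s);  // make the attachments take effect

    process<<<256, 256, 0, s>>>(gpu_buf, N);
    // A CPU thread may still write cam_buf here, while the kernel runs.
    cudaStreamSynchronize(s);

    cudaFree(gpu_buf);
    cudaFree(cam_buf);
    cudaStreamDestroy(s);
    return 0;
}
```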

The motivation for this behavior is roughly as follows. The CUDA runtime does not know the relationship between managed buffers, regardless of where those buffers were created. A buffer created in one thread could easily have objects in it with embedded pointers, and there is nothing to prevent or restrict those pointers from pointing to data in another managed buffer. Even a buffer that was created later. Even a buffer that was created in another thread. The safe assumption is that any linkages could be possible, and therefore, without any other negotiation, the managed memory subsystem in the CUDA runtime must move all managed buffers to the GPU when a kernel is launched. This makes all managed buffers, without exception, inaccessible to CPU threads (any thread, anywhere).

In the normal program flow, access is restored at the next occurrence of a cudaDeviceSynchronize() call. Once the CPU thread that issues that call completes it and moves on, managed buffers are once again visible to (all) CPU threads. Another kernel launch (anywhere) repeats the process and interrupts accessibility. To repeat: this is the mechanism in effect when the concurrentManagedAccess property on the GPU is not true, and this behavior can be somewhat modified via the aforementioned stream attach mechanism.

Robert Crovella
  • 143,785
  • 11
  • 213
  • 257
  • Yeah, that's my post. You got me! Correct me if I'm wrong: `cudaDeviceSynchronize` is only necessary when the GPU writes to the buffer, is that right? In my case, the camera thread has no GPU code, so my guess is this should be fine. – Elektito Feb 08 '19 at 15:27
  • I would disagree with that statement. Furthermore I don't think you are grasping the concept. A kernel launch causes all managed buffers to be inaccessible from any CPU thread. Regardless of what those kernels are doing or not doing or how they are launched or in what thread they are launched (subject to the modification I added to my answer, which I imagine does not apply here). Once a kernel is launched, this condition persists until a `cudaDeviceSynchronize()` call is made. – Robert Crovella Feb 08 '19 at 15:30
  • Yeah, you're right. I was confused! But still, I have `cudaDeviceSynchronize` calls in the thread that actually calls CUDA. And a buffer manager ensures that the same buffer is not passed to the camera thread while it's being processed by CUDA. This should be fine then, am I right? – Elektito Feb 08 '19 at 15:33
  • Any managed buffer, created anywhere, is subject to this restriction, once a kernel is launched anywhere. It sounds to me like you should investigate the stream attach mechanism I mention in my answer. This allows you to associate managed buffers with streams, and therefore control the migration of those buffers based on whether a kernel is launched in a particular stream. – Robert Crovella Feb 08 '19 at 15:38
  • So if a kernel is running in another thread, I cannot use _any_ managed buffer from _any_ thread? – Elektito Feb 08 '19 at 15:45
  • correct. Unless you use the stream attach mechanism – Robert Crovella Feb 08 '19 at 15:45
  • Okay then. I'm finally getting this. I guess this answers my question and then some. I should find a way using the mechanism you mentioned. Thank you very much for the help. – Elektito Feb 08 '19 at 15:47