
I am developing for a heterogeneous system with a CPU and a GPU (an AMD APU, in fact) using OpenCL. I will use atomic operations to guarantee data integrity, and the data is shared between the CPU device and the GPU device, each of which runs a kernel on the shared data. My question is: are atomic operations still valid between these two devices? I hope someone can help. Many thanks.

Paulo Freitas

1 Answer


Appendix A of the OpenCL Specification covers the synchronization of memory objects between different devices. There is no guarantee that both devices will access a memory object at the same physical location: one of the devices may work on a copy of the buffer, and only synchronization as described in Appendix A ensures that the other device gets an up-to-date copy of it.
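To make the "copy of the buffer" point concrete, here is a minimal sketch in plain C11. It is a toy model, not OpenCL code: `device_t`, `run_kernel`, `sync_copy` and the two `*_total` functions are hypothetical names, and the explicit copy only stands in for whatever synchronization the runtime performs. The assumption being modelled is that each device updates its own private copy, so atomic operations are atomic only with respect to that copy.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical model of one device: it only ever touches its own copy. */
typedef struct {
    atomic_int copy; /* device-private copy of the shared counter */
} device_t;

/* The "kernel": n atomic increments on the device's private copy. */
void run_kernel(device_t *dev, int n) {
    assert(n >= 0);
    for (int i = 0; i < n; ++i)
        atomic_fetch_add(&dev->copy, 1);
}

/* Explicit synchronization point: dst receives the contents of src. */
void sync_copy(device_t *dst, device_t *src) {
    atomic_store(&dst->copy, atomic_load(&src->copy));
}

/* Both devices update their own copies with no synchronization between them. */
int unsynchronized_total(int n_cpu, int n_gpu) {
    device_t cpu, gpu;
    atomic_init(&cpu.copy, 0);
    atomic_init(&gpu.copy, 0);
    run_kernel(&cpu, n_cpu);
    run_kernel(&gpu, n_gpu);
    /* Reading the CPU copy back: the GPU's increments are not in it. */
    return atomic_load(&cpu.copy);
}

/* The CPU runs first, its result is synchronized, then the GPU continues. */
int synchronized_total(int n_cpu, int n_gpu) {
    device_t cpu, gpu;
    atomic_init(&cpu.copy, 0);
    atomic_init(&gpu.copy, 0);
    run_kernel(&cpu, n_cpu);
    sync_copy(&gpu, &cpu);
    run_kernel(&gpu, n_gpu);
    return atomic_load(&gpu.copy);
}
```

In this model every increment is atomic, yet `unsynchronized_total` still loses the other device's updates; only the variant with an explicit synchronization point sees all of them.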

Your implementation on the AMD APU may allow the CPU and GPU to share the same address space, and may not require inter-device synchronization. I suggest checking AMD's documentation and experimenting.

Eric Bainville
  • Hi @Eric Bainville, thanks for your kind reply. As far as I know, event-based synchronization of memory objects between devices works in cases such as this: device A performs a series of operations on the shared data, then device B continues with further operations on it. Events can express that kind of ordering across multiple devices. My problem is that the CPU and the GPU will both access the shared data frequently, interleaved within the same time period. In that case I think events may not work, and some other mechanism such as atomic operations is needed. Any good suggestions? –  Sep 26 '12 at 05:55
  • As I said, the AMD driver may support this; check the docs and experiment to see what happens. Otherwise, it depends on how frequently you have to synchronize the two devices (only the devices need to synchronize, not the host thread). If you can do a significant amount of work between two syncs, it may be OK. If not, you may need to change your algorithm to reduce the sharing of buffers between the devices. – Eric Bainville Sep 26 '12 at 16:13
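The trade-off in the last comment can be sketched in plain C11 as a rough model (again not OpenCL code). `result_t` and `run_alternating` are hypothetical names, and each change of owner stands in for one inter-device synchronization. The sketch assumes the two devices take turns owning the buffer and hand it over only at batch boundaries.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical result record for the model below. */
typedef struct {
    int value;      /* final counter value */
    int handoffs;   /* ownership transfers, i.e. inter-device syncs */
    int last_owner; /* 0 = CPU, 1 = GPU (labels only) */
} result_t;

/* Apply total_updates increments in batches; the buffer changes owner
   once per batch boundary, never in the middle of a batch. */
result_t run_alternating(int total_updates, int batch) {
    assert(total_updates >= 0 && batch > 0);
    atomic_int counter;
    atomic_init(&counter, 0);
    result_t r = {0, 0, 0};
    int done = 0;
    while (done < total_updates) {
        int left = total_updates - done;
        int n = left < batch ? left : batch;
        /* The current owner works on the buffer without interference. */
        for (int i = 0; i < n; ++i)
            atomic_fetch_add(&counter, 1);
        done += n;
        if (done < total_updates) {
            /* More work remains: synchronize and pass ownership on. */
            r.last_owner ^= 1;
            r.handoffs++;
        }
    }
    r.value = atomic_load(&counter);
    return r;
}
```

With 1000 updates, a batch size of 1 needs 999 handoffs in this model while a batch size of 250 needs 3, which is the sense in which doing more work between syncs can make event-style synchronization workable.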