As you may have already seen, you transfer data from host to device using clEnqueueWriteBuffer and similar commands.
All commands with 'Enqueue' in their name share a special property: they are not executed immediately, but only when you trigger them, e.g. with clFinish, clFlush, clEnqueueWaitForEvents, by calling clEnqueueWriteBuffer in blocking mode, and a few more.
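A minimal sketch of this behaviour (hypothetical names; the context, queue, buffer and host array are assumed to have been created elsewhere): a non-blocking write merely enqueues the command, while clFlush submits it and clFinish waits for it to complete.

#include <CL/cl.h>

/* Hypothetical helper: queue, buf and data are assumed to exist already. */
void write_example(cl_command_queue queue, cl_mem buf,
                   const float *data, size_t n)
{
    /* Non-blocking write: the call returns immediately; the copy may not
       even have started yet, so 'data' must stay valid until it finishes. */
    clEnqueueWriteBuffer(queue, buf, CL_FALSE, 0, n * sizeof(float),
                         data, 0, NULL, NULL);

    clFlush(queue);    /* submit the queued commands to the device      */
    clFinish(queue);   /* block until every queued command has finished */

    /* Blocking write (CL_TRUE): does not return until the data has been
       copied, so no extra synchronisation is needed. */
    clEnqueueWriteBuffer(queue, buf, CL_TRUE, 0, n * sizeof(float),
                         data, 0, NULL, NULL);
}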
This means that all of these actions may happen at once, and you have to synchronise them using event objects. Since everything (may) happen concurrently, you could build a pipeline like this (the operations on each line happen at the same time; a code sketch follows after the list):
- Transfer Data A
- Process Data A & Transfer Data B
- Process Data B & Transfer Data C & Retrieve Data A'
- Process Data C & Retrieve Data B'
- Retrieve Data C'
Remember: enqueueing tasks without event objects may result in simultaneous execution of all enqueued commands!
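As a rough sketch of such a pipeline (all names, the chunked data layout and the kernel setup are assumptions made for illustration; real overlap additionally requires an out-of-order queue or several queues), the events are what keep each kernel from running before its own chunk has arrived:

#include <CL/cl.h>

/* Hypothetical pipeline over 'chunks' chunks of 'chunk_size' floats.
   Kernel arguments (dev_in, dev_out) are assumed to be set already. */
void pipeline(cl_command_queue queue, cl_kernel kernel,
              cl_mem dev_in, cl_mem dev_out,
              const float *host_in, float *host_out,
              size_t chunk_size, size_t chunks)
{
    for (size_t i = 0; i < chunks; ++i) {
        size_t work_offset = i * chunk_size;                 /* in work-items */
        size_t byte_offset = i * chunk_size * sizeof(float); /* in bytes      */
        cl_event written, processed;

        /* Transfer chunk i (non-blocking). */
        clEnqueueWriteBuffer(queue, dev_in, CL_FALSE, byte_offset,
                             chunk_size * sizeof(float),
                             host_in + i * chunk_size, 0, NULL, &written);

        /* Process chunk i -- but only after its transfer has finished. */
        clEnqueueNDRangeKernel(queue, kernel, 1, &work_offset, &chunk_size,
                               NULL, 1, &written, &processed);

        /* Retrieve chunk i' -- but only after the kernel has finished. */
        clEnqueueReadBuffer(queue, dev_out, CL_FALSE, byte_offset,
                            chunk_size * sizeof(float),
                            host_out + i * chunk_size, 1, &processed, NULL);

        clReleaseEvent(written);
        clReleaseEvent(processed);
    }
    clFinish(queue);   /* wait for the whole pipeline to drain */
}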
To make sure that Process Data B doesn't happen before Transfer B, you have to retrieve an event object from clEnqueueWriteBuffer and supply it as an object to wait for to, for instance, clEnqueueNDRangeKernel:
cl_event evt;
clEnqueueWriteBuffer(..., bufferB, ..., ..., ..., bufferBdata, 0, NULL, &evt);
clEnqueueNDRangeKernel(..., kernelB, ..., ..., ..., ..., 1, &evt, NULL);
Instead of supplying NULL, each command can of course wait on certain events AND generate a new event object of its own. The second-to-last parameter is an array, so you can even wait on several events at once!
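For instance (a sketch with hypothetical names; the kernel arguments are assumed to be set already), a kernel reading two input buffers can wait for both uploads at once, and the read-back can in turn wait for the kernel:

#include <CL/cl.h>

/* Hypothetical helper: all objects are assumed to be created elsewhere. */
void run_two_input_kernel(cl_command_queue queue, cl_kernel kernel,
                          cl_mem bufA, cl_mem bufB, cl_mem bufOut,
                          const float *dataA, const float *dataB,
                          float *result, size_t n)
{
    size_t bytes = n * sizeof(float);
    cl_event uploads[2], done;

    clEnqueueWriteBuffer(queue, bufA, CL_FALSE, 0, bytes, dataA, 0, NULL, &uploads[0]);
    clEnqueueWriteBuffer(queue, bufB, CL_FALSE, 0, bytes, dataB, 0, NULL, &uploads[1]);

    /* The wait list is an array: the kernel waits for BOTH uploads. */
    clEnqueueNDRangeKernel(queue, kernel, 1, NULL, &n, NULL, 2, uploads, &done);

    /* The blocking read waits for the kernel, so 'result' is valid on return. */
    clEnqueueReadBuffer(queue, bufOut, CL_TRUE, 0, bytes, result, 1, &done, NULL);

    clReleaseEvent(uploads[0]);
    clReleaseEvent(uploads[1]);
    clReleaseEvent(done);
}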
EDIT: To summarise the comments below
Transferring data - What command acts where?

          CPU                             GPU
                                    BufA         BufB
array[] = {...}
clCreateBuffer()         ----->    [     ]                   // Create (empty) buffer in GPU memory *
clCreateBuffer()         ----->    [     ]      [     ]      // Create (empty) buffer in GPU memory *
clEnqueueWriteBuffer()   -arr->    [array]      [     ]      // Copy from CPU to GPU
clEnqueueCopyBuffer()              [array]  ->  [array]      // Copy from GPU to GPU
clEnqueueReadBuffer()    <-arr-    [array]      [array]      // Copy from GPU to CPU
* You may initialise the buffer directly by providing the data via the host_ptr parameter (together with a flag such as CL_MEM_COPY_HOST_PTR).
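Put together, a minimal sketch of the table above (hypothetical names; 'ctx' and 'queue' are assumed to have been created earlier):

#include <CL/cl.h>

void transfer_demo(cl_context ctx, cl_command_queue queue)
{
    cl_int err;
    float array[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float result[4];

    /* Create an empty buffer in GPU memory ... */
    cl_mem bufA = clCreateBuffer(ctx, CL_MEM_READ_WRITE,
                                 sizeof(array), NULL, &err);

    /* ... or one initialised from host memory right away via host_ptr (*). */
    cl_mem bufB = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                 sizeof(array), array, &err);

    /* CPU -> GPU */
    clEnqueueWriteBuffer(queue, bufA, CL_TRUE, 0, sizeof(array), array,
                         0, NULL, NULL);

    /* GPU -> GPU */
    clEnqueueCopyBuffer(queue, bufA, bufB, 0, 0, sizeof(array),
                        0, NULL, NULL);

    /* GPU -> CPU */
    clEnqueueReadBuffer(queue, bufB, CL_TRUE, 0, sizeof(array), result,
                        0, NULL, NULL);

    clReleaseMemObject(bufA);
    clReleaseMemObject(bufB);
}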