I am using Apple's Accelerate framework, specifically vDSP, to perform several consecutive matrix and vector operations.
When does the CPU gather (copy back) the data from the GPU's memory?
Does it happen after every vDSP function call?
If not, is there a way to 'force' the gathering operation explicitly?