I am trying to run a PyCUDA program across two GPUs. I have read a great post by Talonmies explaining how to do it with the threading library; the post also mentioned that this is possible with mpi4py.
When I run mpi4py with PyCUDA, the program fails with this error: self.ctx = driver.Device(gpuid).max_context pycuda._driver.LogicError: cuDeviceGet failed: not initialized
Perhaps this is due to my attempt to initialize both GPU devices simultaneously. Does anyone have a very short example of how to get two GPUs working with mpi4py?
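For concreteness, here is a minimal sketch of the one-rank-per-GPU pattern I have in mind. It is not the code that produced the error above. It assumes one MPI process per GPU, that pycuda.autoinit is not imported, and that pycuda.driver.init() is called explicitly before any Device is created (a "not initialized" error from cuDeviceGet usually suggests that call is missing). The helper device_for_rank is a hypothetical name introduced only for this illustration.

```python
# Sketch only: one MPI rank per GPU, e.g. `mpiexec -n 2 python script.py`.

def device_for_rank(rank, device_count):
    """Hypothetical helper: map an MPI rank to a GPU index, round-robin."""
    if device_count < 1:
        raise ValueError("no CUDA devices available")
    return rank % device_count


def main():
    try:
        # Imported lazily so the helper above works without MPI/CUDA installed.
        from mpi4py import MPI
        import pycuda.driver as drv
    except ImportError:
        print("mpi4py and pycuda are required to run this sketch")
        return

    rank = MPI.COMM_WORLD.Get_rank()

    # Assumption: pycuda.autoinit is not used, so the driver API must be
    # initialized by hand before Device() can be called.
    drv.init()

    gpuid = device_for_rank(rank, drv.Device.count())
    dev = drv.Device(gpuid)
    ctx = dev.make_context()  # the new context becomes current for this process
    try:
        print("rank %d using GPU %d (%s)" % (rank, gpuid, dev.name()))
        # ... allocate memory and launch kernels here ...
    finally:
        ctx.pop()  # make the context non-current before the process exits


if __name__ == "__main__":
    main()
```

With two GPUs and two ranks, rank 0 should select device 0 and rank 1 device 1; with more ranks than GPUs, the modulo mapping makes ranks share devices.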