I'm trying to implement a simple example showing how to compute two theano.tensor.dot products on two different GPUs, where the two dots share the same A but use different Bs:
theano.tensor.dot(A, B0); theano.tensor.dot(A, B1)
I want to store B0 and B1 on different GPUs. A is originally stored on one GPU, and I then copy it to the other GPU with the explicit transfer function, so that each dot can finally run on its own GPU.
My implementation is as follows:
import numpy
import theano

# A is created on dev0...
va0 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                    target='dev0')
# ...and explicitly copied to dev1
va1 = va0.transfer('dev1')
# B0 lives on dev0, B1 on dev1
vb0 = theano.shared(numpy.random.random((1024, 512)).astype('float32'),
                    target='dev0')
vb1 = theano.shared(numpy.random.random((1024, 2048)).astype('float32'),
                    target='dev1')
# One dot is intended to run on each device
vc0 = theano.tensor.dot(va0, vb0)
vc1 = theano.tensor.dot(va1, vb1)
f = theano.function([], [vc1, vc0])
print(f())
Looking at the nvprof results, I found that the two dots still run on the same GPU, and va0.transfer('dev1') doesn't work as expected. In fact, it copied vb1 onto dev0 instead, and both dots were computed on dev0.
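In case it matters, this is how I inspected the kernel placement; the script name two_dots.py is just a placeholder for the code above saved to a file:

nvprof --print-gpu-trace python two_dots.py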
I tried several combinations of Theano flags, but none of them worked. Can anyone help?
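For example, one combination I tried was mapping the context names to physical devices via the contexts flag before importing theano; cuda0/cuda1 are the device names on my machine, so adjust as needed:

import os
# Map the symbolic context names (dev0, dev1) used by target='...'
# to physical GPUs. This must be set before theano is imported.
os.environ['THEANO_FLAGS'] = 'contexts=dev0->cuda0;dev1->cuda1'
import theano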