
I'm trying to implement a simple example that shows how to compute two theano.tensor.dot operations on two different GPUs, where the two dots share the same A but use different Bs:

theano.tensor.dot(A,B0); theano.tensor.dot(A,B1)

I want to store B0 and B1 on different GPUs. A is originally stored on one GPU, and I make a copy of it to the other GPU with an explicit transfer. Finally, I compute the two dots separately, one on each GPU.

My implementation is as follows:

import numpy
import theano

# A lives on dev0; make an explicit copy of it on dev1.
va0 = theano.shared(numpy.random.random((1024, 1024)).astype('float32'),
                    target='dev0')
va1 = va0.transfer('dev1')

# B0 is stored on dev0, B1 on dev1.
vb0 = theano.shared(numpy.random.random((1024, 512)).astype('float32'),
                    target='dev0')
vb1 = theano.shared(numpy.random.random((1024, 2048)).astype('float32'),
                    target='dev1')

# Intended: one dot on each GPU.
vc0 = theano.tensor.dot(va0, vb0)
vc1 = theano.tensor.dot(va1, vb1)

f = theano.function([], [vc1, vc0])
print(f())

When I looked at the nvprof result, I found that the two dots still ran on the same GPU, and that va0.transfer('dev1') did not do what I expected: instead, vb1 was copied to dev0 and both dots were computed on dev0.

I tried several combinations of Theano flags, but none of them worked. Can anyone help?

[nvprof screenshot: both dot operations run on the same GPU]


1 Answer


The Theano flags below solve the issue:

export THEANO_FLAGS="contexts=dev0->cuda0;dev1->cuda1,optimizer_verbose=True,optimizer_excluding=local_cut_gpua_host_gpua"
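If exporting a shell variable is inconvenient, the same flags can also be set from Python before theano is imported, since THEANO_FLAGS is read at import time. This is just a minimal sketch of that alternative, not part of the original answer; the cuda0/cuda1 mapping is assumed to match your machine.

import os
# Must be set before importing theano; these flags mirror the export above.
os.environ['THEANO_FLAGS'] = (
    'contexts=dev0->cuda0;dev1->cuda1,'
    'optimizer_verbose=True,'
    'optimizer_excluding=local_cut_gpua_host_gpua'
)
import theano  # theano now sees both contexts, dev0 and dev1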

optimizer_verbose prints the graph optimizations applied while the theano function is compiled. I noticed one line like the following:

local_cut_gpu_transfers HostFromGpu(gpuarray).0 HostFromGpu(gpuarray).0

where local_cut_gpu_transfers is the reason (the optimization that was applied), the first HostFromGpu(gpuarray).0 is the original node, and the last segment is what the original node was replaced with.

Then I searched for the keyword "local_cut_gpu_transfers" in the Theano source code until I found:

optdb['canonicalize'].register('local_cut_gpua_host_gpua', local_cut_gpu_transfers, 'fast_compile', 'fast_run', 'gpuarray')

So I added 'local_cut_gpua_host_gpua' to optimizer_excluding in the Theano flags, which stops this optimization from collapsing the transfer and pulling everything back onto dev0. It would be nice if Theano provided a detailed mapping between these optimization "reasons" and the names accepted by the optimizer_excluding flag.
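For anyone who wants to confirm the placement without re-running nvprof, inspecting the compiled graph is a quick sanity check. The sketch below is illustrative only; it assumes the function f from the question has been compiled with the flags above, and the exact op names printed vary across Theano versions.

import theano
# Print the optimized graph of the compiled function. With the gpuarray
# backend, transfer ops such as GpuFromHost are typically printed with their
# target context, so the two dot products should show up under dev0 and dev1.
theano.printing.debugprint(f)

# Alternatively, walk the apply nodes of the optimized graph directly.
for node in f.maker.fgraph.toposort():
    print(node)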
