In the deep learning tutorials, all training data is stored in a shared array, and only an index into that array is passed to the training function to slice out a minibatch.
I understand that this allows the data to be left in GPU memory, as opposed to passing small chunks of data as a parameter to the training function for each minibatch.
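To make the contrast concrete, here is a rough NumPy-only sketch of the two calling conventions as I understand them. It does not involve Theano or a GPU; `data`, `batch_size`, and both function names are placeholders I made up for illustration, and the `index * batch_size` slicing follows the convention I believe the tutorials use.

```python
import numpy as np

# Stands in for the shared array that would stay resident on the device.
data = np.arange(100)
batch_size = 5  # hypothetical minibatch size

def train_by_index(index):
    """Index-passing style: only a small integer crosses the call boundary."""
    batch = data[index * batch_size:(index + 1) * batch_size]
    return batch.sum()  # placeholder for the real training step

def train_by_value(batch):
    """Chunk-passing style: the minibatch itself is handed over on every call."""
    return batch.sum()  # same placeholder computation

# Both styles compute the same thing; they differ only in what is passed in.
same = train_by_index(2) == train_by_value(data[10:15])
print(same)
```

In the first style the caller never touches the data, which is what would let it stay in GPU memory in the real setting.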
In some previous questions, keeping the data in GPU memory was given as the reason why the givens mechanism is used in the tutorials.
I don't yet see the connection between these two concepts, so I'm probably missing out on something essential. As far as I understand, the givens mechanism swaps out a variable in the graph with a given symbolic expression (i.e., some given subgraph is inserted in place of that variable). Then why not define the computational graph the way we need it in the first place?
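To spell out my mental model of that substitution, here is a toy sketch in plain Python. It is only an illustration of the idea and is not how Theano implements it; `Var`, `Op`, and `substitute` are hypothetical names I invented for this purpose.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Var:
    """A leaf of the toy graph (a named variable)."""
    name: str

@dataclass(frozen=True)
class Op:
    """An inner node of the toy graph: an operation applied to inputs."""
    name: str
    inputs: Tuple[object, ...]

def substitute(node, givens):
    """Return a graph in which each key of `givens` is replaced by its value."""
    if node in givens:
        # The replacement is inserted as-is and is not rewritten again.
        return givens[node]
    if isinstance(node, Op):
        return Op(node.name, tuple(substitute(i, givens) for i in node.inputs))
    return node

X = Var("X")
index = Var("index")
X_slice = Op("slice", (X, index))

# Variant 1: the slice is part of the graph from the start.
graph_nogivens = X_slice

# Variant 2: the output is X, and the slice is swapped in afterwards.
graph_givens = substitute(X, {X: X_slice})

print(graph_givens == graph_nogivens)
```

Under this reading, both variants end up as the same graph, which is exactly what puzzles me.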
Here is a minimal example. I define a shared variable X and an integer index, and I either create a graph that already contains the slicing operation, or I create one where the slicing operation is inserted post hoc via givens.
By all appearances, the two resulting functions get_nogivens and get_tutorial are identical (see the debugprints at the end).
But then why do the tutorials use the givens pattern?
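import numpy as np
import theano
import theano.tensor as T

# Shared variable holding the full data, and a symbolic integer index
X = theano.shared(np.arange(100), borrow=True, name='X')
index = T.scalar(dtype='int32', name='index')

# Slice built directly into the graph
X_slice = X[index:index+5]

# Tutorial style: the output is X, and the slice is substituted for X via givens
get_tutorial = theano.function([index], X, givens={X: X[index:index+5]}, mode='DebugMode')
# No givens: the output is the slice itself
get_nogivens = theano.function([index], X_slice, mode='DebugMode')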
> theano.printing.debugprint(get_tutorial)
DeepCopyOp [@A] '' 4
 |Subtensor{int32:int32:} [@B] '' 3
   |X [@C]
   |ScalarFromTensor [@D] '' 0
   | |index [@E]
   |ScalarFromTensor [@F] '' 2
     |Elemwise{add,no_inplace} [@G] '' 1
       |TensorConstant{5} [@H]
       |index [@E]
> theano.printing.debugprint(get_nogivens)
DeepCopyOp [@A] '' 4
 |Subtensor{int32:int32:} [@B] '' 3
   |X [@C]
   |ScalarFromTensor [@D] '' 0
   | |index [@E]
   |ScalarFromTensor [@F] '' 2
     |Elemwise{add,no_inplace} [@G] '' 1
       |TensorConstant{5} [@H]
       |index [@E]