It's not entirely clear what you're trying to do here, but my guess is that you're implementing two different versions of a linear classifier: one that does not use scan and another that does. The code below demonstrates my approach to doing this.
To answer your specific question:
The error message appears because your scan version uses `T_f` in the scan step function (this is strange and one reason it's not clear what you're trying to do; the step function does not use any of its input variables `x`, `W`, or `b` at all!), and `T_f` uses `T_x`, but your scan version's function does not take `T_x` as an input. Instead it takes `T_X` (note the case difference), which is then not used at all.
Here are a few hints and explanations of the differences between your code and mine.
It helps immensely to keep things separated into discrete methods. By splitting the code into the `v1` and `v2` methods, we ensure that the two different implementations don't interfere with one another.
Using the `strict` parameter of `theano.scan` is recommended at all times. It ensures you don't accidentally introduce an error caused by naming conflicts in the step function's parameters. It isn't enabled by default only because that could break older code written before `strict` existed.
Use a fully fledged function instead of a lambda for scan's step function. Like strict mode, this helps avoid accidental naming conflicts, and makes the step code easier to follow. The step function can also be tested in isolation.
Use `compute_test_value` to ensure the computation works with simple sample data. In particular, this will identify shape mismatches (e.g. doing a `dot` with the parameters in the wrong order), and it makes debugging easier because you can print and explore intermediate values while the computation graph is being constructed, instead of later when the computation is executed.
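The kind of shape mismatch that test values catch at graph-construction time can be illustrated with plain numpy (the shapes here are just example values, not taken from your code):

```python
import numpy

x = numpy.random.standard_normal(size=(2, 3))  # batch of 2 samples, 3 features each
w = numpy.random.standard_normal(size=(3, 4))  # weights mapping 3 features -> 4 units

y = numpy.dot(x, w)  # correct order: (2, 3) . (3, 4) -> (2, 4)
assert y.shape == (2, 4)

try:
    numpy.dot(w, x)  # wrong order: (3, 4) . (2, 3) has incompatible inner dimensions
except ValueError:
    pass  # this is the error a test value would surface immediately
```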
This code has each input sample encoded as a row of `x` instead of as a column of `x`. This requires post-multiplying by `w` instead of pre-multiplying. It's possible to do it either way, but pre-multiplying by `w` would make the addition of `b` a bit messier (a `dimshuffle` would need to be introduced).
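A quick numpy sketch of why the row encoding keeps the bias addition clean (illustrative shapes only; `reshape` plays the role of Theano's `dimshuffle`):

```python
import numpy

batch_size, input_size, hidden_size = 2, 3, 4
x_rows = numpy.random.standard_normal(size=(batch_size, input_size))
w = numpy.random.standard_normal(size=(input_size, hidden_size))
b = numpy.arange(hidden_size, dtype=float)

# Samples as rows: post-multiply by w; b broadcasts over the rows with no fuss.
y_rows = numpy.dot(x_rows, w) + b  # shape (batch_size, hidden_size)

# Samples as columns: pre-multiply by w.T; the result is (hidden_size, batch_size),
# so b must be reshaped into a column before it broadcasts correctly.
x_cols = x_rows.T
y_cols = numpy.dot(w.T, x_cols) + b.reshape(hidden_size, 1)

assert numpy.allclose(y_rows, y_cols.T)
```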
There's no need to use `theano.Param` unless you need non-standard behaviour with respect to default values, etc.
Avoid naming things such that they differ only in case! In general, stick to the Python style guide (i.e. instance variables should be lower case, with words separated by underscores).
A `dimshuffle` and a selection of the first row are needed in the scan version's step function to ensure the dot product and the subsequent addition of the bias are dimension-compatible. This is not needed in the non-scan version because there we are doing a matrix-matrix dot product.
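In numpy terms, the step function is doing the equivalent of the following (with `reshape` standing in for Theano's `dimshuffle('x', 0)`):

```python
import numpy

input_size, hidden_size = 3, 4
x_row = numpy.random.standard_normal(size=(input_size,))  # one sequence element: a 1-d vector
w = numpy.random.standard_normal(size=(input_size, hidden_size))
b = numpy.arange(hidden_size, dtype=float)

# Promote the vector to a 1 x input_size matrix so the matrix product is defined,
# then drop the singleton dimension again by selecting row 0.
y = (numpy.dot(x_row.reshape(1, input_size), w) + b)[0]
assert y.shape == (hidden_size,)

# This is identical to a plain vector-matrix product plus bias, which is what the
# non-scan version computes in one shot for the whole batch.
assert numpy.allclose(y, numpy.dot(x_row, w) + b)
```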
The code:
```python
import numpy
import theano
import theano.tensor as T


def create_inputs(x_value, w_value, b_value):
    x, w = T.matrices(2)
    b = T.vector()
    x.tag.test_value = x_value
    w.tag.test_value = w_value
    b.tag.test_value = b_value
    return x, w, b


def v1(x_value, w_value, b_value):
    x, w, b = create_inputs(x_value, w_value, b_value)
    y = T.dot(x, w) + b
    f = theano.function(inputs=[x, w, b], outputs=y)
    print(f(x_value, w_value, b_value))


def v2_step(x, w, b):
    return (T.dot(x.dimshuffle('x', 0), w) + b)[0]


def v2(x_value, w_value, b_value):
    x, w, b = create_inputs(x_value, w_value, b_value)
    y, _ = theano.scan(v2_step, sequences=[x], non_sequences=[w, b], strict=True)
    f = theano.function(inputs=[x, w, b], outputs=y)
    print(f(x_value, w_value, b_value))


def main():
    batch_size = 2
    input_size = 3
    hidden_size = 4
    theano.config.compute_test_value = 'raise'
    numpy.random.seed(1)
    x_value = numpy.random.standard_normal(size=(batch_size, input_size))
    w_value = numpy.random.standard_normal(size=(input_size, hidden_size))
    b_value = numpy.zeros((hidden_size,))
    v1(x_value, w_value, b_value)
    v2(x_value, w_value, b_value)


main()
```