
I am experimenting with Theano, specifically with the scan function.

I want to use it to apply a linear classifier to a set of feature vectors stored as columns of a matrix X (I am sure there are better ways to do it; this is just to get familiar with scan).

This is my code snippet:

T_W = T.fmatrix('W')
T_b = T.fmatrix('b')

T_X = T.fmatrix('X')
T_x = T.fmatrix('x')

# this is the linear classifier
T_f = T.dot(T_W, T_x) + T_b
f = theano.function(inputs=[T_x, theano.Param(T_W), theano.Param(T_b)],outputs=T_f)

T_outputs, T_updates = theano.scan(fn=lambda x,W,b : T_f, sequences=[T_X], non_sequences=[T_W,T_b])

F = theano.function(inputs=[T_X, theano.Param(T_W), theano.Param(T_b)],outputs=T_outputs)

When I execute the snippet in IPython, I get the following error (triggered by the last statement):

    MissingInputError: A variable that is an input to the graph was neither provided as an input to the function nor given a value. A chain of variables leading from this input to an output is [x, for{cpu,scan_fn}.0]. This chain may not be unique
    Backtrace when the variable is created:
      File "<ipython-input-40-72b539c54ff4>", line 5, in <module>
        T_x = T.fmatrix('x')
Remi Guan
mzs
  • @KevinGuan: I am new to StackOverflow policies; can you explain why you edited the question to remove "Can somebody help me find out what am I doing wrong?" – mzs Oct 27 '15 at 04:42
  • Because SO is a Q&A site. Everyone asks questions here because they have problems, so you don't need to say something like what I removed. Again: *Everybody knows that you're doing something wrong, so you don't need to say that in your question.* :) – Remi Guan Oct 27 '15 at 04:46
  • Also, you don't need to say "thanks" in a question. You didn't do that, so this is just a tip. – Remi Guan Oct 27 '15 at 04:48

1 Answer


It's not entirely clear what you are trying to do here, but my guess is that you're implementing two different versions of a linear classifier: one that does not use scan and another that does.

The code below demonstrates my approach to doing this.

To answer your specific question:

The error message appears because your scan version uses T_f in the scan step function, and T_f depends on T_x, but the function F you compile from the scan output does not take T_x as an input. Instead it takes T_X (note the case difference), whose values the step function never actually uses. (Using T_f in the step function is strange in itself, and is one reason it's not clear what you're trying to do: the step function does not use any of its input variables x, W, or b at all!)
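To make the mechanics concrete, here is a rough NumPy analogue of what scan does with sequences and non_sequences. It is only an illustration under simplifying assumptions (one sequence, no recurrent outputs); scan_like and step are hypothetical helper names, not part of Theano. Each step receives the current row followed by the non-sequences, so it should compute from those arguments rather than from variables captured from outside.

```python
import numpy as np


def scan_like(step, sequence, non_sequences):
    """Hypothetical NumPy analogue of theano.scan with one sequence and no
    recurrent state: call `step` on each slice of `sequence` along axis 0,
    passing the non-sequences unchanged, then stack the per-step results.
    Illustration only; this is not how Theano implements scan."""
    return np.stack([step(item, *non_sequences) for item in sequence])


def step(x, w, b):
    # Uses only its own arguments: the current row x plus the non-sequences.
    return np.dot(x, w) + b


rng = np.random.RandomState(0)
x_all = rng.standard_normal((5, 3))  # 5 samples, one per row
w = rng.standard_normal((3, 4))
b = rng.standard_normal(4)

outputs = scan_like(step, x_all, [w, b])  # shape (5, 4)

# A step that ignores its x argument and reads an outer variable instead,
# like the lambda in the question. NumPy happily uses whatever value x_fixed
# holds, so every output row comes out the same.
x_fixed = x_all[0]
buggy = scan_like(lambda x, w, b: np.dot(x_fixed, w) + b, x_all, [w, b])

print(outputs.shape)  # (5, 4)
print(np.allclose(outputs, np.dot(x_all, w) + b))  # True
print(np.allclose(buggy, buggy[0]))  # True: all rows identical
```

In the NumPy version the captured variable has a value, so the mistake only shows up as identical output rows. Theano instead refuses to compile, because the captured T_x is a graph input that nothing supplies.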

Here are a few hints and explanations for the differences between your code and mine.

  1. It helps immensely to keep things separated into discrete functions. By splitting the code into the v1 and v2 functions, we ensure that the two different implementations don't interfere with one another.

  2. Using the strict parameter of theano.scan is recommended at all times. It ensures you don't accidentally introduce an error caused by naming conflicts in the step function's parameters. It isn't enabled by default because that could break older code written before strict existed.

  3. Use a fully fledged function instead of a lambda for scan's step function. Like strict mode, this helps avoid accidental naming conflicts, and makes the step code easier to follow. The step function can also be tested in isolation.

  4. Use compute_test_value to ensure the computation works with simple sample data. In particular, this will identify shape mismatches (e.g. doing a dot with the parameters in the wrong order), and it makes debugging easier because you can print and explore intermediate values while the computation graph is being constructed, instead of later when the computation is executed.

  5. This code encodes each input sample as a row of x instead of as a column of x. This requires post-multiplying by w instead of pre-multiplying. It's possible to do it either way, but pre-multiplying by w makes the addition of b a bit messier (a dimshuffle needs to be introduced).

  6. There's no need to use theano.Param unless you need to use non-standard behaviour with respect to default values, etc.

  7. Avoid naming things such that they differ only in case! In general, stick to the Python style guide, PEP 8 (e.g. variable names should be lowercase, with words separated by underscores).

  8. A dimshuffle and a selection of the first row are needed in the scan version's step function to ensure the dot product and subsequent addition of the bias are dimension-compatible. This is not needed in the non-scan version because there we are doing a matrix-matrix dot product.
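The shape issues behind points 5 and 8 can be reproduced in plain NumPy, whose broadcasting of a vector against a matrix behaves analogously here. This is a sketch with arbitrary sizes and random data, not Theano code; b[:, None] stands in for what b.dimshuffle(0, 'x') would do in Theano.

```python
import numpy as np

rng = np.random.RandomState(1)
n_samples, input_size, hidden_size = 2, 3, 4
x_rows = rng.standard_normal((n_samples, input_size))  # one sample per row
w = rng.standard_normal((input_size, hidden_size))
b = rng.standard_normal(hidden_size)

# Row layout (the answer's): post-multiply by w. b has shape (hidden_size,)
# and broadcasts across the rows of the (n_samples, hidden_size) result.
y_rows = np.dot(x_rows, w) + b

# Column layout (the question's): pre-multiply by the transposed weights.
x_cols = x_rows.T  # shape (input_size, n_samples)
z = np.dot(w.T, x_cols)  # shape (hidden_size, n_samples)

# Adding b directly lines it up with the wrong axis: (4, 2) and (4,) are
# incompatible, so NumPy raises. (If n_samples happened to equal hidden_size,
# it would silently produce a wrong result instead.)
try:
    z + b
    broadcast_failed = False
except ValueError:
    broadcast_failed = True

# Turning b into a column vector fixes the addition.
y_cols = z + b[:, None]

# Point 8's shape juggling in the step function, for a single row:
# (input_size,) -> (1, input_size) -> (1, hidden_size) -> (hidden_size,)
row0 = (np.dot(x_rows[0][None, :], w) + b)[0]

print(broadcast_failed)  # True
print(np.allclose(y_cols.T, y_rows))  # True
print(np.allclose(row0, y_rows[0]))  # True
```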

The code:

import numpy
import theano
import theano.tensor as T


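def create_inputs(x_value, w_value, b_value):
    # Symbolic inputs with test values attached, so that compute_test_value
    # can check shapes while the graph is being built.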
    x, w = T.matrices(2)
    b = T.vector()
    x.tag.test_value = x_value
    w.tag.test_value = w_value
    b.tag.test_value = b_value
    return x, w, b


def v1(x_value, w_value, b_value):
    # Non-scan version: one matrix-matrix product handles the whole batch.
    x, w, b = create_inputs(x_value, w_value, b_value)
    y = T.dot(x, w) + b
    f = theano.function(inputs=[x, w, b], outputs=y)
    print(f(x_value, w_value, b_value))


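def v2_step(x, w, b):
    # x is one row of the input matrix: make it a 1 x input_size matrix,
    # multiply by w, add the bias, then take the single row back out.
    return (T.dot(x.dimshuffle('x', 0), w) + b)[0]


def v2(x_value, w_value, b_value):
    # Scan version: v2_step is applied to each row of x in turn, with w and b
    # passed explicitly as non_sequences.
    x, w, b = create_inputs(x_value, w_value, b_value)
    y, _ = theano.scan(v2_step, sequences=[x], non_sequences=[w, b], strict=True)
    f = theano.function(inputs=[x, w, b], outputs=y)
    print(f(x_value, w_value, b_value))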


def main():
    batch_size = 2
    input_size = 3
    hidden_size = 4
    theano.config.compute_test_value = 'raise'
    numpy.random.seed(1)
    x_value = numpy.random.standard_normal(size=(batch_size, input_size))
    w_value = numpy.random.standard_normal(size=(input_size, hidden_size))
    b_value = numpy.zeros((hidden_size,))
    v1(x_value, w_value, b_value)
    v2(x_value, w_value, b_value)


main()
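If you want to check the printed arrays independently of Theano, the same inputs can be regenerated with NumPy alone, since main() seeds the global generator before drawing them. This is a hypothetical cross-check, not part of the original answer: assuming the Theano graphs compile as intended, v1 and v2 should both print this array, up to floating-point rounding.

```python
import numpy

# Recreate the inputs exactly as main() builds them (same seed, same calls).
numpy.random.seed(1)
batch_size, input_size, hidden_size = 2, 3, 4
x_value = numpy.random.standard_normal(size=(batch_size, input_size))
w_value = numpy.random.standard_normal(size=(input_size, hidden_size))
b_value = numpy.zeros((hidden_size,))

# Reference result for the linear map x . w + b (b is all zeros here).
expected = numpy.dot(x_value, w_value) + b_value
print(expected)
```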
Daniel Renshaw
  • Thanks a lot for your answer, @DanielRenshaw: your points clarified several of my doubts. – mzs Oct 27 '15 at 04:41