
I'm pretty certain this is trivial, but I haven't yet managed to quite get my head around scan. I want to iteratively build a matrix of values, m, where

m[i,j] = f(m[k,l]) for k < i, l < j

so you could think of it as a dynamic programming problem. However, I can't even manage a simpler warm-up: generating the sequence [0..99] by iterating over those indices and updating the shared value as I go.

import numpy as np
import theano as T
import theano.tensor as TT

def test():
    arr = T.shared(np.zeros(100))
    def grid(idx, arr):
        return {arr: TT.set_subtensor(arr[idx], idx)}

    T.scan(
        grid,
        sequences=TT.arange(100),
        non_sequences=[arr])

    return arr

run = T.function([], outputs=test())
run()

which returns

array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
    0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
    0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
    0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
    0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
    0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
    0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
    0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
dbeacham

1 Answer


There are a few things here that point towards some misunderstandings. scan really can be a hard bit of Theano to wrap your head around!

Here's some updated code that does what I think you're trying to do, but I wouldn't recommend using this code at all. The basic issue is that you seem to be using a shared variable inappropriately.

import numpy as np
import theano as T
import theano.tensor as TT

def test():
    arr = T.shared(np.zeros(100))
    def grid(idx, arr):
        return {arr: TT.set_subtensor(arr[idx], idx)}

    _, updates = T.scan(
        grid,
        sequences=TT.arange(100),
        non_sequences=[arr])

    return arr, updates

outputs, updates = test()
run = T.function([], outputs=outputs, updates=updates)
print(run())
print(outputs.get_value())

This code is changed from the original in two ways:

  1. The updates from the scan have to be captured (they were originally discarded) and passed to theano.function's updates parameter. Without this the shared variable won't be updated at all.

  2. The contents of the shared variable need to be examined after the function is executed (see below).

This code prints two sets of values. The first is the output of the Theano function from when it's executed. The second is the contents of the shared variable after the Theano function has executed. The Theano function returns the shared variable so you might think that these two sets of values should be the same, but you'd be wrong! No shared variables are updated until after all of the function's output values have been computed. So it's only after the function has been executed and we look at the contents of the shared variable that we see the values we expected to see originally.
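Concretely, assuming a fresh shared variable, the first call should behave roughly like this (a sketch of the expected behaviour, not verbatim output):

result = run()
print(result)                # still all zeros: computed before the updates apply
print(outputs.get_value())   # [ 0.  1.  2. ... 99.]: the shared variable afterwards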

Here's an example of implementing a dynamic programming algorithm in Theano. The algorithm is a simplified version of dynamic time warping which has a lot of similarities to edit distance.

import numpy
import theano
import theano.tensor as tt


def inner_step(j, c_ijm1, i, c_im1, x, y):
    # Costs of the three possible moves into cell (i, j); moves that would
    # fall outside the matrix cost infinity.
    insert_cost = tt.switch(tt.eq(j, 0), numpy.inf, c_ijm1)    # from (i, j-1)
    delete_cost = tt.switch(tt.eq(i, 0), numpy.inf, c_im1[j])  # from (i-1, j)
    # Guard j == 0 as well as i == 0, otherwise c_im1[j - 1] would wrap
    # around to c_im1[-1].
    match_cost = tt.switch(tt.or_(tt.eq(i, 0), tt.eq(j, 0)),
                           numpy.inf, c_im1[j - 1])            # from (i-1, j-1)
    in_top_left = tt.and_(tt.eq(i, 0), tt.eq(j, 0))
    min_c = tt.min(tt.stack([insert_cost, delete_cost, match_cost]))
    # The top-left cell has no predecessor, so its cost is just the local distance.
    c_ij = tt.abs_(x[i] - y[j]) + tt.switch(in_top_left, 0., min_c)
    return c_ij


def outer_step(i, c_im1, x, y):
    # The inner scan builds row i of the cost matrix one cell at a time;
    # each cell's cost is fed back in as c_ijm1 via outputs_info.
    outputs, _ = theano.scan(inner_step, sequences=[tt.arange(y.shape[0])],
                             outputs_info=[tt.constant(0, dtype=theano.config.floatX)],
                             non_sequences=[i, c_im1, x, y], strict=True)
    return outputs


def main():
    x = tt.vector()
    y = tt.vector()
    # The outer scan iterates over rows; each step receives the previous
    # row of costs as c_im1 (all zeros for the first row, via outputs_info).
    outputs, _ = theano.scan(outer_step, sequences=[tt.arange(x.shape[0])],
                             outputs_info=[tt.zeros_like(y)],
                             non_sequences=[x, y], strict=True)
    f = theano.function([x, y], outputs=outputs)
    a = numpy.array([1, 2, 4, 8], dtype=theano.config.floatX)
    b = numpy.array([2, 3, 4, 7, 8, 9], dtype=theano.config.floatX)
    print(a)
    print(b)
    print(f(a, b))


main()

This is highly simplified and I wouldn't recommend using it for real. In general Theano is very bad at doing dynamic programming because theano.scan is so slow in comparison to native looping. If you need to propagate gradients through a dynamic program then you may not have any choice, but if you don't need gradients you should probably avoid using Theano for dynamic programming.
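For comparison, here is the same simplified recurrence written with native Python loops over a NumPy array (a sketch; dtw_simplified is just an illustrative name, not anything from Theano). It is far less code and, for a problem this small, will typically be faster than compiling and running the nested scans:

import numpy as np

def dtw_simplified(x, y):
    # c[i, j] is the cumulative cost of aligning x[:i+1] with y[:j+1]
    # under the same simplified recurrence as the Theano version above.
    c = np.empty((len(x), len(y)))
    for i in range(len(x)):
        for j in range(len(y)):
            if i == 0 and j == 0:
                prev = 0.0  # the top-left cell has no predecessor
            else:
                insert = c[i, j - 1] if j > 0 else np.inf
                delete = c[i - 1, j] if i > 0 else np.inf
                match = c[i - 1, j - 1] if (i > 0 and j > 0) else np.inf
                prev = min(insert, delete, match)
            c[i, j] = abs(x[i] - y[j]) + prev
    return c

a = np.array([1., 2., 4., 8.])
b = np.array([2., 3., 4., 7., 8., 9.])
print(dtw_simplified(a, b))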

If you want a much more thorough implementation of DTW which gets over some of the performance hits Theano imposes by computing many comparisons in parallel (i.e. batching) then take a look here: https://github.com/danielrenshaw/TheanoBatchDTW.

Daniel Renshaw