
I was wondering if it was possible to implement a recurrent network in Theano in the case where inputs are not known initially. Specifically, I have in mind the 'Recurrent Models of Visual Attention' paper (http://arxiv.org/abs/1406.6247) and the part concerning game playing. In this case, each game image is only available after the network has outputted an action.

As I understand, RNNs in Theano are implemented using the `theano.scan` function, which expects a sequence as an input. Obviously, I can't produce such a sequence of game images without running the full recurrent loop and recording the actions that would be generated. And I can't run the loop and generate the sequence of actions, since I don't have the sequence of game images to pass as an input.

So, it would seem that under those conditions I can't use proper backpropagation and train the network correctly. I could run each iteration of the loop manually, but then there would be no BPTT.

What am I missing here? Is it possible to implement the algorithm in the paper describing the game playing part in Theano (I've seen implementations of digit classification part, but it's easier, since the input never changes)?

Thanks.

1 Answer


If I understand it correctly (assuming you're referring to the simple game of "catch"), I don't see any problem.

There's an initial state for the game (i.e. the initial position of the paddle and the initial position of the ball) that can be provided to the network in the first time step. The network predicts an action to be performed and the game state is updated based on the chosen action. The updated game state is then provided as input to the network in the second time step.
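Conceptually (setting Theano aside for a moment), that closed loop looks like the plain-Python sketch below. Here `choose_action` and `update_state` are hypothetical stand-ins for the network's policy and the game's dynamics; the key point is that the input at time t+1 is produced from the output at time t, so no input sequence needs to exist up front:

```python
def choose_action(state):
    # Hypothetical policy: pick the index of the largest value.
    return max(range(len(state)), key=lambda i: state[i])


def update_state(state, action):
    # Hypothetical game dynamics: bump the chosen entry.
    new_state = list(state)
    new_state[action] += 1
    return new_state


def play(initial_state, n_steps):
    # The loop: each action is computed from the current state, and the
    # next state (the next "input") is computed from that action.
    state = initial_state
    actions, trajectory = [], [state]
    for _ in range(n_steps):
        action = choose_action(state)
        state = update_state(state, action)
        actions.append(action)
        trajectory.append(state)
    return actions, trajectory


actions, trajectory = play([0, 2, 1], n_steps=3)
# The paddle-free toy loop repeatedly reinforces index 1:
# [0, 2, 1] -> [0, 3, 1] -> [0, 4, 1] -> [0, 5, 1]
```

`theano.scan` expresses exactly this pattern symbolically when you pass the state via `outputs_info` instead of `sequences`.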

Update

Here's some sample code showing how to use the output of an earlier time step to update a state within a theano.scan operation.

import numpy

import theano
import theano.tensor as tt


def choose_action(s):
    # TODO: Given the game state s, choose which action to perform
    return s.argmax()


def update_state(s, y):
    # TODO: Update game state s given action y
    return s + y


def is_end_state(s):
    # TODO: Determine whether game state s is an end-game state
    return s.max() > 100


def step(s_tm1):
    # One time step: choose an action from the previous state, then
    # compute the next state from that action.
    y_tm1 = choose_action(s_tm1)
    s_t = update_state(s_tm1, y_tm1)
    return (y_tm1, s_t), theano.scan_module.until(is_end_state(s_t))


def main():
    theano.config.compute_test_value = 'raise'
    initial_state = tt.matrix()
    initial_state.tag.test_value = numpy.array(
        [[0, 2, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]],
        dtype=theano.config.floatX)
    # outputs_info=[None, initial_state]: the action sequence is not fed
    # back into the loop; the state is fed back as the next step's input.
    (action_sequence, state_sequence), _ = theano.scan(
        step, outputs_info=[None, initial_state], n_steps=1000)
    # Prepend the initial state so the returned sequence includes it.
    state_sequence = tt.concatenate(
        [tt.shape_padleft(initial_state), state_sequence])
    f = theano.function([initial_state],
                        outputs=[action_sequence, state_sequence])


main()
Daniel Renshaw
  • theano.scan requires full sequence of game states to be provided to apply BPTT correctly. Running each iteration of the loop manually would only provide gradients for the last iteration, which also means that most of the gradients would be 0, since the reward for most iterations is 0. – user2971693 Sep 28 '15 at 14:48
  • I've updated my answer. The key point is that you don't need to pass a sequence to `scan`. You can pass just an initial state and have it iterate either a fixed number of times (`n_steps`) or stop when some condition is met (`theano.scan_module.until`). – Daniel Renshaw Sep 28 '15 at 15:10
  • As I understood from playing around with Theano, `step` callback is not called anymore after the function is compiled - if you put a `print` statement in `step` and invoke `f`, you won't see 1000 printouts. So, there is no way to update the state inside the loop. – user2971693 Sep 28 '15 at 15:42
  • 1
    That's a common misconception about scan. The scan step function tells Theano symbolically how to perform a single iteration, but Theano will execute that step computation as many times as is required. If you put a `print` in a scan step function you'll see it only once, during compilation, and not at all during execution. If you put a `theano.printing.Print('op')(value)` operation inside a step function, then you'll see it as many times as there are iterations. – Daniel Renshaw Sep 28 '15 at 15:54
  • `Theano will execute that step computation as many times as is required`. Theano only executes its own code not arbitrary Python code. Getting a game image from, for example, an external emulator will not work in each iteration. On other hand, writing a game in pure Theano so that given an action a corresponding state could be computed seems very challenging. – user2971693 Sep 28 '15 at 16:05
  • Yes, using the approach I described, the game itself needs to be differentiable. – Daniel Renshaw Sep 28 '15 at 21:18