
I'm trying to get a better grasp of the scan functionality in Theano. My understanding, based on this document, is that it behaves like a for loop. I've created a very simple working example that finds the weight and bias when performing linear regression.
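
For reference, the power example from the scan tutorial illustrates the for-loop analogy: it computes a ** k elementwise, the way `result = 1; for _ in range(k): result = result * a` would:

import numpy as np
import theano
import theano.tensor as T

k = T.iscalar('k')
a = T.vector('a')

# Each step multiplies the running result by `a`, starting from ones.
results, updates = theano.scan(fn=lambda prior_result, a: prior_result * a,
                               outputs_info=T.ones_like(a),
                               non_sequences=a,
                               n_steps=k)

power = theano.function(inputs=[a, k], outputs=results[-1], updates=updates)
print(power(np.arange(4, dtype=theano.config.floatX), 3))  # [ 0.  1.  8. 27.]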

#### Libraries
# Third Party Libraries
import numpy as np
import theano
import theano.tensor as T

# not intended for mini-batch
def gen_data(num_points=50, slope=1, bias=10, x_max=50):
    f = lambda z: slope * z + bias
    x = np.zeros(shape=(num_points,), dtype=theano.config.floatX)
    y = np.zeros(shape=(num_points,), dtype=theano.config.floatX)

    for i in range(num_points):
        x_temp = np.random.uniform()*x_max
        x[i] = x_temp
        y[i] = f(x_temp) + np.random.normal(scale=3.0)

    return (x, y)
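
(As an aside, the per-point loop above could be vectorized with NumPy; a hypothetical equivalent that draws from the same distributions:)

def gen_data_vectorized(num_points=50, slope=1, bias=10, x_max=50):
    # Same distributions as gen_data, without the Python loop.
    x = (np.random.uniform(size=num_points) * x_max).astype(theano.config.floatX)
    noise = np.random.normal(scale=3.0, size=num_points)
    y = (slope * x + bias + noise).astype(theano.config.floatX)
    return (x, y)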

#############################################################
#############################################################
train_x, train_y = gen_data(num_points=50, slope=2, bias=5)
epochs = 50

# Declaring variables
learn_rate = T.scalar(name='learn_rate', dtype=theano.config.floatX)
x = T.vector(name='x', dtype=theano.config.floatX)
y = T.vector(name='y', dtype=theano.config.floatX)
# Variables that will be updated
theta = theano.shared(np.random.rand(), name='theta')
bias = theano.shared(np.random.rand(), name='bias')

hyp = T.dot(theta, x) + bias
cost = T.mean((hyp - y)**2)/2
f_cost = theano.function(inputs=[x, y], outputs=cost)

grad_t, grad_b = T.grad(cost, [theta, bias])

train = theano.function(inputs=[x, y, learn_rate], outputs=cost,
                        updates=((theta, theta-learn_rate*grad_t), 
                                 (bias, bias-learn_rate*grad_b)))

print('weight: {}, bias: {}'.format(theta.get_value(), bias.get_value()))

for i in range(epochs): # Try changing this to a `scan`
    train(train_x, train_y, 0.001)

print('------------------------------')
print('weight: {}, bias: {}'.format(theta.get_value(), bias.get_value()))

I would like to change that for loop to a theano.scan function, but every attempt I've made has yielded one error message after another.


1 Answer


In order to use theano.scan, I imported OrderedDict from collections to hold the updates to the shared variables. Using a plain dict results in the following error message:

Expected OrderedDict or OrderedUpdates, got <class 'dict'>. This can make your script non-deterministic.
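
A minimal sketch of the requirement (a toy doubling example of my own, separate from the regression code): the inner function returns its shared-variable updates as an OrderedDict, and scan applies them at every step:

from collections import OrderedDict

import numpy as np
import theano

w = theano.shared(np.float64(1.0), name='w')

def step():
    new_w = w * 2
    # An OrderedDict is accepted; a plain {w: new_w} raises the error above.
    return new_w, OrderedDict([(w, new_w)])

results, updates = theano.scan(fn=step, n_steps=5)
double = theano.function(inputs=[], outputs=results, updates=updates)
print(double())       # [ 2.  4.  8. 16. 32.]
print(w.get_value())  # 32.0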

Secondly, I defined a function in which the loss and the gradients are computed. The function returns the loss together with an OrderedDict of updates:

def cost(inputs, outputs, learn_rate, theta, bias):
    hyp = T.dot(theta, inputs) + bias
    loss = T.mean((hyp - outputs)**2)/2

    grad_t, grad_b = T.grad(loss, [theta, bias])

    return loss, OrderedDict([(theta, theta-learn_rate*grad_t),
                              (bias, bias-learn_rate*grad_b)])

This was followed by defining theano.scan() as follows:

results, updates = theano.scan(fn=cost,
                               non_sequences=[x, y, learn_rate, theta, bias],
                               n_steps=epochs)

I chose to include x and y as non_sequences because of the relatively small size of this toy example, and because it is about twice as fast as passing them as sequences (a rough timing sketch follows the function definition below).

Lastly, theano.function() was defined using the results and updates from theano.scan():

train = theano.function(inputs=[x, y, learn_rate, epochs], outputs=results,
                        updates=updates)

Putting it all together, we have:

#### Libraries
# Standard Libraries
from collections import OrderedDict

# Third Party Libraries
# import matplotlib.pyplot as plt
import numpy as np
# from sklearn import linear_model
import theano
import theano.tensor as T

# def gen_data(num_points=50, slope=1, bias=10, x_max=50):
#     pass # Use the code in the above post to generate sample points

########################################################################
# Generate Data
train_x, train_y = gen_data(num_points=50, slope=2)

# Declaring variables
x = T.vector(name='x', dtype=theano.config.floatX)
y = T.vector(name='y', dtype=theano.config.floatX)

learn_rate = T.scalar(name='learn_rate', dtype=theano.config.floatX)
epochs = T.iscalar(name='epochs')

# Variables that will be updated, hence declared as `theano.shared`
theta = theano.shared(np.random.rand(), name='theta')
bias = theano.shared(np.random.rand(), name='bias')

def cost(inputs, outputs, learn_rate, theta, bias):
    hyp = T.dot(theta, inputs) + bias
    loss = T.mean((hyp - outputs)**2)/2

    grad_t, grad_b = T.grad(loss, [theta, bias])

    return loss, OrderedDict([(theta, theta-learn_rate*grad_t),
                              (bias, bias-learn_rate*grad_b)])

results, updates = theano.scan(fn=cost,
                               non_sequences=[x, y, learn_rate, theta, bias],
                               n_steps=epochs)

# results, updates = theano.scan(fn=cost,
#                                sequences=[x, y],
#                                non_sequences=[learn_rate, theta, bias],
#                                n_steps=epochs)

train = theano.function(inputs=[x, y, learn_rate, epochs], outputs=results,
                        updates=updates)

print('weight: {}, bias: {}'.format(theta.get_value(), bias.get_value()))
train(train_x, train_y, 0.001, 30)
print('------------------------------')
print('weight: {}, bias: {}'.format(theta.get_value(), bias.get_value()))
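
Since outputs=results, each call to train also returns the loss at every scan step, which makes it easy to confirm the optimization is converging (note this call continues from the already-updated parameters above):

losses = train(train_x, train_y, 0.001, 30)
print('first step loss: {:.4f}'.format(float(losses[0])))
print('last step loss:  {:.4f}'.format(float(losses[-1])))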

I've included the code to pass x and y as sequences for completeness. Simply uncomment that part of the code and comment out the other instance of theano.scan().
