I'm trying to train an extremely simple neural network with Lasagne: one dense layer with a single output and no nonlinearity (so it's simply a linear regression). Here's my code:
```python
#!/usr/bin/env python
import time

import numpy as np
import theano
import theano.tensor as T
import lasagne


def build_mlp(input_var=None):
    # Input: a batch of 36-dimensional feature vectors.
    l_in = lasagne.layers.InputLayer(shape=(None, 36), input_var=input_var)
    # A single dense unit on top of the input.
    l_out = lasagne.layers.DenseLayer(
        l_in,
        num_units=1)
    return l_out


if __name__ == '__main__':
    start_time = time.time()
    input_var = T.matrix('inputs')
    target_var = T.fvector('targets')
    network = build_mlp(input_var)

    # Training expressions: summed squared error, Nesterov momentum updates.
    prediction = lasagne.layers.get_output(network)[:, 0]
    loss = lasagne.objectives.aggregate(
        lasagne.objectives.squared_error(prediction, target_var), mode="sum")
    params = lasagne.layers.get_all_params(network, trainable=True)
    updates = lasagne.updates.nesterov_momentum(
        loss, params, learning_rate=0.01, momentum=0.01)
    train_fn = theano.function([input_var, target_var], loss,
                               updates=updates, allow_input_downcast=True)

    # A single 36-dimensional sample.
    features = [-0.7275278, -1.2492378, -1.1284761, -1.5771232, -1.6482532, 0.57888401,
                -0.66000223, 0.89886779, -0.61547941, 1.2937579, -0.74761862, -1.4564357, 1.4365945,
                -3.2745962, 1.3266684, -3.6136472, 1.5396905, -0.60452163, 1.1510054, -1.0534937,
                1.0851847, -0.096269868, 0.15175876, -2.0422907, 1.6125549, -1.0562884, 2.9321988,
                -1.3044566, 2.5821636, -1.2787727, 2.0813208, -0.87762129, 1.493879, -0.60782474, 0.77946049, 0.0]
    print("Network built in " + str(time.time() - start_time) + " sec")

    it_number = 1000

    # 1K forward passes through the network (initial random weights).
    start_time = time.time()
    for i in xrange(it_number):
        val = lasagne.layers.get_output(network, features).eval()[0][0]
    print("1K outputs: " + str(time.time() - start_time) + " sec")

    # 1K plain NumPy dot products with the layer's weight matrix W.
    p = params[0].eval()
    start_time = time.time()
    for i in xrange(it_number):
        n = np.dot(features, p)
    print("1K dot products: " + str(time.time() - start_time) + " sec")

    print(val)
    print(n)
```
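For comparison: the InputLayer is declared with `shape=(None, 36)`, so the same 1K predictions can in principle be computed as one batch rather than 1K separate calls. A minimal NumPy sketch of that batched computation, using random stand-in data and a zero bias (both assumptions, not the real weights or features), shows it gives the same values as a per-sample loop:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: 1000 samples of 36 features, a (36, 1) weight
# matrix shaped like DenseLayer's W, and a zero bias like DenseLayer's
# default initial b.
X = rng.standard_normal((1000, 36))
W = rng.standard_normal((36, 1))
b = np.zeros(1)

# One matrix product produces all 1000 (pre-nonlinearity) outputs at once.
batched = X.dot(W) + b  # shape (1000, 1)

# The same values computed one sample at a time.
looped = np.array([np.dot(x, W) + b for x in X])  # shape (1000, 1)

print(batched.shape)
print(np.allclose(batched, looped))
```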
I'm not training the network here yet, just running 1K evaluations (with the initial random weights) to see how long it would take to get 1K actual predictions from my network. Compared to 1K dot products, it's a terrible slowdown:
```
Network built in 8.86999106407 sec
1K outputs: 53.0574831963 sec
1K dot products: 0.00349998474121 sec
0.0
[-3.37383742]
```
So my question is: why does it take so long to evaluate such a simple network?
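One thing worth noting about the timing loop: `lasagne.layers.get_output(network, features)` builds a new symbolic expression on every iteration, and calling `.eval()` on a new expression makes Theano compile a function for it, so the loop likely pays compilation cost 1K times (this is my understanding of Theano's behaviour, not something measured here). The usual pattern is to compile once, e.g. `predict_fn = theano.function([input_var], lasagne.layers.get_output(network))`, and then call `predict_fn` in the loop. The plain-Python sketch below is only an analogy for that difference: it uses the built-in `compile()`/`eval()` on a hypothetical expression string, not Theano.

```python
import time

# Hypothetical expression standing in for a symbolic graph: a dot product
# plus bias, written as source code that must be compiled before evaluation.
SOURCE = "sum(w * x for w, x in zip(weights, features)) + bias"


def predict_recompiling(weights, features, bias):
    # Compiles the expression on every call (analogous to building a new
    # graph and calling .eval() on it inside the loop).
    code = compile(SOURCE, "<expr>", "eval")
    return eval(code, {"weights": weights, "features": features, "bias": bias})


# Compiled a single time up front (analogous to one theano.function call).
COMPILED = compile(SOURCE, "<expr>", "eval")


def predict_precompiled(weights, features, bias):
    # Only evaluates the already-compiled code object.
    return eval(COMPILED, {"weights": weights, "features": features, "bias": bias})


def time_calls(fn, n, *args):
    # Wall-clock time for n calls of fn(*args), plus the last result.
    start = time.perf_counter()
    for _ in range(n):
        result = fn(*args)
    return time.perf_counter() - start, result


weights = [0.5, -1.0, 2.0]
features = [1.0, 2.0, 3.0]
t_re, r_re = time_calls(predict_recompiling, 1000, weights, features, 0.0)
t_pre, r_pre = time_calls(predict_precompiled, 1000, weights, features, 0.0)
print("recompiling: %.4f sec, result %s" % (t_re, r_re))
print("precompiled: %.4f sec, result %s" % (t_pre, r_pre))
```

Both versions return the same value; the precompiled one should be noticeably faster, though the exact ratio depends on the machine and on how expensive the compilation step is.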
Also, I'm confused about the predicted value. If the dot product is less than zero, the network outputs 0; otherwise the two values are the same:
```
Network built in 8.96299982071 sec
1K outputs: 54.2732210159 sec
1K dot products: 0.00287079811096 sec
1.10120121082
[ 1.10120121]
```
Am I missing something about how DenseLayer works?
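Regarding the second question: according to the Lasagne documentation, `DenseLayer` applies `lasagne.nonlinearities.rectify` by default, and `build_mlp` above never passes a `nonlinearity` argument; passing `nonlinearity=None` makes the layer linear. A minimal NumPy sketch of the two forward passes, with hypothetical random weights and a zero bias (matching DenseLayer's default initial `b`), reproduces the pattern described above: the rectified output is 0 whenever the dot product is negative and equal to it otherwise.

```python
import numpy as np


def dense_forward(x, W, b, nonlinearity=None):
    # Affine part of a dense layer: x . W + b.
    activation = np.dot(x, W) + b
    # Apply the nonlinearity if one is given; None means a linear layer.
    return activation if nonlinearity is None else nonlinearity(activation)


def rectify(a):
    # ReLU, max(0, a), the function lasagne.nonlinearities.rectify computes.
    return np.maximum(a, 0.0)


rng = np.random.default_rng(1)
W = rng.standard_normal((36, 1))  # hypothetical weights, shaped like DenseLayer's W
b = np.zeros(1)                   # DenseLayer's default initial bias is 0

for _ in range(3):
    x = rng.standard_normal(36)   # hypothetical input sample
    linear = dense_forward(x, W, b)
    relu = dense_forward(x, W, b, nonlinearity=rectify)
    print(linear, relu)
```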