In the context of some neural network research, I'm evaluating several approaches to implementing networks and deciding which library to use. Currently I'm comparing TensorFlow and Theano, and I'm struggling to get TensorFlow to perform well. Here is my simple hello-gradient benchmark; it just optimizes a scalar multiplication with a single coefficient.
import time

class Timer:
    def __init__(self, what):
        self.what = what
    def __enter__(self):
        self.t1 = time.time()
        return self
    def __exit__(self, t, v, tb):
        t2 = time.time()
        print("{0} runs {1:.4f} seconds".format(self.what, t2 - self.t1))

def run_tensorflow():
    import tensorflow as tf
    x = tf.placeholder(tf.float32)
    y = tf.placeholder(tf.float32)
    # dtype must be passed as a keyword; the second positional
    # argument of tf.Variable is trainable, not dtype
    a = tf.Variable([1.], dtype=tf.float32)
    loss = (y - a * x) ** 2
    step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    # initialize after the full graph is built
    sess = tf.Session()
    sess.run(tf.global_variables_initializer())
    def one_step():
        sess.run(step, {x: 1., y: 0.})
    with Timer('tensorflow') as t:
        result = [one_step() for n in range(1000)]

def run_theano():
    import theano as th
    x = th.tensor.dscalar()
    y = th.tensor.dscalar()
    a = th.tensor.dscalar()
    loss = (y - a * x) ** 2
    dloss = th.tensor.grad(loss, a)
    dloss_f = th.function([x, y, a], dloss)
    a = [1.]
    def one_step():
        # the gradient step itself happens in Python;
        # only the gradient evaluation is compiled
        a[0] -= 0.01 * dloss_f(1., 0., a[0])
    with Timer('theano') as t:
        result = [one_step() for n in range(1000)]

run_tensorflow()
run_theano()
I'm running this program on the CPU with the packages installed via pip. The running times are 0.36 and 0.043 seconds for TensorFlow and Theano, respectively. I see similar performance differences for real networks, where the matrix-multiplication overhead should dominate; even there, TensorFlow is significantly slower.
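To show what I mean by per-call overhead, here is a minimal sketch (assuming the same TF 1.x API as above) that times a bare run() which only fetches the variable, with no feeds and no gradient computation:

import time
import tensorflow as tf

a = tf.Variable([1.], dtype=tf.float32)
sess = tf.Session()
sess.run(tf.global_variables_initializer())

t1 = time.time()
for n in range(1000):
    sess.run(a)  # fetch only: no feed_dict, no optimizer step
print("bare run() x1000: {0:.4f} seconds".format(time.time() - t1))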
I want to know whether I'm using TensorFlow incorrectly for what I'm trying to do. Should I not be calling the run() method inside a loop?
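For example, would a feed-free variant along the following lines be closer to the intended usage? This is only a sketch under the same TF 1.x API; it bakes the training pair into the graph as constants so that each step runs without a feed_dict:

import tensorflow as tf

x = tf.constant(1., tf.float32)
y = tf.constant(0., tf.float32)
a = tf.Variable([1.], dtype=tf.float32)
loss = (y - a * x) ** 2
step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for n in range(1000):
    sess.run(step)  # still one run() per iteration, just without feeds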