
In the context of some neural network research I'm evaluating several approaches for implementing these networks and which library to use. Currently I'm comparing TensorFlow and Theano, and I'm struggling to get TensorFlow to perform well. Here is my simple hello-gradient benchmark; it just optimizes a scalar multiplication with one coefficient.

import time

class Timer:

   def __init__(self, what):
      self.what = what

   def __enter__(self):
      self.t1 = time.time()
      return self

   def __exit__(self,t,v,tb):
      t2 = time.time()
      print("{0} runs {1:.4f} seconds".format(self.what, t2-self.t1))


def run_tensorflow():

   import tensorflow as tf

   x = tf.placeholder(tf.float32)
   y = tf.placeholder(tf.float32)
   a = tf.Variable([1.], dtype=tf.float32)   # trainable coefficient

   sess = tf.Session()
   sess.run(tf.global_variables_initializer())

   loss = (y-a*x)**2
   step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

   def one_step():
      sess.run(step, {x:1.,y:0.})

   with Timer('tensorflow') as t:
      result = [ one_step() for n in range(1000) ]


def run_theano():

   import theano as th

   x = th.tensor.dscalar()
   y = th.tensor.dscalar()
   a = th.tensor.dscalar()
   l = a*x

   loss = (y-l)**2
   dloss = th.tensor.grad(loss, a)
   dloss_f = th.function([x,y,a], dloss)

   a = [1.]

   def one_step():
      a[0] -= 0.01 * dloss_f(1.,0.,a[0])

   with Timer('theano') as t:
      result = [ one_step() for n in range(1000) ]


run_tensorflow()
run_theano()

I'm running this program on the CPU, with both packages installed via pip. The running times are 0.36 and 0.043 seconds for TensorFlow and Theano, respectively. I see similar performance differences for real networks, where the matrix-multiplication overhead should dominate, yet TensorFlow is still significantly slower.

I want to know whether I'm using TensorFlow incorrectly for what I'm trying to do. Should I not be calling the run() method inside a loop?
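
For reference, here is a rough sketch of the kind of matmul-only timing I have in mind (the matrix size and iteration count are arbitrary, and this is not the exact script I use for the network tests):

import time
import tensorflow as tf

n = 2048
a = tf.Variable(tf.random_normal([n, n]))
b = tf.Variable(tf.random_normal([n, n]))
c = tf.matmul(a, b)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(c.op)                      # warm-up run

iters = 20
t1 = time.time()
for _ in range(iters):
   sess.run(c.op)                   # run the op only, don't fetch the result
dt = time.time() - t1
print("{0:.2f} GFLOP/s".format(iters * 2.0 * n**3 / dt / 1e9))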

André Bergner
  • BTW, matrix-matrix multiplication performance in TF should be close to the theoretical maximum (11 T ops/sec on a Titan X, 1 T ops/sec on a Xeon V3); here's a benchmark you can use -- https://stackoverflow.com/questions/41804380/testing-gpu-with-tensorflow-matrix-multiplication – Yaroslav Bulatov Jul 28 '17 at 16:14

1 Answer

  1. TF and Theano are designed for handling large objects, on the order of 1M elements. Benchmarking their handling of scalars is not particularly relevant.

  2. This is an apples-to-oranges comparison: with TF you are timing both the compilation and the run time, while with Theano you are only timing the run time. That is because theano.function does all of its compilation when you call it, whereas in TF much of that work is deferred until the first sess.run call (see the sketch just below this list).
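
For instance, one way to separate the two in your TF code is to run one throwaway step before starting the timer, so the one-off graph setup cost is excluded. A sketch, reusing the x, y, a, sess and Timer definitions from your script:

loss = (y - a*x)**2
step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
sess.run(tf.global_variables_initializer())

sess.run(step, {x: 1., y: 0.})        # first call pays the one-off setup cost
with Timer('tensorflow, steady state') as t:
   for n in range(1000):
      sess.run(step, {x: 1., y: 0.})  # subsequent calls measure only the run time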

That said, there are also realistic scenarios when TF is slower than Theano.
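
To make point 1 concrete as well, here is a rough variant of your benchmark that uses ~1M-element tensors instead of a scalar (again reusing your Timer class; the size, step count and learning rate are arbitrary). At this size the per-call overhead should be largely amortized by the actual computation:

import numpy as np
import tensorflow as tf

n = 1000000                              # roughly the regime TF and Theano target
x_val = np.ones(n, dtype=np.float32)
y_val = np.zeros(n, dtype=np.float32)

x = tf.placeholder(tf.float32, [n])
y = tf.placeholder(tf.float32, [n])
a = tf.Variable(tf.ones([n]))

loss = tf.reduce_sum((y - a*x)**2)
step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
sess.run(step, {x: x_val, y: y_val})     # warm-up, excludes the setup cost

with Timer('tensorflow, 1M elements') as t:
   for _ in range(1000):
      sess.run(step, {x: x_val, y: y_val})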

MWB
  • Before downvoting, get a sense of whether I know what I'm talking about: https://stackoverflow.com/users/1937197/maxb?tab=bounties&sort=earned – MWB Jul 27 '17 at 23:17
  • Thanks for a constructive answer. I also (as mentioned in my question) did some tests with big matrices, but I see similar results. I have now moved to `keras`. Two interesting observations: 1) the same example (a simple dense NN) is much faster than my handwritten Theano and TensorFlow code. 2) If I compare the Theano and TensorFlow backends, Theano wins, in some cases dramatically. I cannot find any example where TensorFlow is on par or faster. So I assume it's just slower, at least on the CPU. – André Bergner Jul 29 '17 at 19:28