
For some custom code, I need to run a for-loop to dynamically create a variable in TensorFlow 2 (with eager execution enabled). The values I write to the variable require gradients, so I need the computations in the loop to be tracked so that autodiff can produce gradients. My code works, but it is incredibly slow: several orders of magnitude slower than performing the same operation in NumPy.

I have isolated the problem and am providing a toy code snippet that highlights it. Fixing the snippet will allow me to fix my custom code.

import numpy as np
import tensorflow as tf
import timeit

N = int(1e5)
data = np.random.randn(N)
def numpy_func(data):
    new_data = np.zeros_like(data)
    for i in range(len(data)):
        new_data[i] = data[i]
    return new_data

def tf_func(data):
    new_data = tf.Variable(tf.zeros_like(data))
    for i in range(len(data)):
        new_data[i].assign(data[i])
    return new_data    

%timeit numpy_func(data)
%timeit tf_func(data)

The key takeaways from this snippet: in the for-loop I only need to update a slice of the variable; the slice to be updated is different at each iteration; and the data used for the update is different at each iteration. (In my custom code the update data is the result of a few simple computations that depend on slices of the variable; here I use a fixed array to isolate the problem.)

I am using TensorFlow 2, and the TensorFlow code ideally needs to run with eager execution enabled, since parts of my custom operations depend on it.

I am new to Tensorflow and I would really appreciate any help with fixing this problem.

Many thanks, Max

mbpaulus

1 Answer


TensorFlow will never be very fast when used like that. The ideal solution would be to vectorize your computation so that it does not require an explicit loop, but that depends on exactly what you are computing (you could post another question about that if you want). However, you can get somewhat better performance using tf.function. I changed your function to take new_data as an output parameter, since tf.function does not allow you to create variables after the first call (although if you remove the new_data parameter it will also work, because tf.function will find the variable in the enclosing scope).
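To illustrate the vectorization point: when the indices and update values can be computed up front, the whole loop can often be replaced by a single tf.tensor_scatter_nd_update call. This is only a sketch, not your exact computation; here the updates are just the input data itself:

```python
import numpy as np
import tensorflow as tf

N = 1000
data = tf.constant(np.random.randn(N))
target = tf.zeros_like(data)

# One scatter replaces the per-element loop: `indices` selects which
# slices to update, `updates` supplies the new value for each slice.
indices = tf.reshape(tf.range(N), (-1, 1))  # shape (N, 1): one row per index
updates = data  # stand-in for the per-iteration values, precomputed
result = tf.tensor_scatter_nd_update(target, indices, updates)
```

This runs as a single op, so it is fast in eager mode and fully differentiable, but it only applies if the updates do not depend on earlier iterations.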

import numpy as np
import tensorflow as tf
import timeit

# Input data
N = int(1e3)
data = np.random.randn(N)

# NumPy
def numpy_func(data, new_data):
    new_data[:] = 0
    for i in range(len(data)):
        new_data[i] = data[i]

new_data = np.zeros_like(data)
%timeit numpy_func(data, new_data)
# 143 µs ± 4.41 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

# TensorFlow
def tf_func(data, new_data):
    new_data.assign(tf.zeros_like(data))
    for i in range(len(data)):
        new_data[i].assign(data[i])
new_data = tf.Variable(tf.zeros_like(data))
%timeit tf_func(data, new_data)
# 119 ms ± 3.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

# tf.function
# This is equivalent to using it as a decorator
tf_func2 = tf.function(tf_func)
new_data = tf.Variable(tf.zeros_like(data))
tf_func2(data, new_data)  # First call is slower
%timeit tf_func2(data, new_data)
# 3.55 ms ± 40.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

This was run on CPU; results may vary significantly on GPU. In any case, as you can see, with tf.function it is still more than 20x slower than NumPy, but also more than 30x faster than the plain Python loop.
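If each element genuinely depends on the results of earlier iterations, so a single scatter is not possible, tf.TensorArray is the idiomatic structure for building a tensor element by element inside tf.function, and writes to it remain differentiable. A minimal sketch (again just copying the input, standing in for your per-step computation):

```python
import numpy as np
import tensorflow as tf

N = 1000
data = tf.constant(np.random.randn(N))

@tf.function
def tf_func_ta(data):
    # TensorArray accumulates one element per loop step; AutoGraph
    # converts the tf.range loop into a traced tf.while_loop.
    ta = tf.TensorArray(dtype=data.dtype, size=tf.shape(data)[0])
    for i in tf.range(tf.shape(data)[0]):
        ta = ta.write(i, data[i])  # each write returns a new handle
    return ta.stack()  # concatenate all writes into one tensor

out = tf_func_ta(data)
```

This avoids repeated sliced Variable assignments entirely, which is usually much faster than new_data[i].assign(...) even under tf.function.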

jdehesa