
I have been testing my WGAN-GP implementation in TensorFlow using the traditional graph mode. Recently I came across TensorFlow Eager Execution and tried to convert my code to run in eager mode.

Let me first show you the previous code:

self.x_ = self.g_net(self.z)                # generated samples
self.d  = self.d_net(self.x, reuse=False)   # critic output on real samples
self.d_ = self.d_net(self.x_)               # critic output on generated samples

self.d_loss = tf.reduce_mean(self.d) - tf.reduce_mean(self.d_)

epsilon = tf.random_uniform([], 0.0, 1.0)

# Random interpolation between real and generated samples for the gradient penalty
x_hat = epsilon * self.x + (1 - epsilon) * self.x_
d_hat = self.d_net(x_hat)

ddx = tf.gradients(d_hat, x_hat)[0]                   # tf.gradients returns a list
ddx = tf.sqrt(tf.reduce_sum(tf.square(ddx), axis=1))  # per-sample gradient norm
ddx = tf.reduce_mean(tf.square(ddx - 1.0) * scale)    # scale is the penalty weight (10 in the eager version below)

self.d_loss = self.d_loss + ddx

self.d_adam = tf.train.AdamOptimizer().minimize(self.d_loss, var_list=self.d_net.vars)

I then converted it to:

self.x_ = self.g_net(self.z)

epsilon = tf.random_uniform([], 0.0, 1.0)

x_hat = epsilon * self.x + (1 - epsilon) * self.x_

with tf.GradientTape(persistent=True) as temp_tape:
    temp_tape.watch(x_hat)  # x_hat is a plain tensor, so it has to be watched explicitly
    d_hat = self.d_net(x_hat)

ddx = temp_tape.gradient(d_hat, x_hat)[0]
ddx = tf.sqrt(tf.reduce_sum(tf.square(ddx), axis=1))
ddx = tf.reduce_mean(tf.square(ddx - 1.0) * 10)

with tf.GradientTape() as d_tape:
    d  = self.d_net(self.x)
    d_ = self.d_net(self.x_)

    loss_d = tf.reduce_mean(d) - tf.reduce_mean(d_) + ddx

grad_d = d_tape.gradient(loss_d, self.d_net.variables)

self.d_adam.apply_gradients(zip(grad_d, self.d_net.variables))

I tried several alternative ways to implement the WGAN-GP loss, but no matter what I do, d_loss keeps diverging! I hope someone can enlighten me by pointing out my mistake(s).
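For reference, one of the alternatives I attempted was nesting the penalty tape inside the main tape, roughly like the sketch below (same names as my code above; this did not work for me either, so I may have gotten details wrong):

self.x_ = self.g_net(self.z)

epsilon = tf.random_uniform([], 0.0, 1.0)
x_hat = epsilon * self.x + (1 - epsilon) * self.x_

with tf.GradientTape() as d_tape:
    # Compute the penalty inside d_tape so it is recorded on the outer tape
    # instead of entering it as a constant
    with tf.GradientTape() as temp_tape:
        temp_tape.watch(x_hat)
        d_hat = self.d_net(x_hat)
    ddx = temp_tape.gradient(d_hat, x_hat)  # a tensor here, not a list like tf.gradients
    ddx = tf.sqrt(tf.reduce_sum(tf.square(ddx), axis=1))
    ddx = tf.reduce_mean(tf.square(ddx - 1.0) * 10)

    d  = self.d_net(self.x)
    d_ = self.d_net(self.x_)
    loss_d = tf.reduce_mean(d) - tf.reduce_mean(d_) + ddx

grad_d = d_tape.gradient(loss_d, self.d_net.variables)
self.d_adam.apply_gradients(zip(grad_d, self.d_net.variables))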

Furthermore, I wonder whether I could use Keras layers with my previous loss and optimizer implementation. Thank you in advance!
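(To be concrete about the Keras question: I mean something like this hypothetical critic, with placeholder layer sizes, whose trainable weights would take the place of self.d_net.vars:

d_net = tf.keras.Sequential([
    tf.keras.layers.Dense(512, activation=tf.nn.leaky_relu),
    tf.keras.layers.Dense(256, activation=tf.nn.leaky_relu),
    tf.keras.layers.Dense(1),  # linear output: the critic produces an unbounded score
])

# e.g. in the graph-mode version, after the model has been called once
# so that its weights are created:
# tf.train.AdamOptimizer().minimize(self.d_loss, var_list=d_net.trainable_weights)
)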

  • Aargh... Sorry. The title 'Nested Gradient Tape' was for my initial implementation. I was trying to insert the 'temp_tape' code block inside the 'd_tape' code block, but it didn't work. :( The code above was the only version that ran without errors, but it is giving me wrong values. – dalbom Oct 29 '18 at 14:14
  • There is at least one major difference between the two code snippets. In the eager version, `ddx` enters the second tape as a constant. Its value seems to depend on the variables. If you want that computation to be part of your second gradient, you need that computation to be part of the tape. It looks like you would need nested tapes. There are some tests for nested tapes here: https://github.com/tensorflow/tensorflow/blob/3c2dabf53dd085c21e38a28b467e52c566c0dfaf/tensorflow/python/eager/backprop_test.py#L781. If you get some error, post it. – iga Feb 02 '19 at 04:42
  • I got the same problem. I tried putting everything in a single tape and calculating the gradient of d_hat directly inside the main tape. It just gives a warning about performance, but then the loss keeps diverging. After experimenting a bit, now I get only None gradients. Did you manage to find an elegant way? – EdoG Jun 05 '19 at 13:43
  • @EdoardoG Unfortunately, no. I just returned to the previous implementation. – dalbom Sep 19 '19 at 08:25
  • In case this is still causing problems, there's a working example of a WGAN with Gradient Penalties using TensorFlow 2.0 in eager mode, here: https://github.com/LynnHo/DCGAN-LSGAN-WGAN-GP-DRAGAN-Tensorflow-2 – Chris Dec 09 '19 at 21:23

0 Answers