I have been testing my WGAN-GP algorithm in TensorFlow using the traditional graph-mode implementation. Recently I came across TensorFlow's eager execution and tried to convert my code to run in that mode.
Let me first show you the previous code:
self.x_ = self.g_net(self.z)
self.d = self.d_net(self.x, reuse=False)
self.d_ = self.d_net(self.x_)
self.d_loss = tf.reduce_mean(self.d) - tf.reduce_mean(self.d_)
epsilon = tf.random_uniform([], 0.0, 1.0)
x_hat = epsilon * self.x + (1 - epsilon) * self.x_
d_hat = self.d_net(x_hat)
ddx = tf.gradients(d_hat, x_hat)[0]
ddx = tf.sqrt(tf.reduce_sum(tf.square(ddx), axis=1))
ddx = tf.reduce_mean(tf.square(ddx - 1.0) * scale)
self.d_loss = self.d_loss + ddx
self.d_adam = tf.train.AdamOptimizer().minimize(self.d_loss, var_list=self.d_net.vars)
I converted it to the following for eager execution:
self.x_ = self.g_net(self.z)
epsilon = tf.random_uniform([], 0.0, 1.0)
x_hat = epsilon * self.x + (1 - epsilon) * self.x_
with tf.GradientTape(persistent=True) as temp_tape:
    temp_tape.watch(x_hat)
    d_hat = self.d_net(x_hat)
ddx = temp_tape.gradient(d_hat, x_hat)[0]
ddx = tf.sqrt(tf.reduce_sum(tf.square(ddx), axis=1))
ddx = tf.reduce_mean(tf.square(ddx - 1.0) * 10)
with tf.GradientTape() as d_tape:
    d = self.d_net(self.x)
    d_ = self.d_net(self.x_)
    loss_d = tf.reduce_mean(d) - tf.reduce_mean(d_) + ddx
grad_d = d_tape.gradient(loss_d, self.d_net.variables)
self.d_adam.apply_gradients(zip(grad_d, self.d_net.variables))
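For reference, here is a minimal self-contained sketch of the nested-tape pattern I am comparing my code against. The toy one-layer networks, shapes, and TF2-style API names (tf.random.uniform, tf.keras.optimizers.Adam) are stand-ins of mine, not my real g_net/d_net; the point is that the gradient penalty is computed inside the discriminator tape, and that tape.gradient returns a tensor rather than a list:

```python
import tensorflow as tf

# Toy stand-ins for my real g_net / d_net -- layer sizes are placeholders.
g_net = tf.keras.Sequential([tf.keras.layers.Dense(4)])
d_net = tf.keras.Sequential([tf.keras.layers.Dense(1)])
d_adam = tf.keras.optimizers.Adam(1e-4)

def d_train_step(x, z, scale=10.0):
    x_ = g_net(z)  # fake batch
    # Per-sample epsilon, broadcast over features (my original drew a scalar).
    epsilon = tf.random.uniform([tf.shape(x)[0], 1], 0.0, 1.0)
    x_hat = epsilon * x + (1 - epsilon) * x_
    with tf.GradientTape() as d_tape:
        # Nested tape: records d(d_hat)/d(x_hat), so the penalty stays
        # differentiable with respect to the discriminator variables.
        with tf.GradientTape() as gp_tape:
            gp_tape.watch(x_hat)
            d_hat = d_net(x_hat)
        ddx = gp_tape.gradient(d_hat, x_hat)  # a tensor, not a list: no [0]
        ddx = tf.sqrt(tf.reduce_sum(tf.square(ddx), axis=1))
        gp = tf.reduce_mean(tf.square(ddx - 1.0) * scale)
        d = d_net(x)
        d_ = d_net(x_)
        # Same sign convention as my graph-mode loss; penalty added inside
        # the tape so it contributes to grad_d.
        loss_d = tf.reduce_mean(d) - tf.reduce_mean(d_) + gp
    grad_d = d_tape.gradient(loss_d, d_net.trainable_variables)
    d_adam.apply_gradients(zip(grad_d, d_net.trainable_variables))
    return loss_d

loss = d_train_step(tf.random.normal([8, 4]), tf.random.normal([8, 3]))
```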
I have tried several alternative ways to implement the WGAN-GP loss, but in every case d_loss diverges! I hope someone can enlighten me by pointing out my mistake(s).
Furthermore, I wonder whether I could keep my existing loss and optimizer code while building the networks from Keras layers. Thank you in advance!
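To make that second question concrete, this is the kind of usage I have in mind: Keras layers supplying the variables, but the loss and the optimizer step written by hand. The layer sizes and the simplified critic loss here are illustrative assumptions, not my actual setup:

```python
import tensorflow as tf

# Hypothetical critic built from Keras layers; sizes are placeholders.
critic = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
opt = tf.keras.optimizers.Adam(1e-4)

x = tf.random.normal([8, 4])  # dummy "real" batch
with tf.GradientTape() as tape:
    # Hand-written loss (no model.compile/fit): raise the critic's
    # score on real samples.
    loss = -tf.reduce_mean(critic(x))
grads = tape.gradient(loss, critic.trainable_variables)
opt.apply_gradients(zip(grads, critic.trainable_variables))
```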