0

I expected the gradient for tf.sign() in TensorFlow to be equal to 0 or None. However, when I examined the gradients, I found that they were equal to very small numbers (e.g. 1.86264515e-09). Why is that?

(If you are curious as to why I even want to know this, it is because I want to implement the "straight-through estimator" described here, and before overriding the gradient for tf.sign(), I wanted to check that the default behavior was in fact what I was expecting.)

EDIT: Here is some code which reproduces the error. The model is just the linear regression model from the introduction to TensorFlow, except that I use y=sign(W)x + b instead of y=Wx + b.

import tensorflow as tf
import numpy as np

def gvdebug(g, v):
    g2 = tf.zeros_like(g, dtype=tf.float32)
    v2 = tf.zeros_like(v, dtype=tf.float32)
    g2 = g
    v2 = v
    return g2,v2


# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3

# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but TensorFlow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = tf.sign(W) * x_data + b

# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
grads_and_vars = optimizer.compute_gradients(loss)
gv2 = [gvdebug(gv[0], gv[1]) for gv in grads_and_vars]
apply_grads = optimizer.apply_gradients(gv2)

# Before starting, initialize the variables.  We will 'run' this first.
init = tf.initialize_all_variables()

# Launch the graph.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.01)
sess = tf.Session(config=tf.ConfigProto(gpu_options=gpu_options))
sess.run(init)

# Fit the line.
for step in range(201):
    sess.run(apply_grads)
    if (step % 20 == 0) or ((step-1) % 20 == 0):
        print("")
        print(sess.run(gv2[0][1])) #the variable
        print(sess.run(gv2[0][0])) #the gradient
        print("")
        print(step, sess.run(W), sess.run(b))

# Learns best fit is W: [0.1], b: [0.3]
random_stuff
  • 197
  • 2
  • 12

0 Answers0