AdamOptimizer and GradientDescentOptimizer from tensorflow not able to fit simple data

Question

Similar question: Here

I am trying out TensorFlow. I generated simple data which is linearly separable and tried to fit a linear equation to it. Here is the code.

np.random.seed(2010)
n = 300
x_data = np.random.random([n, 2]).tolist()
y_data = [[1., 0.] if v[0]> 0.5 else [0., 1.] for v in x_data]

x = tf.placeholder(tf.float32, [None, 2]) 
W = tf.Variable(tf.zeros([2, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.sigmoid(tf.matmul(x , W) + b)

y_ = tf.placeholder(tf.float32, [None, 2]) 
cross_entropy = -tf.reduce_sum(y_ * tf.log(tf.clip_by_value(y, 1e-9, 1)))
train_step = tf.train.AdamOptimizer(0.01).minimize(cross_entropy)

correct_predict = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)) 
accuracy = tf.reduce_mean(tf.cast(correct_predict, tf.float32))

s = tf.Session()
s.run(tf.initialize_all_variables())

for i in range(10):
        s.run(train_step, feed_dict = {x: x_data, y_: y_data})
        print(s.run(accuracy, feed_dict = {x: x_data, y_: y_data}))

print(s.run(accuracy, feed_dict = {x: x_data, y_: y_data}), end=",")

I get the following output:

0.536667, 0.46, 0.46, 0.46, 0.46, 0.46, 0.46, 0.46, 0.46, 0.46, 0.46

Right after the first iteration it gets struck at 0.46.

And following is the plot:

Then I changed the code to use gradient descent:

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

Now i got the following: 0.54, 0.54, 0.63, 0.70, 0.75, 0.8, 0.84, 0.89, 0.92, 0.94, 0.94

And following is the plot:

My questions:

1) Why is the AdamOptimizer failing?

2) If the issue is with learning rate, or other parameters which I need to tune, how do I generally debug them?

3) I ran gradient descent for 50 iterations (I ran for 10 above) and printed the accuracy every 5 iterations and this is the output:

0.54, 0.8, 0.95, 0.96, 0.92, 0.89, 0.87, 0.84, 0.81, 0.79, 0.77.

Clearly it started to diverge, looks like the issue is with fixed learning rate (it is overshooting after a point). Am I right?

4) In this toy example what can be done to get a better fit. Ideally it should have 1.0 accuracy as the data is linearly separable.

[EDIT]

As requested by @Yaroslav, here is the code used for plots

xx = [v[0] for v in x_data]
yy = [v[1] for v in x_data]
x_min, x_max = min(xx) - 0.5, max(xx) + 0.5 
y_min, y_max = min(yy) - 0.5, max(yy) + 0.5 
xxx, yyy = np.meshgrid(np.arange(x_min, x_max, 0.02), np.arange(y_min, y_max, 0.02))
pts = np.c_[xxx.ravel(), yyy.ravel()].tolist()
# ---> Important
z = s.run(tf.argmax(y, 1), feed_dict = {x: pts})
z = np.array(z).reshape(xxx.shape)
plt.pcolormesh(xxx, yyy, z)
plt.scatter(xx, yy, c=['r' if v[0] == 1 else 'b' for v in y_data], edgecolor='k', s=50)
plt.show()

Yaroslav Bulatov · Accepted Answer · 2016-04-07T18:12:24.717

TLDR; your loss is wrong. Loss goes to zero without decreasing accuracy.

The problem is that your probabilities are not normalized. If you look at your loss, it's going down, but probabilities for both y[:0] and y[:1] are going to 1, so argmax is meaningless.

Traditional solution is to use only 1 degree of freedom instead of 2, so your probability for first class is sigmoid(y), and for second class it is 1-sigmoid(y) so cross entropy is something like -y[0]log(sigmoid(y0)) - y[1]log(1-sigmoid(y0))

Alternatively you could change your code is to use tf.nn.softmax instead of tf.sigmoid. This divides by the sum of the probabilities so the optimizer can't decrease loss by driving both probabilities to 1 simultaneously.

The following gets to 0.99666673 accuracy.

tf.reset_default_graph()
np.random.seed(2010)
n = 300
x_data = np.random.random([n, 2]).tolist()
y_data = [[1., 0.] if v[0]> 0.5 else [0., 1.] for v in x_data]

x = tf.placeholder(tf.float32, [None, 2]) 
W = tf.Variable(tf.zeros([2, 2]))
b = tf.Variable(tf.zeros([2]))
y = tf.nn.softmax(tf.matmul(x , W) + b)

y_ = tf.placeholder(tf.float32, [None, 2]) 
cross_entropy = -tf.reduce_sum(y_ * tf.log(y))
regularizer = tf.reduce_sum(tf.square(y))
train_step = tf.train.AdamOptimizer(1.0).minimize(cross_entropy+regularizer)

correct_predict = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1)) 
accuracy = tf.reduce_mean(tf.cast(correct_predict, tf.float32))

s = tf.Session()
s.run(tf.initialize_all_variables())

for i in range(30):
        s.run(train_step, feed_dict = {x: x_data, y_: y_data})
        cost1,cost2=s.run([cross_entropy,accuracy], feed_dict = {x: x_data, y_: y_data})
        print(cost1, cost2)

PS: can you share the code you used for making the plots above?

Thanks for detailed explanation, that solved the problem. Code used for plots: [ edit due to formatting issues, i am adding the code to question — anuml, Apr 07 '16 at 19:46
also may I ask how you debugged my code? Looking at your modified code, looks like u figured it out by printing both cross_entropy and accuracy, right? — anuml, Apr 07 '16 at 19:53
Yes. First I checked that minimization is working, and then tried to figure out why there's mismatch between loss and accuracy — Yaroslav Bulatov, Apr 07 '16 at 20:21

AdamOptimizer and GradientDescentOptimizer from tensorflow not able to fit simple data

1 Answers1

Linked