
EDIT: Solved -- it was the stupidity of using different training examples for the gradients vs the optimizer update.

OK this has me totally stumped.

I have a parameter vector, let's call it w.

w = [-1.34554319, 0.86998659, 0.52366061, 2.6723526 , 0.18756115, 0.16547382]

I use compute_gradients to figure out the gradients with respect to w; it tells me the gradient is:

dw = [-0.0251517 , 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524]

My learning rate is 0.1. Ergo:

w_new = w - 0.1 * dw

w_new = [-1.34302802, 0.78193575, 0.44329835, 2.65748168, 0.17754156, 0.0318763 ]
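
A quick NumPy check of that update (just redoing the arithmetic above):

import numpy as np

w = np.array([-1.34554319, 0.86998659, 0.52366061, 2.6723526, 0.18756115, 0.16547382])
dw = np.array([-0.0251517, 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524])
print(w - 0.1 * dw)
# [-1.34302802  0.78193575  0.44329835  2.65748168  0.17754156  0.0318763 ]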

You can check the math yourself; it checks out. However, if I run the TensorFlow code and evaluate the value of w_new, I get:

w_new_tf = [-1.27643258, 0.9212401 , 0.09922112, 2.55617223, 0.38039282, 0.15450044]

I honestly have no idea why it's doing this.

Edit: Let me provide the exact code to show why it doesn't work. It might be due to indexing, as you will see.

Here is the boilerplate starter code.

import numpy as np
import tensorflow as tf

max_item = 331922
max_user = 1581603
k = 6
np.random.seed(0)
_item_biases = np.random.normal(size=max_item)
np.random.seed(0)
_latent_items = np.random.normal(size=(max_item, k))
np.random.seed(0)
_latent_users = np.random.normal(size=(max_user, k))

item_biases = tf.Variable(_item_biases, name='item_biases')
latent_items = tf.Variable(_latent_items, name='latent_items')
latent_users = tf.Variable(_latent_users, name='latent_users')

input_data = tf.placeholder(tf.int64, shape=[3], name='input_data')
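
(For reference, input_data holds a [user, rated_item, unrated_item] index triple; it gets fed as, e.g., feed_dict={input_data: [1490103, 278755, 25729]} in the training calls below.)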

Here is the custom objective function.

def objective(data, lam, item_biases, latent_items, latent_users):  
    with tf.name_scope('indices'):
        user = data[0]
        rated_item = data[1]
        unrated_item = data[2]

    with tf.name_scope('input_slices'):
        rated_item_bias = tf.gather(item_biases, rated_item, name='rated_item_bias')
        unrated_item_bias = tf.gather(item_biases, unrated_item, name='unrated_item_bias')

        rated_latent_item = tf.gather(latent_items, rated_item, name='rated_latent_item')
        unrated_latent_item = tf.gather(latent_items, unrated_item, name='unrated_latent_item')

        latent_user = tf.gather(latent_users, user, name='latent_user')

    with tf.name_scope('bpr_opt'):
        difference = tf.subtract(rated_item_bias, unrated_item_bias, 'bias_difference')
        ld = tf.subtract(rated_latent_item, unrated_latent_item, 'latent_item_difference')
        latent_difference = tf.reduce_sum(tf.multiply(ld, latent_user), name='latent_difference')
        total_difference = tf.add(difference, latent_difference, name='total_difference')

    with tf.name_scope('obj'):        
        obj = tf.sigmoid(total_difference, name='activation')
    with tf.name_scope('regularization'):
        reg = lam * tf.reduce_sum(rated_item_bias**2)
        reg += lam * tf.reduce_sum(unrated_item_bias**2) 
        reg += lam * tf.reduce_sum(rated_latent_item**2) 
        reg += lam * tf.reduce_sum(unrated_latent_item**2)
        reg += lam * tf.reduce_sum(latent_user**2)

    with tf.name_scope('final'):
        final_obj = -tf.log(obj) + reg


    return final_obj
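
For anyone who wants to sanity-check the graph, here is a minimal plain-NumPy sketch of the same objective for a single (user, rated_item, unrated_item) triple (objective_np is just an illustrative name, not part of the code above):

import numpy as np

def objective_np(data, lam, item_biases, latent_items, latent_users):
    # data = [user, rated_item, unrated_item], mirroring the TF version
    user, rated, unrated = data
    x = (item_biases[rated] - item_biases[unrated]
         + np.dot(latent_items[rated] - latent_items[unrated], latent_users[user]))
    reg = lam * (item_biases[rated]**2 + item_biases[unrated]**2
                 + np.sum(latent_items[rated]**2)
                 + np.sum(latent_items[unrated]**2)
                 + np.sum(latent_users[user]**2))
    # -log(sigmoid(x)) plus L2 regularization, as in the TF graph
    return -np.log(1.0 / (1.0 + np.exp(-x))) + reg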

Here is some boilerplate code to actually minimize the function. At two points I call sess.run on the tf.Variables to see how their values have changed.

obj = objective(input_data, 0.05, item_biases, latent_items, latent_users)

optimizer = tf.train.GradientDescentOptimizer(0.1)
trainer = optimizer.minimize(obj)
sess = tf.Session()
sess.run(tf.global_variables_initializer())


citem_biases, clatent_items, clatent_users = \
    sess.run([item_biases, latent_items, latent_users])

print (clatent_users[1490103]) # [-1.34554319, 0.86998659, 0.52366061, 2.6723526 , 0.18756115, 0.16547382]

cvalues = sess.run([trainer, obj], feed_dict={input_data: [1490103, 278755, 25729]})  # the update step is fed this triple
citem_biases, clatent_items, clatent_users = \
    sess.run([item_biases, latent_items, latent_users]) 
print (clatent_users[1490103]) #[-1.27643258,  0.9212401 ,  0.09922112,  2.55617223,  0.38039282, 0.15450044]

Finally, here is some code to actually get the gradients. These gradients were double-checked against hand-derived gradients, so they are correct. Sorry for the ugliness of the code; it's a blatant copy-and-paste of another SO answer:

grads_and_vars = optimizer.compute_gradients(obj, tf.trainable_variables())
sess = tf.Session()
sess.run(tf.global_variables_initializer())  # fresh session: variables are back at their initial values
gradients_and_vars = sess.run(grads_and_vars, feed_dict={input_data: [1490103, 278830, 140306]})  # note: a different triple than the trainer run above
print (gradients_and_vars[2][0]) #[-0.0251517 ,  0.88050844,  0.80362262,  0.14870925,  0.10019595, 1.33597524]
anon
  • You should provide a reproducible example – Maxim Mar 21 '18 at 09:22
  • Alright, I added my entire code. It's not pretty, but there could be a million things in my code that cause this issue (bugs or tensorflow issues) so I've gone ahead and posted the entire thing. – anon Mar 21 '18 at 09:44

2 Answers


You did not provide complete code, but I ran a similar example and it worked for me as it should. Here is my code:

import tensorflow as tf

with tf.Graph().as_default():
  ph = tf.constant([1., 2., 3.])
  v = tf.get_variable('v', (3,))
  loss = tf.square(ph - v)
  optimizer = tf.train.GradientDescentOptimizer(0.1)
  trainer = optimizer.minimize(loss)
  gradients = optimizer.compute_gradients(loss)[0][0]

  with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    grad_mat = sess.run(gradients)  # gradient at the initial v
    v_0 = sess.run(v)               # v before the update
    sess.run(trainer)               # one SGD step
    v_1 = sess.run(v)               # v after the update

    print(grad_mat)
    print(v_0)
    print(v_0 - 0.1 * grad_mat)     # manual update
    print(v_1)                      # should match the line above

Here is the output (obviously it will be a bit different each time, due to the random initialization of get_variable):

[-2.01746035 -5.61006117 -6.7561307 ]
[-0.00873017 -0.80503058 -0.37806535]
[ 0.19301586 -0.24402446  0.29754776]
[ 0.19301586 -0.24402446  0.29754776]

The last two lines are identical, as you would expect.

Zvika
  • Sure thing. I added the full extent of my code. I suspect it's not a simple gradient descent problem (one would certainly hope not!) but perhaps an issue when you deal with sparse slicing and such. Try running the code I have at the top and let me know if you run into the same issues. – anon Mar 21 '18 at 09:43

Problem: I was feeding different inputs to compute_gradients vs. the trainer -- the trainer run used feed_dict={input_data: [1490103, 278755, 25729]} while the gradient run used [1490103, 278830, 140306]. Solution: feed the same input to both.
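
For example, reusing the graph and names from the question, something like this sketch (one feed for both the gradient read-out and the update step):

feed = {input_data: [1490103, 278755, 25729]}  # one triple for everything

sess = tf.Session()
sess.run(tf.global_variables_initializer())

w = sess.run(latent_users)[1490103]
dw = sess.run(grads_and_vars, feed_dict=feed)[2][0]  # same print as in the question
sess.run(trainer, feed_dict=feed)                    # update using the same triple
w_new = sess.run(latent_users)[1490103]

print(w - 0.1 * dw)  # now matches w_new
print(w_new)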

anon