EDIT: Solved -- it was the stupidity of using different training examples for the gradient computation vs. the optimizer update.
OK this has me totally stumped.
I have a parameter vector, let's call it w.
w = [-1.34554319, 0.86998659, 0.52366061, 2.6723526 , 0.18756115, 0.16547382]
I use compute_gradients to figure out the gradient with respect to w, and it tells me the gradient is:
dw = [-0.0251517 , 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524]
My learning rate is 0.1. Ergo:
w_new = w - 0.1 * dw
w_new = [-1.34302802, 0.78193575, 0.44329835, 2.65748168, 0.17754156, 0.0318763 ]
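Here is a quick NumPy sanity check of that arithmetic (nothing TensorFlow-specific, just the update rule applied to the numbers above):

import numpy as np

w = np.array([-1.34554319, 0.86998659, 0.52366061, 2.6723526, 0.18756115, 0.16547382])
dw = np.array([-0.0251517, 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524])
print(w - 0.1 * dw)
# [-1.34302802  0.78193575  0.44329835  2.65748168  0.17754156  0.0318763 ]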
You can check the math yourself; it should check out (the NumPy snippet above reproduces it). However, if I run the TensorFlow code and evaluate the value of w_new, I get:
w_new_tf = [-1.27643258, 0.9212401 , 0.09922112, 2.55617223, 0.38039282, 0.15450044]
I honestly have no idea why it's doing this.
Edit: Let me provide the exact code to show why it doesn't work. It might be due to indexing, as you will see.
Here is the boilerplate starter code.
import numpy as np
import tensorflow as tf
max_item = 331922
max_user = 1581603
k = 6
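# re-seeding before each draw keeps the initial values reproducible across runs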
np.random.seed(0)
_item_biases = np.random.normal(size=max_item)
np.random.seed(0)
_latent_items = np.random.normal(size=(max_item, k))
np.random.seed(0)
_latent_users = np.random.normal(size=(max_user, k))
item_biases = tf.Variable(_item_biases, name='item_biases')
latent_items = tf.Variable(_latent_items, name='latent_items')
latent_users = tf.Variable(_latent_users, name='latent_users')
input_data = tf.placeholder(tf.int64, shape=[3], name='input_data')
Here is the custom objective function.
def objective(data, lam, item_biases, latent_items, latent_users):
    with tf.name_scope('indices'):
        user = data[0]
        rated_item = data[1]
        unrated_item = data[2]
    with tf.name_scope('input_slices'):
        rated_item_bias = tf.gather(item_biases, rated_item, name='rated_item_bias')
        unrated_item_bias = tf.gather(item_biases, unrated_item, name='unrated_item_bias')
        rated_latent_item = tf.gather(latent_items, rated_item, name='rated_latent_item')
        unrated_latent_item = tf.gather(latent_items, unrated_item, name='unrated_latent_item')
        latent_user = tf.gather(latent_users, user, name='latent_user')
    with tf.name_scope('bpr_opt'):
        difference = tf.subtract(rated_item_bias, unrated_item_bias, 'bias_difference')
        ld = tf.subtract(rated_latent_item, unrated_latent_item, 'latent_item_difference')
        latent_difference = tf.reduce_sum(tf.multiply(ld, latent_user), name='latent_difference')
        total_difference = tf.add(difference, latent_difference, name='total_difference')
    with tf.name_scope('obj'):
        obj = tf.sigmoid(total_difference, name='activation')
    with tf.name_scope('regularization'):
        reg = lam * tf.reduce_sum(rated_item_bias**2)
        reg += lam * tf.reduce_sum(unrated_item_bias**2)
        reg += lam * tf.reduce_sum(rated_latent_item**2)
        reg += lam * tf.reduce_sum(unrated_latent_item**2)
        reg += lam * tf.reduce_sum(latent_user**2)
    with tf.name_scope('final'):
        final_obj = -tf.log(obj) + reg
    return final_obj
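For reference, reading that graph back (my notation, with u the user, i the rated item, j the unrated item, b the item biases, q the latent item rows, and p the latent user rows), the quantity being minimized is the negated BPR criterion:

$$-\log \sigma\big((b_i - b_j) + (q_i - q_j) \cdot p_u\big) + \lambda \big(b_i^2 + b_j^2 + \lVert q_i \rVert^2 + \lVert q_j \rVert^2 + \lVert p_u \rVert^2\big)$$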
Here is some boilerplate code to actually minimize the function. At two points I do a sess.run call on the tf.Variables to see how the values have changed.
obj = objective(input_data, 0.05, item_biases, latent_items, latent_users)
optimizer = tf.train.GradientDescentOptimizer(0.1)
trainer = optimizer.minimize(obj)
sess = tf.Session()
sess.run(tf.global_variables_initializer())
citem_biases, clatent_items, clatent_users = \
    sess.run([item_biases, latent_items, latent_users])
print (clatent_users[1490103]) # [-1.34554319, 0.86998659, 0.52366061, 2.6723526 , 0.18756115, 0.16547382]
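# one SGD step on the example (user=1490103, rated_item=278755, unrated_item=25729)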
cvalues = sess.run([trainer, obj], feed_dict={input_data:[1490103, 278755, 25729]})
citem_biases, clatent_items, clatent_users = \
    sess.run([item_biases, latent_items, latent_users])
print (clatent_users[1490103]) #[-1.27643258, 0.9212401 , 0.09922112, 2.55617223, 0.38039282, 0.15450044]
Finally, here is some code to actually get the gradients. These gradients were double-checked against hand-derived gradients, so they are correct. Sorry for the ugliness of the code; it's a blatant copy & paste from another SO answer:
grads_and_vars = optimizer.compute_gradients(obj, tf.trainable_variables())
sess = tf.Session()
sess.run(tf.global_variables_initializer())
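# NB: this feed_dict uses a different training example than the trainer step
# above, which (per the EDIT at the top) turned out to be the whole problem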
gradients_and_vars = sess.run(grads_and_vars, feed_dict={input_data:[1490103, 278830, 140306]})
print (gradients_and_vars[2][0]) #[-0.0251517 , 0.88050844, 0.80362262, 0.14870925, 0.10019595, 1.33597524]
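And that is exactly the bug from the EDIT at the top: the trainer step was fed [1490103, 278755, 25729], while the gradients here were computed for [1490103, 278830, 140306], a different training example in a freshly initialized session, so the hand-computed update was using the wrong dw. For completeness, a minimal sketch of the apples-to-apples check (my code; one session and one feed_dict for both the gradient fetch and the update step):

feed = {input_data: [1490103, 278755, 25729]}
# fetch only the gradient tensors; fetching a variable in the same run as
# the op that updates it has undefined ordering in TF1
grad_list = [g for g, v in grads_and_vars]
w_before = sess.run(latent_users)[1490103]
grad_vals, _ = sess.run([grad_list, trainer], feed_dict=feed)
w_after = sess.run(latent_users)[1490103]
# the gradients feed the update op, so grad_vals holds the pre-update values
# the optimizer actually applied; the gather gradients come back as
# IndexedSlicesValue, so look at .values / .indices for row 1490103
print((w_before - w_after) / 0.1)  # recovers that row's applied gradient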