I am trying to implement experience replay in TensorFlow. The problem I am having is storing the outputs from the model's trials and then updating the gradient for all of them simultaneously. One approach I tried was to store the values returned by sess.run(model); however, those are plain NumPy values, not tensors, so as far as TensorFlow is concerned they cannot be used for gradient descent. I am currently trying to use tf.assign(), and the difficulty I am having is best shown by the following example.
import tensorflow as tf
import numpy as np
def get_model(input):
    return input
a = tf.Variable(0)
b = get_model(a)
d = tf.Variable(0)
for i in range(10):
    assign = tf.assign(a, tf.Variable(i))
    b = tf.Print(b, [assign], "print b: ")
    c = b
    d = tf.assign_add(d, c)
    e = d
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(sess.run(e))
The issues I have with the above code are:

- It prints a different value on each run, which seems odd.
- It does not update correctly at each step of the for loop.

Part of why I am confused is that I understand you have to run the assign operation to update the prior reference; however, I just can't figure out how to do that correctly at each step of the for loop. If there is an easier way, I am open to suggestions. This example mirrors how I am currently trying to feed in an array of inputs and get a sum based on each prediction the model makes. If clarification on any of the above would help, I will be more than happy to provide it.
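Just to spell out the computation I expect the graph above to perform, here is the plain-Python equivalent (using the same identity "model"). This is only an illustration of the intended behavior, not working TensorFlow code:

```python
# Plain-Python version of the computation the graph is meant to express.
# The "model" is the identity function, matching get_model above.
def get_model(x):
    return x

total = 0
for i in range(10):
    a = i                # what tf.assign(a, i) is meant to do each step
    b = get_model(a)     # the model's output for this step
    print("print b: [%d]" % b)
    total += b           # what tf.assign_add(d, b) is meant to accumulate

print(total)             # expected: 45
```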
The following are the results from running the code above three times.
$ python test3.py
2018-07-03 13:35:08.380077: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
print b: [8]
print b: [8]
print b: [8]
print b: [8]
print b: [8]
print b: [8]
print b: [8]
print b: [8]
print b: [8]
print b: [8]
80
$ python test3.py
2018-07-03 13:35:14.055827: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
print b: [7]
print b: [6]
print b: [6]
print b: [6]
print b: [6]
print b: [6]
print b: [6]
print b: [6]
print b: [6]
print b: [6]
60
$ python test3.py
2018-07-03 13:35:20.120661: I T:\src\github\tensorflow\tensorflow\core\platform\cpu_feature_guard.cc:140] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
print b: [9]
print b: [9]
print b: [9]
print b: [9]
print b: [9]
print b: [9]
print b: [9]
print b: [9]
print b: [9]
print b: [9]
90
The result I am expecting is as follows:
print b: [0]
print b: [1]
print b: [2]
print b: [3]
print b: [4]
print b: [5]
print b: [6]
print b: [7]
print b: [8]
print b: [9]
45
The main reason I am confused is that sometimes it prints all nines, which makes me think it loads the last assigned value 10 times; however, sometimes it prints different values, which seems to contradict that theory.
What I would like to do is feed in an array of input examples and compute the gradient for all examples at the same time. It needs to happen concurrently because the reward used depends on the outputs of the model, so if the model changes mid-computation, the resulting rewards would also change.
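To make concrete what I mean by "compute the gradient for all examples at the same time", here is a NumPy sketch. The linear model, the reward function, and all the names here (w, x, rewards) are hypothetical, purely to illustrate the batched computation I am after:

```python
import numpy as np

# Hypothetical linear "model": prediction = w * x
w = 0.5
x = np.array([1.0, 2.0, 3.0, 4.0])   # a batch of input examples

outputs = w * x                      # all predictions from the SAME model snapshot
rewards = outputs ** 2               # the reward depends on the outputs themselves

# Loss over the whole batch at once. Because outputs/rewards are computed
# before the update, changing w afterwards cannot alter the rewards
# mid-computation. Toy loss: loss = -sum((w*x)^3), illustrative only.
loss = -np.sum(rewards * outputs)
grad = -np.sum(3 * (w * x) ** 2 * x)  # analytic d(loss)/dw for this toy loss

w = w - 0.01 * grad                   # a single, simultaneous update
```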