1. Actor-critic model

I train the actor with tf.GradientTape. The loss function is MSE, Huber, or cross-entropy, where y_true is a constant and y_pred is my network's output, e.g.

    y_pred = my_network(input)
    loss_actor = tf.losses.MSE(y_true, y_pred)

or something similar, like

    loss_actor = Huber(y_true, action_probs)
    loss_actor = cross_entropy(y_true, y_pred)

The intention is that y_true, the constant, is what my network should converge to, and y_pred = my_network(input).
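For reference, a minimal sketch of this kind of training step (the network, optimizer, and dummy data here are placeholders I made up, not my real actor):

    import tensorflow as tf

    # Toy stand-ins for the real actor and its optimizer.
    my_network = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(2, activation="softmax"),
    ])
    optimizer = tf.keras.optimizers.Adam(1e-3)

    x = tf.random.normal([1, 4])        # dummy input
    y_true = tf.constant([[0.0, 1.0]])  # constant target

    with tf.GradientTape() as tape:
        y_pred = my_network(x)                      # network output
        loss = tf.keras.losses.MSE(y_true, y_pred)  # compare output to constant target
    grads = tape.gradient(loss, my_network.trainable_variables)
    optimizer.apply_gradients(zip(grads, my_network.trainable_variables))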
2. The problem

Ultimately, my problem condenses to the following. For y_true I use artificial (fake) data:

    if n < 130:
        self.ret = 0.1
    elif n >= 130:
        self.ret = -0.1

where n starts from 124 and increases without bound. Here self.ret is my y_true, i.e. my label.

I want the network to output [0.0, 1.0], which represents Invest, when I feed self.ret = 0.1 as y_true, and [1.0, 0.0], which represents Uninvest, when I feed self.ret = -0.1.
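In code, the mapping from self.ret to the two-class target I have in mind looks like this (a sketch; it mirrors the softmax trick in my training code in section 4):

    import tensorflow as tf

    def ret_to_target(ret):
        # softmax([0, 100 * ret]) saturates to ~[0, 1] for ret = 0.1 (Invest)
        # and to ~[1, 0] for ret = -0.1 (Uninvest).
        return tf.nn.softmax(tf.constant([0.0, 1e2]) * ret)

    print(ret_to_target(0.1).numpy())   # ~[0., 1.] -> Invest
    print(ret_to_target(-0.1).numpy())  # ~[1., 0.] -> Uninvest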
3. Feeding the network random vs. real input

- When I feed the network random data as input, this model works well:
  - when n < 130, my_network outputs [0.0, 1.0], which represents Invest
  - when n >= 130, my_network outputs [1.0, 0.0], which represents Uninvest
- But when I feed the network real stock data as input, this model goes wrong:
  - when n >= 130, my_network outputs [0.0, 1.0] (Invest) forever, whereas I want [1.0, 0.0], i.e. Uninvest
4. Wrong use of tf.GradientTape

I know the problem is that I am using tf.GradientTape in the wrong way, so it does not compute the gradient I intend. But I want to know exactly how to change my code to make it correct.

My code is:
    if n < 130:
        self.ret = 0.1
    elif n >= 130:
        self.ret = -0.1
    # n starts from 124; self.ret is y_true

    with tf.GradientTape(persistent=True) as tape:
        tape.watch(self.actor.trainable_variables)
        action_probs = self.actor(self.get_input(n))[0]  # i.e. y_pred

        #''' # Variant 1: Huber (or MSE) as the loss function
        y_true = tf.nn.softmax([0.0, 1e2] * tf.stop_gradient(self.ret))
        #loss_actor = tf.losses.MSE(y_true, action_probs)
        huber = tf.keras.losses.Huber()
        loss_actor = huber(y_true, action_probs)
        #'''

        ''' # Variant 2: cross-entropy as the loss function
        # NO_CRITIC and eps are defined elsewhere in my code.
        r_t = self.ret
        delta_t = 1.0
        prediction = tf.keras.backend.clip(
            tf.nn.softmax([0.0, 1e2] * tf.stop_gradient(self.ret if NO_CRITIC else r_t)),
            eps, 1 - eps)
        # self.ret (or r_t) is y_true
        log_probabilities = action_probs * tf.keras.backend.log(prediction)
        loss_actor = tf.keras.backend.sum(-log_probabilities * tf.stop_gradient(delta_t))
        #'''

    loss_actor_gradients = tape.gradient(loss_actor, self.actor.trainable_variables)
    self.opt_actor.apply_gradients(zip(loss_actor_gradients, self.actor.trainable_variables))
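For comparison, my understanding of the textbook cross-entropy form is that the constant target stays outside the log and the network output goes inside it, which is the opposite of the roles in my code above. A self-contained sketch of that form, with an assumed eps value (not a confirmed fix for my problem):

    import tensorflow as tf

    eps = 1e-7  # small clipping constant (my assumed value)

    def cross_entropy_loss(y_true, action_probs):
        # Standard form: the constant target y_true sits outside the log,
        # and the network output action_probs sits inside it. My code above
        # swaps the two (action_probs * log(prediction)).
        clipped = tf.clip_by_value(action_probs, eps, 1.0 - eps)
        return -tf.reduce_sum(tf.stop_gradient(y_true) * tf.math.log(clipped))

    # With the targets from section 2:
    y_true = tf.constant([0.0, 1.0])        # Invest target (self.ret = 0.1)
    action_probs = tf.constant([0.3, 0.7])  # some network output
    print(cross_entropy_loss(y_true, action_probs).numpy())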