
1. Actor-critic model

Using tf.GradientTape,

loss_function = MSE, Huber, or cross-entropy

y_true = constant,
y_pred = my_network_output, e.g. y_pred = my_network(input)

e.g. loss_actor = tf.losses.MSE(y_true, y_pred)
or other variants, such as

loss_actor = Huber(y_true, action_probs)

loss_actor = cross_entropy(y_true, y_pred)

The intention is that y_true (a constant) is what my network should converge to, with

y_pred = my_network(input)
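
For reference, a minimal sketch of the pattern I mean (my_network, optimizer, and the input shape below are only placeholders, not my real model):

import tensorflow as tf

# minimal sketch of the GradientTape + loss pattern described above;
# my_network, optimizer and the shapes are placeholders
my_network = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu", input_shape=(4,)),
    tf.keras.layers.Dense(2, activation="softmax"),
])
optimizer = tf.keras.optimizers.Adam(1e-3)

x = tf.random.normal((1, 4))        # network input
y_true = tf.constant([[0.0, 1.0]])  # constant target the network should converge to

with tf.GradientTape() as tape:
    y_pred = my_network(x)                       # y_pred = my_network(input)
    loss = tf.keras.losses.MSE(y_true, y_pred)   # or Huber / cross-entropy

grads = tape.gradient(loss, my_network.trainable_variables)
optimizer.apply_gradients(zip(grads, my_network.trainable_variables))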

2. The problem
Ultimately, I can condense my problem to the following.

For y_true, I use artificial (fake) data:

if n < 130:
    self.ret = 0.1
elif n >= 130:
    self.ret = -0.1

where n starts from 124 and eventually goes to infinity.

Here, self.ret is my y_true, i.e. my label.

I want my network to output [0.0, 1.0] (representing Invest) when I feed self.ret (i.e. y_true) = 0.1,

and to output [1.0, 0.0] (representing Uninvest) when I feed self.ret (i.e. y_true) = -0.1.
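
In other words, the mapping from self.ret to the target vector that I intend is roughly the following (a sketch; make_target is only an illustrative name, and in my real code below I build the target with a softmax over [0.0, 1e2] * self.ret instead):

import tensorflow as tf

def make_target(ret):
    # ret = 0.1  -> [0.0, 1.0]  (Invest)
    # ret = -0.1 -> [1.0, 0.0]  (Uninvest)
    return tf.constant([0.0, 1.0]) if ret > 0 else tf.constant([1.0, 0.0])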

3. Feeding the network random data as input

  • When I feed the network random data as input, this model works well:
  • when n < 130, my_network outputs [0.0, 1.0], which represents Invest
  • when n > 130, my_network outputs [1.0, 0.0], which represents Uninvest

But when I feed the network real stock data as input, this model works incorrectly:

  • when n > 130, my_network outputs [0.0, 1.0] forever, which represents Invest, but I want [1.0, 0.0], i.e. Uninvest

4. Wrong use of tf.GradientTape

I know the problem is that I use tf.GradientTape in the wrong way:
tf.GradientTape does not calculate the gradient correctly.

But I want to know exactly how to change my code into a correct one.

My code is:

if n < 130:
    self.ret = 0.1
elif n >= 130:
    self.ret = -0.1
# n starts from 124; self.ret is y_true

with tf.GradientTape(persistent=True) as tape:
    tape.watch(self.actor.trainable_variables)
    # y_pred = action_probs = self.actor(self.get_input(n))[0]
    action_probs = self.actor(self.get_input(n))[0]  # i.e. y_pred

    #''' # below: use Huber or MSE as the loss function
    # turn the scalar label self.ret into a 2-class target (0.1 -> ~[0.0, 1.0], -0.1 -> ~[1.0, 0.0])
    y_true = tf.nn.softmax([0.0, 1e2] * tf.stop_gradient(self.ret))
    # loss_actor = tf.losses.MSE(y_true, action_probs)
    huber = tf.keras.losses.Huber()
    loss_actor = huber(y_true, action_probs)
    #'''

    ''' # below: use cross-entropy as the loss function
    r_t = self.ret
    delta_t = 1.0
    # eps is a small clipping constant (not defined in this snippet)
    prediction = tf.keras.backend.clip(
        tf.nn.softmax([0.0, 1e2] * tf.stop_gradient(self.ret if NO_CRITIC else r_t)),
        eps, 1 - eps)
    log_probabilities = action_probs * tf.keras.backend.log(prediction)
    # self.ret or r_t is y_true
    loss_actor = tf.keras.backend.sum(-log_probabilities * tf.stop_gradient(delta_t))
    #'''

# compute gradients of the actor loss and apply them to the actor's weights
loss_actor_gradients = tape.gradient(loss_actor, self.actor.trainable_variables)
self.opt_actor.apply_gradients(zip(loss_actor_gradients, self.actor.trainable_variables))
  • I found a way to avoid this problem: self.actor.compile(optimizer=self.opt_actor, loss=loss_func_mse1); hist = self.actor.fit(self.get_input(n), y_true, epochs=1) – Yu Bo Jun 29 '21 at 13:41
  • But I still want to know how to solve this problem, i.e. how to define my own loss function, calculate the gradient, and apply the gradient to model.trainable_variables – Yu Bo Jun 29 '21 at 13:43
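
For reference, a minimal sketch of the compile/fit workaround from the comment above (loss_func_mse1 here is only an illustrative definition, assuming a plain MSE between the constant target and the network output; self.actor, self.opt_actor, self.get_input(n), and y_true are as in the code above):

import tensorflow as tf

def loss_func_mse1(y_true, y_pred):
    # ordinary mean-squared-error custom loss
    return tf.reduce_mean(tf.square(y_true - y_pred))

self.actor.compile(optimizer=self.opt_actor, loss=loss_func_mse1)
# y_true may need a leading batch dimension, e.g. tf.expand_dims(y_true, 0)
hist = self.actor.fit(self.get_input(n), y_true, epochs=1)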
