So I have an environment with a reward design I deliberately kept close to -1, 0 and 1, so that (as I was told) the sigmoid wouldn't saturate. The reward scheme itself is fairly simple: roughly -1 or +1 for the end goal.
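In pseudocode the reward is roughly this (a minimal sketch; the goal/failure checks stand in for my actual environment logic):

def reward(reached_goal, failed):
    # Sparse terminal reward: +1 for the goal, -1 for failure, ~0 otherwise.
    if reached_goal:
        return 1.0
    if failed:
        return -1.0
    return 0.0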
I am using DDPG with 250 neurons in the hidden layer (this varies, I'm testing a lot, but let's stick with that number for now). Hyperparameters: learning rate = 0.001, replay memory size = 300, gamma = 0.9, epsilon = 0.18.
This is my actor network:
def _build_a(self, s, reuse=None, custom_getter=None):
    # Layers are trainable only in the online network; the target network reuses them.
    trainable = True if reuse is None else False
    with tf.variable_scope('Actor', reuse=reuse, custom_getter=custom_getter):
        net = tf.layers.dense(s, 250, activation=tf.nn.tanh, name='l1', trainable=trainable)
        a = tf.layers.dense(net, 3, name='a', trainable=trainable)  # 3 raw action logits
        # return tf.nn.softmax(a, name='scaled_a')  # softmax variant I also tried
        return tf.nn.sigmoid(a, name='scaled_a')  # squash each action into (0, 1)
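For comparison, the standard DDPG actor squashes with a zero-centered tanh and scales by the action bound instead of a sigmoid; the last two lines of _build_a would then look roughly like this (self.action_bound is an assumed attribute, = 3 here):

        a = tf.layers.dense(net, 3, activation=tf.nn.tanh, name='a', trainable=trainable)
        return tf.multiply(a, self.action_bound, name='scaled_a')  # output in (-3, 3)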
And this is my critic network:
def _build_c(self, s, a, reuse=None, custom_getter=None):
    trainable = True if reuse is None else False
    with tf.variable_scope('Critic', reuse=reuse, custom_getter=custom_getter):
        n_l1 = int(s.shape[1])  # hidden width, tied to the state dimension (12 here)
        w1_s = tf.get_variable('w1_s', [n_l1, n_l1], trainable=trainable)
        w1_a = tf.get_variable('w1_a', [3, n_l1], trainable=trainable)  # 3 = action dim
        b1 = tf.get_variable('b1', [1, n_l1], trainable=trainable)
        # State and action paths are merged in the first hidden layer.
        net = tf.nn.relu(tf.matmul(s, w1_s) + tf.matmul(a, w1_a) + b1)
        return tf.layers.dense(net, 1, trainable=trainable)  # scalar Q(s, a)
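For reference, the critic is trained against the bootstrapped TD target as in standard DDPG; roughly like this (S_, R, gamma, lr and the target-network wiring are assumptions about my surrounding code, not an exact copy of it):

q = self._build_c(self.S, self.a)            # online critic: Q(s, a)
a_ = self._build_a(self.S_, reuse=True)      # target actor:  mu'(s')
q_ = self._build_c(self.S_, a_, reuse=True)  # target critic: Q'(s', mu'(s'))
q_target = self.R + gamma * q_               # bootstrapped TD target
td_error = tf.losses.mean_squared_error(labels=q_target, predictions=q)
ctrain = tf.train.AdamOptimizer(lr).minimize(td_error)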
As stated before, my rewards lie around -1 and 1, and my state looks like this (partly one-hot encoded):
[ 0.  1.  0.  -0.57726974  0.45491466  2.04893833  -0.7697888  -0.57952472  -0.57726974  -0.44017265  -0.94382348  1.38399613]
My TD error is very low, which I think is because I preprocessed everything down, so the moving range of the values is small. Does anybody have an idea why my sigmoid saturates? Is it my network or my state that's the problem? I would really like to know, because everything I've tried so far hasn't worked: either one action saturates to 0.999 (3 = action_bound) while the rest sit around 0, or all of them converge to 0.999, and I also had a run where everything went to 0. I'm currently coding in the latest Python and TensorFlow versions.
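For what it's worth, the saturation shows up directly in the pre-sigmoid logits, which can be checked like this (rough sketch: sess and batch_s are assumptions, and 'Actor/a/BiasAdd:0' is the tensor name tf.layers.dense(name='a') should produce inside the 'Actor' scope):

logits = tf.get_default_graph().get_tensor_by_name('Actor/a/BiasAdd:0')
vals = sess.run(logits, feed_dict={self.S: batch_s})
print('logit min/mean/max:', vals.min(), vals.mean(), vals.max())
# |logits| >> 4 means sigmoid(logits) is pinned near 0 or 1, i.e. saturated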
Thanks in advance for answering; it means a lot to me!
~Jan
PS: If I missed any information that's needed, let me know.