
I am trying to implement DDPG with keras-rl and have the following actor network:

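from keras.models import Sequential
from keras.layers import Activation, Dense, Flatten

nb_actions = env.action_space.shape[0]  # env is my custom Gym environment
actor = Sequential()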
actor.add(Flatten(input_shape=(1,) + env.observation_space.shape))
actor.add(Dense(16))
actor.add(Activation('relu'))
actor.add(Dense(16))
actor.add(Activation('relu'))
actor.add(Dense(16))
actor.add(Activation('relu'))
actor.add(Dense(nb_actions))
actor.add(Activation('linear'))

However, I would prefer the output to be scaled to the action-space bounds of my custom Gym environment (env.action_space).

The tutorial at https://pemami4911.github.io/blog/2016/08/21/ddpg-rl.html does this with the TFLearn API, using a tanh output layer whose result is multiplied by the action bound:

def create_actor_network(self):
        inputs = tflearn.input_data(shape=[None, self.s_dim])
        net = tflearn.fully_connected(inputs, 400)
        net = tflearn.layers.normalization.batch_normalization(net)
        net = tflearn.activations.relu(net)
        net = tflearn.fully_connected(net, 300)
        net = tflearn.layers.normalization.batch_normalization(net)
        net = tflearn.activations.relu(net)
        # Final layer weights are init to Uniform[-3e-3, 3e-3]
        w_init = tflearn.initializations.uniform(minval=-0.003, maxval=0.003)
        out = tflearn.fully_connected(
            net, self.a_dim, activation='tanh', weights_init=w_init)
        # Scale output to -action_bound to action_bound
        scaled_out = tf.multiply(out, self.action_bound)
        return inputs, out, scaled_out

What is the Keras equivalent for scaling the output layer to my environment's action bounds?
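A minimal sketch of one possible Keras equivalent, assuming a `gym.spaces.Box` action space and the `tf.keras` API (keras-rl originally targeted standalone Keras, where the same layers live under `keras.layers`): swap the final `'linear'` activation for `'tanh'`, then append a `Lambda` layer that stretches and shifts (-1, 1) onto (low, high). Unlike the tutorial's `out * action_bound`, this also handles bounds that are not centered around 0. The bounds and observation shape below are hypothetical placeholders for `env.action_space.low`, `env.action_space.high` and `env.observation_space.shape`.

```python
import numpy as np
from tensorflow.keras.layers import Activation, Dense, Flatten, Input, Lambda
from tensorflow.keras.models import Sequential

# Hypothetical stand-ins for env.action_space.low / .high and
# env.observation_space.shape; replace them with your environment's values.
action_low = np.array([0.0, 10.0], dtype=np.float32)
action_high = np.array([1.0, 50.0], dtype=np.float32)
obs_shape = (3,)
nb_actions = action_low.shape[0]
action_range = action_high - action_low

def scale_to_bounds(x):
    # x lies in [-1, 1] after tanh: shift to [0, 1], then stretch to [low, high].
    # The tensor is kept on the left so TensorFlow (not NumPy) handles the op.
    return (x + 1.0) * 0.5 * action_range + action_low

actor = Sequential()
actor.add(Input(shape=(1,) + obs_shape))
actor.add(Flatten())
actor.add(Dense(16))
actor.add(Activation('relu'))
actor.add(Dense(16))
actor.add(Activation('relu'))
actor.add(Dense(16))
actor.add(Activation('relu'))
actor.add(Dense(nb_actions))
actor.add(Activation('tanh'))        # squash to [-1, 1] instead of 'linear'
actor.add(Lambda(scale_to_bounds))   # affine map to [low, high] per action dim
```

One caveat: during training, keras-rl's `DDPGAgent` adds noise from its random process to the actor's output, so the executed action may still leave the bounds; clipping it (for example in the environment's `step`) may still be needed.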

Milo Lu
CS101
  • I don't want to write this up as an answer, but I realized I need a squashing function such as tanh or sigmoid to map the network output into a bounded range, and then a function that maps the value from [-1, +1] (or [0, 1]) to the Gym environment bounds. This approach suits me better because the bounds of my custom Gym environment are not centered around 0 – CS101 Nov 21 '18 at 01:58
  • If my argument is correct, can anyone tell me what kind of mapping I should define in that function? Should I use a proportional relationship between -1/+1 and my environment bounds, or something more complex? – CS101 Nov 21 '18 at 01:59
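A proportional (affine) mapping is generally enough: the tanh or sigmoid already supplies the nonlinearity, so the rescaling step only has to stretch and shift the squashed value onto the target interval, i.e. a_env = low + (a - src_low) / (src_high - src_low) * (high - low). The NumPy helper below is a hypothetical sketch (`rescale_action` is not a Gym or keras-rl function); with `src_low=-1`, `src_high=1`, `low=-action_bound` and `high=action_bound` it reduces to the tutorial's `out * action_bound`.

```python
import numpy as np

def rescale_action(a, low, high, src_low=-1.0, src_high=1.0):
    """Linearly map `a` from [src_low, src_high] to [low, high], element-wise.

    Hypothetical helper: use src_low=-1, src_high=1 after tanh,
    or src_low=0, src_high=1 after sigmoid.
    """
    a = np.asarray(a, dtype=np.float64)
    low = np.asarray(low, dtype=np.float64)
    high = np.asarray(high, dtype=np.float64)
    fraction = (a - src_low) / (src_high - src_low)  # position in source range, 0..1
    return low + fraction * (high - low)

print(rescale_action(0.0, 10.0, 50.0))                              # 30.0 (tanh midpoint)
print(rescale_action(0.5, 10.0, 50.0, src_low=0.0, src_high=1.0))  # 30.0 (sigmoid midpoint)
print(rescale_action(0.5, -2.0, 2.0))                               # 1.0, same as 0.5 * 2.0
```

For a `gym.spaces.Box`, `low` and `high` would typically be `env.action_space.low` and `env.action_space.high`, which the helper broadcasts per action dimension.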
