I am trying to set up a reinforcement learning project using Gym & keras_rl.

Description:

Given numbers in the range (100, 200), I want the agent to alert me when a number is close to the limits, let's say in the 0%-10% or 90%-100% quantiles.

Reward:

sample in quantile (0, 0.1), reward is (+1)

sample in quantile (0.1, 0.9), reward is (-1)

sample in quantile (0.9, 1), reward is (+1)
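
To make the reward scheme above concrete, here is a minimal sketch. It assumes the samples are uniformly distributed over [100, 200], so a sample's quantile is simply its linear position in the range; the helper names `quantile_of` and `reward_for` are hypothetical and not part of Gym or keras_rl.

```python
def quantile_of(sample, low=100.0, high=200.0):
    """Return the position of `sample` in [low, high] as a fraction in [0, 1].

    Assumption: values are uniform over the range, so the quantile is
    just the linear position. A real data set would need empirical quantiles.
    """
    return (sample - low) / (high - low)


def reward_for(sample, lower_q=0.1, upper_q=0.9):
    """Return +1 if the sample lies in either tail, -1 otherwise."""
    q = quantile_of(sample)
    return 1 if (q < lower_q or q > upper_q) else -1
```

For example, under this assumption `reward_for(105)` and `reward_for(195)` give +1, while `reward_for(150)` gives -1.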

The agent needs to learn the 10% & 90% limit values.

low = np.array([100])
high = np.array([200])
self.action_space = spaces.Box(low=low, high=high, dtype=np.int32)
self.observation_space = spaces.Box(low=low, high=high, dtype=np.int32)

main.py info:

if __name__ == '__main__':
        env = Env(max_steps=100)

        nb_actions = env.action_space.shape[0]  # note: shape is (1,) for this Box, so this is 1, not 100

        model = Sequential()
        model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
        model.add(Dense(8, activation='relu'))
        model.add(Dense(8, activation='relu'))
        model.add(Dense(2, activation='softmax'))

        memory = SequentialMemory(limit=50000, window_length=1)
        policy = BoltzmannQPolicy()

        # Create DQN agent
        dqn = DQNAgent(model=model,
                       memory=memory,
                       policy=policy,
                       nb_actions=nb_actions,
                       nb_steps_warmup=10,
                       target_model_update=1e-2)

        # Compile the DQN agent
        dqn.compile(Adam(lr=1e-3), metrics=['mae'])

        # Okay, now it's time to learn something!
        dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)

Questions/issues:

In the fit function (rl/core.py:169), the action I get is equal to zero, but it should be within [100, 200]. Why is that? I expect the action to be within the action_space, but I see that all policies return a value of 0 or 1. How am I supposed to use that value in the env.step() function?
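
As far as I understand, the DQN policies pick an index in [0, nb_actions) rather than a value from the Box range. If that is the case, one option might be to translate the index into a value inside env.step(). A minimal sketch of that idea, assuming a hypothetical helper `action_to_value` and 101 discrete actions (neither comes from Gym or keras_rl):

```python
def action_to_value(action, low=100, high=200, n_actions=101):
    """Map a discrete action index in [0, n_actions - 1] onto [low, high].

    Assumption: the candidate values are evenly spaced over the range.
    """
    if not 0 <= action < n_actions:
        raise ValueError("action index out of range")
    step = (high - low) / (n_actions - 1)  # spacing between candidate values
    return low + action * step
```

With these defaults, index 0 maps to 100, index 50 to 150, and index 100 to 200, so the network would need 101 outputs instead of 2.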

My code is based on the following examples:

OpenAI-Gym cartpole.py

keras_rl dqn_cartpole.py

OpenAI-Gym hottercolder.py environment

Any help is much appreciated.

Thanks.