I am trying to set up a reinforcement learning project using Gym and keras-rl.
Description:
Given numbers in the range (100, 200), I want the agent to alert me when a number is close to the limits, say within the 0%-10% or 90%-100% quantiles.
Reward:
sample in quantile (0, 0.1): reward +1
sample in quantile (0.1, 0.9): reward -1
sample in quantile (0.9, 1): reward +1
The agent needs to learn the 10% and 90% limit values.
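For clarity, the reward scheme above can be sketched as a function. The concrete cutoffs (110 and 190, assuming a uniform distribution over 100-200) are illustrative assumptions; in the actual task the agent is supposed to learn them.

```python
# Hypothetical reward function matching the scheme above.
# lo_cut/hi_cut are assumed quantile cutoffs for a uniform
# range of [100, 200]; the agent should learn these itself.
def reward(sample, low=100, high=200, q_low=0.1, q_high=0.9):
    lo_cut = low + q_low * (high - low)    # 110 under these assumptions
    hi_cut = low + q_high * (high - low)   # 190 under these assumptions
    return 1 if (sample <= lo_cut or sample >= hi_cut) else -1
```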
low = np.array([100])
high = np.array([200])
self.action_space = spaces.Box(low=low, high=high, dtype=np.int32)
self.observation_space = spaces.Box(low=low, high=high, dtype=np.int32)
main.py info:
if __name__ == '__main__':
env = Env(max_steps=100)
nb_actions = env.action_space.shape[0]  # equals 1 (the Box shape is (1,)), not 100
model = Sequential()
model.add(Flatten(input_shape=(1,) + env.observation_space.shape))
model.add(Dense(8, activation='relu'))
model.add(Dense(8, activation='relu'))
model.add(Dense(2, activation='softmax'))
memory = SequentialMemory(limit=50000, window_length=1)
policy = BoltzmannQPolicy()
# Create DQN agent
dqn = DQNAgent(model=model,
memory=memory,
policy=policy,
nb_actions=nb_actions,
nb_steps_warmup=10,
target_model_update=1e-2)
# Compile the DQN agent
dqn.compile(Adam(lr=1e-3), metrics=['mae'])
# Okay, now it's time to learn something!
dqn.fit(env, nb_steps=50000, visualize=False, verbose=1)
Questions/issues:
In the fit function (rl/core.py:169), the action I get is always 0 or 1, but I expect it to lie within the action_space, i.e. in [100, 200]. Why is that? All the policies seem to return 0 or 1. How am I supposed to use that value in env.step()?
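One thing I considered (a sketch only, not the keras-rl internals): since DQN policies return a discrete action index in [0, nb_actions), a Box action space may not fit, and I might need a Discrete space plus a manual index-to-value mapping inside env.step(). The names and the number of buckets below are illustrative assumptions.

```python
# Assumed workaround: discretize [100, 200] into N_ACTIONS buckets
# and map the policy's action index back to a value in env.step().
N_ACTIONS = 101  # indices 0..100, an arbitrary choice for illustration

def index_to_value(action, low=100, high=200):
    # Linearly map an action index in [0, N_ACTIONS) to [low, high].
    return low + action * (high - low) / (N_ACTIONS - 1)
```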
My code is based on the following example:
the OpenAI Gym hottercolder.py environment
Any help is much appreciated.
Thanks.