I am working with the new version of keras-rl2 and trying to train my DQN agent, but I'm having trouble with the fit function — https://github.com/tensorneko/keras-rl2/blob/master/rl/core.py (see the Agent class, line 147, where env.step() is called). env.step() is returning more than 4 values, and I'm not sure why. I've struggled to get both new and old versions of gym working with keras-rl. Has anyone resolved this issue? If so, please let me know which gym version you used to train the DQN agent, or how to handle the return value inside the fit function. The full code is in my earlier question: AttributeError: 'tuple' object has no attribute '__array_interface__'
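For context, as far as I can tell gym changed its step API in 0.26: env.step() now returns five values (observation, reward, terminated, truncated, info) instead of the old four, which is what breaks the four-way unpack inside keras-rl2. A minimal sketch of the mismatch, using a stand-in env (not the real Breakout env) just to show the unpacking failure:

```python
# Stand-in env mimicking the gym >= 0.26 step signature (this class is my
# own illustration, not part of gym or keras-rl2).
class FakeEnv:
    def step(self, action):
        # gym >= 0.26 returns: (obs, reward, terminated, truncated, info)
        return [0.0], 1.0, False, False, {}

env = FakeEnv()
try:
    # This mirrors what keras-rl2's Agent.fit does internally in core.py:
    observation, r, done, info = env.step(0)
except ValueError as e:
    print(e)  # too many values to unpack (expected 4)
```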
pip show gym
Name: gym
Version: 0.26.2
Summary: Gym: A universal API for reinforcement learning environments
Home-page: https://www.gymlibrary.dev/
Author: Gym Community
Author-email: jkterry@umd.edu
License: MIT
Location: /home/harsh/.local/lib/python3.10/site-packages
Requires: cloudpickle, gym-notices, numpy
Required-by:
Note: you may need to restart the kernel to use updated packages.
!git clone https://github.com/wau/keras-rl2.git
%cd /home/'user_name'/keras-rl
env = gym.make("Breakout-v4")
nb_actions = env.action_space.n
pip show keras
Name: keras
Version: 2.12.0
Summary: Deep learning for humans.
Home-page: https://keras.io/
Author: Keras team
Author-email: keras-users@googlegroups.com
License: Apache 2.0
Location: /home/harsh/.local/lib/python3.10/site-packages
Requires:
Required-by: keras-rl, tensorflow
Note: you may need to restart the kernel to use updated packages.
# Load the weights
model.load_weights("weights/dqn_BreakoutDeterministic-v4_weights_900000.h5f")
# Update the policy to start with a smaller epsilon
policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=0.3, value_min=.1, value_test=.05,
nb_steps=100000)
# Initialize the DQNAgent with the new model and updated policy and compile it
dqn = DQNAgent(model=model, nb_actions=nb_actions, policy=policy, memory=memory,
processor=processor, nb_steps_warmup=50000, gamma=.99, target_model_update=10000)
dqn.compile(Adam(learning_rate=.00025), metrics=['mae'])
# And train the model
dqn.fit(env, nb_steps=500000, callbacks=[checkpoint_callback], log_interval=10000, visualize=False)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[25], line 15
12 dqn.compile(Adam(learning_rate=.00025), metrics=['mae'])
14 # And train the model
---> 15 dqn.fit(env, nb_steps=500000, callbacks=[checkpoint_callback], log_interval=10000, visualize=False)
File ~/.local/lib/python3.10/site-packages/rl/core.py:177, in Agent.fit(self, env, nb_steps, action_repetition, callbacks, verbose, visualize, nb_max_start_steps, start_step_policy, log_interval, nb_max_episode_steps)
175 for _ in range(action_repetition):
176 callbacks.on_action_begin(action)
--> 177 observation, r, done, info = env.step(action)
178 observation = deepcopy(observation)
179 if self.processor is not None:
ValueError: too many values to unpack (expected 4)
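One workaround I'm considering (a sketch only — the class name and approach are my own, not something from keras-rl2 or gym) is wrapping the env so that step() and reset() are translated back to the old API that keras-rl2 expects, collapsing terminated/truncated into a single done flag:

```python
# Hypothetical compatibility wrapper (my own naming, not a library API):
# translates the gym >= 0.26 return values back to the pre-0.26 API that
# keras-rl2's Agent.fit expects.
class OldGymAPIWrapper:
    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # Delegate everything else (action_space, render, ...) to the wrapped env.
        return getattr(self.env, name)

    def reset(self, **kwargs):
        # gym >= 0.26: reset() returns (observation, info); old API: observation only.
        result = self.env.reset(**kwargs)
        if isinstance(result, tuple) and len(result) == 2:
            return result[0]
        return result

    def step(self, action):
        # gym >= 0.26: step() returns (obs, reward, terminated, truncated, info).
        obs, reward, terminated, truncated, info = self.env.step(action)
        return obs, reward, terminated or truncated, info

# Usage against the real env would then be (untested):
# dqn.fit(OldGymAPIWrapper(env), nb_steps=500000, ...)
```

I'm not sure whether this is the recommended fix or whether pinning an older gym version is cleaner, which is why I'm asking.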