
I am building a custom environment for a reinforcement learning (RL) project.

In common examples such as Pong, Atari games, or Super Mario, the action and observation spaces are quite small.

In my project, however, the action and observation spaces are much larger than in those examples.

I expect at least 5000+ actions and observations.

How can I handle such large action and observation spaces effectively?

Currently, I am using Q-table learning with a wrapper function to handle them.

But this approach seems very inefficient.
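To make the scale concrete, here is a rough sketch of what a tabular Q-function costs in memory. The sizes are assumptions for illustration (the question only says 5000+), and a real environment of this kind would likely not even have a fixed discrete state index:

```python
import numpy as np

# Hypothetical sizes for illustration: 5000 discrete observations x 5000 actions.
n_obs, n_actions = 5000, 5000

# A tabular Q-function must store one value per (state, action) pair.
q_table = np.zeros((n_obs, n_actions), dtype=np.float32)

# 25 million float32 entries = 100 MB for a single table, and the table
# grows multiplicatively with every additional state or action.
print(q_table.nbytes / 1e6, "MB")  # 100.0 MB
```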

subspring

1 Answer


Yes, Q-table learning is quite old and requires an extremely large amount of memory, since it stores a Q-value for every state-action pair in a table. In your case, Q-table learning does not seem good enough. A better choice would be a Deep Q-Network (DQN), which replaces the table with a neural network, though even that is not the most efficient option here.
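The idea can be sketched in a few lines. This is not a trained DQN, just an untrained forward pass in raw NumPy with assumed dimensions, to show how a network of a few million parameters replaces the 25-million-entry table (a real implementation would use PyTorch or TensorFlow and a training loop):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a 5000-dim observation vector, 5000 discrete actions.
obs_dim, n_actions, hidden = 5000, 5000, 256

# A DQN replaces the Q-table with a function approximator;
# here, a tiny two-layer MLP with randomly initialized weights.
W1 = rng.standard_normal((obs_dim, hidden)) * 0.01
W2 = rng.standard_normal((hidden, n_actions)) * 0.01

def q_values(obs):
    h = np.maximum(obs @ W1, 0.0)  # ReLU hidden layer
    return h @ W2                  # one Q-value per action

obs = rng.standard_normal(obs_dim)
q = q_values(obs)
action = int(np.argmax(q))         # greedy action selection
print(q.shape)  # (5000,)
```

Note the parameter count: about 2.56 million weights instead of 25 million table entries, and the network can generalize across similar observations, which a table cannot.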

As for the huge observation space, that is fine; neural networks handle large observation vectors well. But an action space of 5000+ actions is very large and will require a lot of time to converge. To reduce the training time, I would recommend PPO.

Yuxuan Xie
  • Thank you for the reply. Your advice is close to the answer I was looking for. Can you briefly explain PPO? I know DQN, but I'm not familiar with PPO. – youngwoo Oh Mar 25 '22 at 02:18
  • No problem. DQN is a value-based method: its neural network takes the observation as input and outputs a Q-value for each action, and DQN then selects an action based on those Q-values. PPO, by contrast, is a policy-based method: the network takes the observation as input and outputs a distribution over actions (this distribution is called the policy). PPO trains the policy directly to maximize its long-term reward. A more detailed tutorial is https://spinningup.openai.com/en/latest/algorithms/ppo.html. – Yuxuan Xie Mar 25 '22 at 02:37
  • Thanks for your kind reply. It will be of great help to my project. – youngwoo Oh Mar 25 '22 at 04:18
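The value-based vs. policy-based distinction from the comments can be sketched as follows. The logits here are random stand-ins (PPO would produce them from a trained network via its clipped surrogate objective); the point is only the output format, a probability distribution over actions that is sampled rather than arg-maxed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed action count matching the question's scale.
n_actions = 5000

# A policy network would map the observation to these logits;
# random values here stand in for an untrained network's output.
logits = rng.standard_normal(n_actions)

# Softmax turns logits into a valid probability distribution (the policy).
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# A policy-based method samples from the policy,
# instead of taking an argmax over Q-values as DQN does.
action = rng.choice(n_actions, p=probs)
```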