
I understand that in a fully observable environment (chess, Go, etc.) you can run MCTS, guided by a trained policy network, to plan ahead. This lets you pick actions during game play that maximize the expected return from the current state.

However, in a partially observable environment, do we still need to run MCTS during game play? Why can't we just pick the highest-probability action from the trained policy given the current state? What utility does MCTS serve here?
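
To make the contrast concrete, here is a rough, self-contained toy sketch of the two options I have in mind, in a fully observable setting. Everything in it is made up for illustration: the `MODEL` dictionary stands in for the game rules, `policy_net` for a trained policy network, and `act_with_mcts` is just a bare-bones UCT loop, not any particular published algorithm.

```python
import math
import random

# Toy fully observable model (hypothetical): state -> action -> (next_state, reward, done).
MODEL = {
    "s0": {0: ("s1", 0.0, False), 1: ("s2", 0.0, False)},
    "s1": {0: ("s3", 1.0, True),  1: ("s3", 0.0, True)},
    "s2": {0: ("s3", 0.0, True),  1: ("s3", 2.0, True)},
    "s3": {},  # terminal
}

def policy_net(state):
    # Stand-in for a trained policy network: returns action -> probability.
    # Deliberately imperfect: it slightly prefers action 0, which is suboptimal from s0.
    return {0: 0.6, 1: 0.4} if MODEL[state] else {}

def act_greedy(state):
    # Option 1: no planning at play time, just take the policy's most probable action.
    priors = policy_net(state)
    return max(priors, key=priors.get)

class Node:
    def __init__(self, state, reward=0.0):
        self.state = state
        self.reward = reward   # reward received on the transition into this node
        self.children = {}     # action -> Node
        self.visits = 0
        self.value_sum = 0.0

def rollout(state):
    # Random playout to a terminal state; returns the accumulated reward.
    total = 0.0
    while MODEL[state]:
        state, reward, done = MODEL[state][random.choice(list(MODEL[state]))]
        total += reward
        if done:
            break
    return total

def act_with_mcts(root_state, n_simulations=200, c_uct=1.4):
    # Option 2: plan with MCTS at play time, then act with the most-visited root action.
    root = Node(root_state)
    for _ in range(n_simulations):
        node, path, ret = root, [root], 0.0
        # Selection: descend through fully expanded, non-terminal nodes via UCT.
        while MODEL[node.state] and len(node.children) == len(MODEL[node.state]):
            node = max(
                node.children.values(),
                key=lambda ch: ch.value_sum / (ch.visits + 1e-9)
                + c_uct * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1e-9)),
            )
            path.append(node)
            ret += node.reward
        # Expansion + evaluation: try one new action, then do a random rollout.
        untried = [a for a in MODEL[node.state] if a not in node.children]
        if untried:
            action = random.choice(untried)
            next_state, reward, _ = MODEL[node.state][action]
            child = Node(next_state, reward)
            node.children[action] = child
            path.append(child)
            ret += reward + rollout(next_state)
        # Backup: credit the simulated return to every node on the path.
        for n in path:
            n.visits += 1
            n.value_sum += ret
    return max(root.children, key=lambda a: root.children[a].visits)

print("greedy policy action:", act_greedy("s0"))    # 0 (leads to at most reward 1)
print("MCTS-planned action:", act_with_mcts("s0"))  # 1 with high probability (reward 2)
```

In this toy case the greedy policy picks the worse action while play-time search recovers the better one. My question is whether the same kind of play-time search is still needed, or even meaningful, when the state is only partially observed.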

I am new to reinforcement learning and am trying to understand the purpose of MCTS / planning in partially observable environments.

  • Hi. Please, ask this question on [Artificial Intelligence SE](https://ai.stackexchange.com/). This question is off-topic here. Stack Overflow is for programming issues. – nbro Apr 22 '20 at 11:14
  • Oh okay. Thank you. – Yohahn Ribeiro Apr 22 '20 at 12:11
  • The question has been moved to https://ai.stackexchange.com/questions/20546/is-monte-carlo-tree-search-needed-in-partially-observable-environments-during-ga for anyone interested. – Yohahn Ribeiro Apr 23 '20 at 09:20
  • You should delete this question from here now. Note that it's not guaranteed you will get an answer. – nbro Apr 23 '20 at 11:23

0 Answers