I understand that in a fully observable environment (chess, Go, etc.) you can run MCTS guided by a trained policy network to plan ahead. This lets you pick actions during game play that maximize the expected return from the current state.
However, in a partially observable environment, do we still need to run MCTS during game play? Why can't we just pick the best action from the trained policy given the current state? What utility does MCTS serve here?
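To make the question concrete, here is a rough sketch of the two alternatives I have in mind. It's plain Python with stub functions (`policy_net`, `simulate_return` are just placeholders, not real AlphaZero/MuZero code), and the "planning" option does flat Monte Carlo rollouts as a crude stand-in for a full MCTS:

```python
import numpy as np

rng = np.random.default_rng(0)
N_ACTIONS = 3

def policy_net(observation):
    """Stand-in for a trained policy network: observation -> action probabilities.
    (Here it's a fixed softmax over made-up logits, purely for illustration.)"""
    logits = np.array([0.2, 1.0, 0.5])
    return np.exp(logits) / np.exp(logits).sum()

def act_greedy(observation):
    # Option A: no planning at play time -- just take the policy's argmax action.
    return int(np.argmax(policy_net(observation)))

def simulate_return(observation, first_action, horizon=5):
    """Stand-in for one simulated rollout from the current state/belief.
    A real implementation would step an environment model forward; here we
    just return a noisy score so the example runs."""
    return float(first_action + rng.normal(scale=0.1) * horizon)

def act_with_planning(observation, n_simulations=50):
    # Option B: crude stand-in for MCTS -- simulate rollouts for each candidate
    # action from the current state/belief and pick the action with the best
    # average simulated return.
    avg_returns = [
        np.mean([simulate_return(observation, a) for _ in range(n_simulations)])
        for a in range(N_ACTIONS)
    ]
    return int(np.argmax(avg_returns))

obs = None  # placeholder observation
print("greedy action :", act_greedy(obs))
print("planned action:", act_with_planning(obs))
```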
I am new to reinforcement learning and am trying to understand the purpose of MCTS / planning in partially observable environments.