I'm using a neural network with TensorFlow for reinforcement learning with Q-learning on various tasks. How can I restrict the set of possible outputs when the action corresponding to a given output cannot be performed in the environment in a particular state?
For example, my network is learning to play a game with 4 possible actions. In one particular state, action 1 cannot be performed in the environment, yet my network's Q-values indicate that action 1 is the best choice. What should I do in this situation?
(Is just choosing a random valid action the best way to handle this problem?)
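
To make the situation concrete, here is a minimal sketch of the two options I can think of. It uses NumPy rather than my actual TensorFlow code, and everything in it is made up for illustration: the Q-values, the 0-indexed action numbering, the `valid_mask` array (which I assume the environment can provide), and the helper names `random_valid_action` and `greedy_valid_action`. The second helper masks the invalid actions to negative infinity before the argmax, so it picks the best action among the valid ones instead of a random one.

```python
import numpy as np


def random_valid_action(valid_mask, rng):
    """Pick uniformly at random among the actions marked valid."""
    mask = np.asarray(valid_mask, dtype=bool)
    # flatnonzero gives the indices of the valid actions
    return int(rng.choice(np.flatnonzero(mask)))


def greedy_valid_action(q_values, valid_mask):
    """Pick the valid action with the highest Q-value."""
    q = np.asarray(q_values, dtype=float)
    mask = np.asarray(valid_mask, dtype=bool)
    if not mask.any():
        raise ValueError("no valid action in this state")
    # Invalid actions get -inf so argmax can never select them
    masked_q = np.where(mask, q, -np.inf)
    return int(np.argmax(masked_q))


# Hypothetical state: 4 actions, action 1 has the highest Q-value
# but cannot be performed in the environment.
q_values = [0.2, 0.9, 0.5, 0.1]
valid_mask = [True, False, True, True]

rng = np.random.default_rng(0)
print(random_valid_action(valid_mask, rng))        # one of 0, 2, 3
print(greedy_valid_action(q_values, valid_mask))   # best valid action
```

I don't know whether the same mask would also need to be applied to the max over next-state Q-values in the training target, so that part is left out of the sketch.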