I am building an AlphaGo-like AI using DQN, but I am having trouble teaching it the game rules. At first, the AI doesn't know the rule that it must not place a stone on a point that is already occupied. I tried giving a negative reward whenever the AI violates that rule, but it doesn't seem to learn it. I am starting to think that teaching the rules is just a waste of time. Please share your ideas.
1 Answer
Suppose you are in a state 's' with, for example, 8 possible actions (so 8 outputs for your network), but actions 1, 2 and 3 cannot be performed. What you can do is minimize the loss with the target Q values manually set to 0 for all invalid actions in state 's'.
During training, when the action with the largest Q value is invalid, just select a random action instead, and don't forget to set the target Q value of the invalid action to 0.
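The two steps above could be sketched roughly as follows with NumPy. This is only an illustration, not a full DQN: the function names `build_targets` and `select_action`, the boolean `valid_mask`, and the scalar `td_target` are all hypothetical names introduced here. It also assumes that the random fallback draws from the valid actions only and that the action actually taken is valid, which the answer does not state explicitly.

```python
import numpy as np

def build_targets(q_values, valid_mask, action, td_target):
    """Build the target vector for one state s (hypothetical helper).

    q_values:   network outputs for s, shape (n_actions,)
    valid_mask: boolean array, True where the action is legal in s
    action:     index of the action actually taken (assumed to be valid)
    td_target:  scalar target for that action, e.g. r + gamma * max_a' Q(s', a')
    """
    targets = q_values.copy()      # outputs left equal to the prediction add no loss
    targets[~valid_mask] = 0.0     # invalid actions: target manually set to 0
    targets[action] = td_target    # ordinary DQN target for the action taken
    return targets

def select_action(q_values, valid_mask, rng):
    """Take the greedy action, or a random valid one if the greedy action is invalid."""
    best = int(np.argmax(q_values))
    if valid_mask[best]:
        return best
    # Assumption: the fallback samples uniformly among the valid actions.
    return int(rng.choice(np.flatnonzero(valid_mask)))

# Example with 8 outputs where the first three actions are not performable.
q = np.array([0.9, 0.2, -0.1, 0.4, 0.3, 0.0, 0.1, 0.5])
mask = np.array([False, False, False, True, True, True, True, True])
rng = np.random.default_rng(0)

a = select_action(q, mask, rng)                      # greedy index 0 is invalid here
targets = build_targets(q, mask, a, td_target=1.0)   # regress the network toward this
```

The loss (for example, mean squared error) would then be taken between the network output for 's' and `targets`, so the invalid outputs are pushed toward 0 on every update rather than only when the agent happens to pick them.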

Xeyes