I am building an AlphaGo-like AI using DQN, but I am having trouble teaching it the game rules. At first, the AI does not know the rule that a stone must not be placed on a point that is already occupied. I tried giving a negative reward whenever the AI violates that rule, but it does not seem to learn it. I am starting to think that teaching the rules this way is a waste of time. Please share your ideas.

장영연

1 Answer

Suppose you are in a state 's' with, for example, 8 possible actions (so 8 outputs from your network), but actions 1, 2, and 3 cannot be performed. You can then minimize the loss with the target Q-values manually set to 0 for all invalid actions in state 's'.
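
As a rough illustration of the idea above, here is a minimal sketch in plain Python (no deep learning framework). The names `build_targets`, `mse`, `valid`, and `td_target` are hypothetical, and the choice to leave untaken valid actions at their predicted value (so they contribute no error) is an assumption, not something stated in the answer.

```python
def build_targets(q_pred, valid, action, td_target):
    """Build per-action regression targets for a single state.

    q_pred    : list of Q-values predicted by the network for this state
    valid     : list of bools, True where the action is legal
    action    : index of the action that was actually taken
    td_target : usual DQN target for that action, e.g. r + gamma * max Q(s', a')
    """
    targets = []
    for a, q in enumerate(q_pred):
        if not valid[a]:
            targets.append(0.0)        # invalid action: target manually set to 0
        elif a == action:
            targets.append(td_target)  # taken action: standard TD target
        else:
            targets.append(q)          # assumption: other valid actions are left unchanged
    return targets


def mse(q_pred, targets):
    """Mean squared error between predictions and targets."""
    return sum((q - t) ** 2 for q, t in zip(q_pred, targets)) / len(q_pred)


# 8 outputs; actions 1, 2, 3 (indices 0, 1, 2) are not performable in this state.
q_pred = [0.9, -0.2, 0.4, 0.1, 0.3, 0.0, 0.5, 0.2]
valid = [False, False, False, True, True, True, True, True]

targets = build_targets(q_pred, valid, action=6, td_target=1.0)
loss = mse(q_pred, targets)
```

With these example numbers, only the three invalid outputs and the taken action contribute to the loss, so the gradient pushes the invalid Q-values toward 0 on every update that visits this state.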

During training, when the action with the largest Q-value is invalid, just select a random action instead, and don't forget to set the target Q-value for the invalid action to 0.
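
The selection step could look roughly like the sketch below. `select_action` is a hypothetical helper, and drawing the random action only from the legal moves is an assumption (the answer just says "a random action"). It also assumes at least one legal move exists. The rejected greedy index is returned so that the caller can set its target Q-value to 0.

```python
import random


def select_action(q_values, valid, rng=random):
    """Pick the greedy action, falling back to a random legal one.

    Returns (action, rejected), where `rejected` is the index of the
    invalid greedy action whose target should be set to 0, or None
    if the greedy action was legal.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    if valid[greedy]:
        return greedy, None
    # Assumption: sample uniformly among legal actions only.
    legal = [a for a, ok in enumerate(valid) if ok]
    return rng.choice(legal), greedy


q_values = [0.9, -0.2, 0.4, 0.1, 0.3, 0.0, 0.5, 0.2]
valid = [False, False, False, True, True, True, True, True]

# The greedy action (index 0) is illegal here, so a random legal action is played.
action, rejected = select_action(q_values, valid, random.Random(0))
```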

Xeyes