
I have implemented a custom OpenAI Gym environment for a game similar to http://curvefever.io/, but with discrete actions instead of continuous ones. So in each step my agent can go in one of four directions: left, up, right, or down. However, one of these actions will always lead to the agent crashing into itself, since it can't "reverse".

Currently I just let the agent take any move, and let it die if it makes an invalid one, hoping that it will eventually learn not to take that action in that state. I have, however, read that one can set the probability of the illegal move to zero and then sample an action from the renormalized distribution. Is there any other way to tackle this problem?
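
For context, the masking approach I read about would look roughly like this (a minimal sketch with illustrative names, not an actual Gym API):

```python
import numpy as np

# Hypothetical helper: zero out the probability of the invalid "reverse"
# action, renormalize, and sample from what remains.
def sample_valid_action(action_probs, invalid_action):
    probs = np.asarray(action_probs, dtype=np.float64).copy()
    probs[invalid_action] = 0.0   # forbid the move that reverses the agent
    probs /= probs.sum()          # renormalize to a valid distribution
    return np.random.choice(len(probs), p=probs)
```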

ericwenn

1 Answer


You can try to solve this with two changes:

1: Give the current direction as an input, and shape the reward: give maybe +0.1 when the agent takes a move that does not make it crash, and -0.7 when it takes the backward move that directly makes it crash.
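
A rough sketch of what this could look like (the direction encoding, the helper names, and the ordinary-crash penalty of -1.0 are assumptions, not from the question):

```python
import numpy as np

# Assumed encoding: actions/directions are 0=left, 1=up, 2=right, 3=down.
OPPOSITE = {0: 2, 1: 3, 2: 0, 3: 1}  # left<->right, up<->down

def shaped_reward(action, current_direction, crashed):
    """Small bonus for a safe move, large penalty for reversing."""
    if action == OPPOSITE[current_direction]:
        return -0.7                       # backward move, crashes immediately
    return 0.1 if not crashed else -1.0   # -1.0 crash penalty is an assumption

def observation_with_direction(obs, current_direction):
    """Append a one-hot of the current direction to the observation."""
    one_hot = np.eye(4)[current_direction]
    return np.concatenate([obs.ravel(), one_hot])
```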

2: If you are using a neural network with a softmax as the activation function of the last layer, multiply all outputs of the network by a positive integer (a "confidence" factor) before passing them to the softmax. It can be in the range of 0 to 100; in my experience, values above 100 do not change much. The larger the integer, the more confident the agent will be in the action it takes for a given state.
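
For illustration, the scaling could be done like this (a minimal sketch; the logits and the confidence value are just examples):

```python
import numpy as np

def softmax(x):
    z = x - np.max(x)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = np.array([0.2, 0.5, 0.1, -0.3])     # example network outputs
confidence = 10                              # roughly 0 to 100, per the answer
action_probs = softmax(confidence * logits)  # larger confidence -> more peaked
```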

If you are not using a neural network, i.e. deep learning, I suggest you learn the concepts of deep learning, as your game environment seems complex and a neural network will give the best results.

Note: Training will take a huge amount of time, so you have to wait long enough for the algorithm to train. I suggest you do not hurry, and let it train. Also, I played the game, it's really interesting :) Best wishes for making an AI for the game :)

Jay Joshi
  • Thanks. As the AI was for a competition that finished last week, and I couldn't get it to work, I gave up, heh. I was using A3C, so no softmax layer as output. When I looked at the gameplay I noticed that it did not actually make an invalid move that often (since doing so would make it die immediately, and hence would not be a good move). I think where I failed was in cleaning up the input, and also in too high a gamma value. It was set at 0.95 when I trained it for the longest. – ericwenn Nov 07 '17 at 14:42
  • It was a competition by Cygni; you can find it here: http://game.snake.cygni.se/#/?_k=kdwudn – ericwenn Nov 07 '17 at 18:22