I'm trying to come up with a better representation for the state of a 2-d grid world for a Q-learning algorithm which utilizes a neural network for the Q-function.
In the tutorial, Q-learning with Neural Networks, the grid is represented as a 3-d array of integers (0 or 1). The first and second dimensions represent the position of an object in the grid world. The third dimension encodes which object it is.
So, for a 4x4 grid with 4 objects in it, you would represent the state with a 3-d array containing 64 elements (4x4x4). This means that the neural network would have 64 nodes in the input layer so it could accept the state of the grid world as input.
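For concreteness, here's a minimal sketch of that one-hot encoding. The function name, the object ordering, and the example positions are my own illustration, not from the tutorial:

```python
import numpy as np

GRID = 4
OBJECTS = 4  # assumed ordering of channels, e.g. player, wall, pit, goal

def encode_state(positions):
    """positions: dict mapping object index -> (row, col).

    Returns the 4x4x4 one-hot array flattened to the 64-element
    vector the network's input layer expects.
    """
    state = np.zeros((GRID, GRID, OBJECTS), dtype=np.int8)
    for obj, (r, c) in positions.items():
        state[r, c, obj] = 1  # mark this object's position in its own channel
    return state.reshape(-1)

# Example: four objects placed on the diagonal.
x = encode_state({0: (0, 0), 1: (1, 1), 2: (2, 2), 3: (3, 3)})
```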
I want to reduce the number of nodes in the neural network so that training does not take as long. So, can you represent the grid world as a 2-d array of doubles instead?
I tried to represent a 4x4 grid world as a 2-d array of doubles, using different values to represent different objects. For example, I used 0.1 to represent the player and 0.4 to represent the goal. However, when I implemented this, the algorithm stopped learning entirely.
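To show exactly what I mean, here's a sketch of the scalar encoding I tried. The code values other than 0.1 (player) and 0.4 (goal) are placeholders:

```python
import numpy as np

# Illustrative scalar codes: only 0.1 and 0.4 are the ones I actually
# mentioned; the rest are stand-ins.
CODES = {"player": 0.1, "wall": 0.2, "pit": 0.3, "goal": 0.4}

def encode_state_scalar(positions):
    """positions: dict mapping object name -> (row, col).

    Returns a single 4x4 plane of doubles, flattened to 16 inputs
    instead of the 64 used by the one-hot encoding.
    """
    state = np.zeros((4, 4), dtype=np.float64)
    for name, (r, c) in positions.items():
        state[r, c] = CODES[name]  # one scalar per object, all in one plane
    return state.reshape(-1)

s = encode_state_scalar({"player": (0, 0), "goal": (3, 3)})
```

Note that with this encoding the network can no longer see each object type in its own channel; it has to learn to distinguish objects by magnitude alone, which may be part of my problem.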
Right now I think my problem might be that I need to change which activation functions I'm using in my layers. I'm presently using the hyperbolic tangent activation function. My input values range from 0 to 1; my output values range from -1 to 1. I've also tried the sigmoid function.
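For reference, the ranges I'm describing can be checked numerically. tanh squashes to (-1, 1) while the logistic sigmoid squashes to (0, 1), so the achievable output range depends on which activation the output layer uses:

```python
import math

def sigmoid(x):
    # Logistic sigmoid: maps any real x into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

# Sample both activations at extreme and zero inputs.
outputs_tanh = [math.tanh(x) for x in (-5.0, 0.0, 5.0)]
outputs_sig = [sigmoid(x) for x in (-5.0, 0.0, 5.0)]
```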
I realize this is a complex problem to be asking a question about. Any suggestions about the architecture of the network would be appreciated.
UPDATE
There are three variants of the game:
1. The world is static. All objects start in the same place.
2. The player's starting position is random. All other objects stay the same.
3. Each grid is totally random.
With more testing I discovered I can complete the first two variants with my 2-d array representation, so I think my network architecture might be fine. What I discovered is that my network is now extraordinarily susceptible to catastrophic forgetting (much more so than when I was using the 3-d array). I have to use "experience replay" to make it learn, but even then I still can't complete the third variant. I'll keep trying. I'm rather shocked how much of a difference changing the grid world representation made, and it hasn't improved performance at all.
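For completeness, the experience replay I'm using amounts to something like the following minimal buffer (the class name and capacity are my own placeholders): store transitions and train on random minibatches so consecutive, correlated transitions don't overwrite earlier learning, which is exactly the forgetting I'm seeing.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of (state, action, reward, next_state, done)
    transitions; old transitions are evicted once capacity is reached."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # Uniform random minibatch, decorrelating consecutive steps.
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))
```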