I'm new to reinforcement learning and Q-learning, and I'm trying to understand the concepts and implement them. Most of the material I have found uses CNN layers to process image input. I would rather start with something simpler than that, so I use a grid world.
Here is what I have implemented so far. I built an environment following the MDP formalism: a 5x5 grid with a fixed agent position (A) and target position (T). The start state could look like this:
-----
---T-
-----
-----
A----
Currently I represent my state as a 1-dimensional vector of length 25 (5x5), with a 1 at the agent's position and 0 everywhere else, so the state above would be represented as the vector
[1, 0, 0, ..., 0]
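In code, this encoding looks roughly like the sketch below (the flattening order is a free choice; mine differs slightly from row-major):

```python
import numpy as np

def encode_state(agent_pos, grid_size=5):
    # One-hot vector: 1 at the agent's cell, 0 elsewhere.
    # Flattened row-major from the top-left here; any consistent order works.
    state = np.zeros(grid_size * grid_size, dtype=np.float32)
    row, col = agent_pos
    state[row * grid_size + col] = 1.0
    return state

print(encode_state((4, 0)))  # agent in the bottom-left corner, as in the grid above
```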
I have successfully implemented solutions with a Q table and a simple NN with no hidden layer.
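For reference, the tabular part boils down to the standard Q-learning update; this is a minimal sketch of what I do (25 states, 4 actions for up/down/left/right; the alpha and gamma values are placeholders):

```python
import numpy as np

n_states, n_actions = 25, 4
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.1, 0.99  # learning rate and discount factor (placeholders)

def q_update(s, a, r, s_next, done):
    # Standard Q-learning target: bootstrap from the best next action,
    # unless the episode terminated.
    target = r if done else r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```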
Now I want to move a little further and make the task harder by randomizing the target position each episode. Since there is now no correlation between my current state representation and the optimal actions, my agent acts randomly. To solve this, I first need to adjust my state representation to carry information such as the distance to the target, the direction to it, or both. The problem is that I don't know how to represent the state now. I have come up with some ideas:
- [x, y, distance_T]
- [distance_T]
- two 5x5 vectors, one for the agent's position and one for the target's position (see the sketch after this list):
  [1, 0, 0, ..., 0], [0, 0, ..., 1, 0, ..., 0]
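For concreteness, here is a rough sketch of the third idea (the function name and the flattening order are my own choices):

```python
import numpy as np

def encode_two_planes(agent_pos, target_pos, grid_size=5):
    # One one-hot plane per entity, concatenated into a single length-50 vector.
    agent_plane = np.zeros(grid_size * grid_size, dtype=np.float32)
    target_plane = np.zeros(grid_size * grid_size, dtype=np.float32)
    agent_plane[agent_pos[0] * grid_size + agent_pos[1]] = 1.0
    target_plane[target_pos[0] * grid_size + target_pos[1]] = 1.0
    return np.concatenate([agent_plane, target_plane])

# Agent bottom-left (row 4, col 0), target at row 1, col 3, as in the grid above:
state = encode_two_planes((4, 0), (1, 3))
```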
I know that even if I figure out the state representation, my current model will not be able to solve the problem and I will need to move toward hidden layers, experience replay, a frozen target network, and so on, but for now I only want to observe how the current model fails.
In conclusion, I want to ask how to represent such a state as input for a neural network. If there are any sources of information, articles, papers, etc. that I have missed, feel free to post them.
Thank you in advance.