I'm trying to implement an LSTM network in TensorFlow to play a game, but I'm having a very hard time putting the pieces together (in other words, going from this example, «A Recurrent Neural Network (LSTM) implementation example using TensorFlow library and MNIST», to what my model needs). The game can be described as follows:
- There are 4 different states (s1, s2, s3 and s4). They repeat in cycles through time until the end of the game. Not every state is necessarily played in a cycle, and a given state can be played several times in succession within a cycle.
Example of a game (cycles are separated by ||): start: s1,s2,s3 || s1 || s1,s2,s3,s4 || s1,s1,s2,s2,s2,s3 || s1,s2 :end
- Each state is represented by a 1D vector. No two states have the same vector length.
- The actions available to the agent are the same for all 4 states.
- The action in a cycle depends on the current state as well as on all previous states in this cycle (no Markov property) and all previous actions taken.
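To make the description concrete, here is a minimal sketch of how one game could be laid out as data, in plain Python. The vector lengths in `STATE_DIMS` and the helper names `make_step` and `make_game` are hypothetical and only for illustration. The real sizes are not stated above, and the random values stand in for real observations.

```python
import random

# Hypothetical vector lengths for the four states (assumption: real sizes differ).
STATE_DIMS = {"s1": 3, "s2": 5, "s3": 2, "s4": 4}


def make_step(state, rng):
    """One time step: the state's name plus its 1D observation vector."""
    return (state, [rng.random() for _ in range(STATE_DIMS[state])])


def make_game(cycles, rng):
    """A game is a list of cycles; a cycle is a list of steps."""
    return [[make_step(state, rng) for state in cycle] for cycle in cycles]


rng = random.Random(0)

# The example game from the description: five cycles of varying length.
game = make_game(
    [
        ["s1", "s2", "s3"],
        ["s1"],
        ["s1", "s2", "s3", "s4"],
        ["s1", "s1", "s2", "s2", "s2", "s3"],
        ["s1", "s2"],
    ],
    rng,
)

# Number of steps in each cycle; this is what varies from cycle to cycle.
cycle_lengths = [len(cycle) for cycle in game]
```

In this layout both sources of variability are visible: the number of steps per cycle (`cycle_lengths`) and the length of each step's vector (`STATE_DIMS`).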
Here is a diagram of the model I think may fit this game (inspired by the many-to-many example in Karpathy's «The Unreasonable Effectiveness of RNN»): Network's model picture.
Problems (implementation differences with the MNIST example cited above):
- How do I deal with input vectors of different sizes for the different cells? Note that the information encoded at a given index of one state's input vector does not mean the same thing as the information at that index in another state's vector (this holds for all input vectors).
- How do I deal with the unknown length of a cycle, and with the repetition of states that may or may not occur within a cycle?
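For reference, one encoding that might address both points is sketched below in plain Python. It is only a sketch under assumptions, not something I know to be the right approach. Each state gets its own block of slots in one fixed-width input vector, so an index never mixes meanings across states, and cycles are zero-padded to a common length while their true lengths are kept. The sizes in `STATE_DIMS` and the helpers `encode_step` and `pad_cycles` are hypothetical.

```python
# Hypothetical vector lengths for the four states (assumption).
STATE_DIMS = {"s1": 3, "s2": 5, "s3": 2, "s4": 4}

# Give each state its own block of slots in a shared input vector.
OFFSETS = {}
_total = 0
for _name, _dim in STATE_DIMS.items():
    OFFSETS[_name] = _total
    _total += _dim
INPUT_DIM = _total  # sum of all state vector lengths


def encode_step(state, vector):
    """Place a state's vector in its own block; all other slots stay zero."""
    if len(vector) != STATE_DIMS[state]:
        raise ValueError("unexpected vector length for state " + state)
    x = [0.0] * INPUT_DIM
    start = OFFSETS[state]
    x[start:start + len(vector)] = vector
    return x


def pad_cycles(cycles):
    """Encode every step, then zero-pad each cycle to the longest one.

    Returns the padded batch and the true length of each cycle.
    """
    max_len = max(len(cycle) for cycle in cycles)
    batch, lengths = [], []
    for cycle in cycles:
        steps = [encode_step(state, vec) for state, vec in cycle]
        lengths.append(len(steps))
        steps += [[0.0] * INPUT_DIM for _ in range(max_len - len(steps))]
        batch.append(steps)
    return batch, lengths


# Two short cycles: (s1, s3) and (s1).
cycles = [
    [("s1", [1.0, 2.0, 3.0]), ("s3", [4.0, 5.0])],
    [("s1", [0.5, 0.5, 0.5])],
]
batch, lengths = pad_cycles(cycles)
```

The `lengths` list is what would tell the network where each cycle really ends, for example through the `sequence_length` argument of `tf.nn.dynamic_rnn` in TensorFlow 1.x. Repeated states need no special handling in this layout, since each repetition is just another time step. Whether the block layout is a good input representation for this game is an open question.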