I am working on an implementation of Q-learning to build an AI to play Galaga. I understand that Q-learning requires states and actions, plus a Q-table that estimates the value of taking each action in each state.
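To check that I have the core mechanics right, here is roughly how I understand the update step, with the Q-table as a dictionary keyed by (state, action) pairs. The action names and the alpha/gamma values are just placeholders I picked for the sketch:

```python
from collections import defaultdict

ALPHA = 0.1    # learning rate (placeholder value)
GAMMA = 0.99   # discount factor (placeholder value)
ACTIONS = ["left", "right", "shoot", "noop"]  # my guess at the action set

# Q-table: (state, action) -> estimated long-term value, defaults to 0
Q = defaultdict(float)

def q_update(state, action, reward, next_state):
    """One step of the standard Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```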
All the examples and tutorials for Q-learning online seem to be for grid-based games with easily defined states. But Galaga involves moving left, moving right, and shooting upwards, with enemies moving randomly throughout gameplay, so I'm having trouble defining what the states in my Q-learning algorithm should be. I've considered treating every potential position of the ship as a state, or having states depend on the number of enemies remaining alive. I've even considered having a state for every frame of gameplay, but that seems obviously too costly.
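To make that concrete, here is roughly the kind of discretized state I've been sketching: bucket the ship's x position together with the nearest enemy's offset from the ship. The bucket size and the "nearest enemy only" simplification are arbitrary guesses on my part:

```python
def make_state(ship_x, enemies, bucket=16):
    """Discretize the situation into a small tuple so a Q-table stays
    tractable. 'enemies' is assumed to be a list of objects with x/y
    coordinates; the 16-pixel bucket size is an arbitrary guess."""
    ship_bucket = int(ship_x) // bucket
    if not enemies:
        return (ship_bucket, None, None)
    nearest = min(enemies, key=lambda e: abs(e.x - ship_x))
    dx_bucket = int(nearest.x - ship_x) // bucket
    dy_bucket = int(nearest.y) // bucket
    return (ship_bucket, dx_bucket, dy_bucket)
```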
I'd appreciate it if anyone with a better understanding of Q-learning could help me define what my states should be. I also understand the need for rewards, but I'm not entirely sure what the reward would be on a frame-by-frame basis, since the game score only increases when enemies are killed. Perhaps it could be some function of the game score and the frame count?
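In case it helps to see what I'm imagining, something like this is the per-frame reward I had in mind: the score gained that frame, minus a small per-frame penalty so that stalling isn't free, and a large penalty if the ship is destroyed. All of the constants here are made up and would need tuning:

```python
def frame_reward(prev_score, curr_score, ship_destroyed, frame_penalty=0.01):
    """Per-frame reward: points earned this frame, a small time penalty,
    and a big hit on death. The 0.01 and 500 constants are guesses."""
    reward = float(curr_score - prev_score)  # kills show up as score jumps
    reward -= frame_penalty                  # mild pressure to make progress
    if ship_destroyed:
        reward -= 500.0                      # arbitrary death penalty
    return reward
```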
Thanks for any help!