
I am working on an implementation of Q-learning to build an AI that plays Galaga. I understand that Q-learning requires states and actions, and a table of Q-values that determines which action to take from each state.

All the examples and tutorials for Q-learning online seem to be for grid-based games with easily defined states. But Galaga involves moving left, moving right, and shooting upwards, with enemies moving unpredictably throughout gameplay. So I'm having trouble defining what the states in my Q-learning algorithm should be. I've considered making every potential position of the ship a state, or perhaps basing states on the number of enemies remaining alive. I've even considered having a state for every frame of gameplay, but that seems obviously too costly.

I'd appreciate it if anyone with a better understanding of Q-learning could help me define what my states should be. I also understand the need for rewards, but I'm not sure what the reward should be on a frame-by-frame basis, since the game score only increases when enemies are killed. Perhaps some function of the game score and the frame count, as sketched below.
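For example, something like this is what I have in mind (just a sketch; the names and the penalty value are placeholders I made up):

```python
# Hypothetical per-frame reward: the change in score since the last frame,
# minus a small per-frame penalty to discourage stalling.
def reward(prev_score, score, frame_penalty=0.01):
    return (score - prev_score) - frame_penalty
```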

Thanks for any help!

Simon

1 Answer


The "state" of Galaga at a specific time step must include the location of your agent as well as the locations of your enemies. Your agent won't be able to learn effectively if the only state it is aware of is its own location. Otherwise, how would it learn when to fire? If the enemies spawn and move in the same exact way each game, then the framecount can be used as a way of tracking where the enemies are.

Although Q-learning is guaranteed to converge on problems with a finite state space (given sufficient exploration), the state space of this game (all possible combinations of your location and the enemy locations) may well be too large for vanilla tabular Q-learning to handle in a reasonable amount of time.
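For context, vanilla tabular Q-learning looks something like the sketch below (a minimal illustration; the action set and hyperparameters are placeholders). Every distinct state tuple gets its own table entry, which is exactly what blows up here:

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right", "shoot", "noop"]
# Q-table: maps each state to a dict of action -> estimated value.
Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})

def choose_action(state, epsilon=0.1):
    # Epsilon-greedy: explore randomly sometimes, otherwise exploit.
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # Standard Q-learning update: nudge Q(s, a) toward the TD target
    # reward + gamma * max_a' Q(s', a').
    best_next = max(Q[next_state].values())
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])
```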

One way to apply Q-learning to problems with large state spaces is function approximation. The idea is that you don't treat each unique state as a silo, but instead recognize similarities between states, so that experience gained in one state informs your actions in states you've never encountered before. Function approximation combined with a convolutional neural network is the approach taken in DeepMind's famous paper, where they outline an algorithm known as Deep Q-learning (DQN) and use it to play Atari games similar to this one.
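This isn't DeepMind's exact architecture, but here is a minimal sketch of the idea in PyTorch (layer sizes and hyperparameters are placeholders): a network maps a stack of recent frames to one Q-value per action, replacing the table entirely:

```python
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a stack of recent game frames to one Q-value per action."""
    def __init__(self, n_actions=4, in_frames=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_frames, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),
            nn.Linear(256, n_actions),  # one Q-value per action
        )

    def forward(self, frames):  # frames: (batch, in_frames, height, width)
        return self.net(frames)

# Training uses the same TD target as tabular Q-learning, but computed
# with the network: target = reward + gamma * max_a' Q(next_state, a').
```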

R.F. Nelson