so I'm sure everyone has heard about the Berkeley Pac-Man AI challenge at some point or another. A while ago I created a 2D platformer (doesn't scroll) and figured it would be pretty cool to take some inspiration from this project but create an AI for my game (instead of PacMan). That being said, I have found myself very stuck. I have looked at several GitHub solutions for the PacMan as well as plenty of articles regarding implementation of MDP / Reinforced learning in Python. I'm having a hard time relating them back to my game.
In my game, I have 10 levels. Each level has fruit, and once the agent grabs all the fruit, he completes the level and the next level starts. Here's an example stage:
As you can see in this picture, my agent is the little squirrel and he has to grab all the cherries. On the ground there's also spikes that he can't walk on (or you lose a life). You can avoid spikes by jumping. So a jump technically moves the agent 2 spaces to the left or right (depending on way he's facing). Other than that you can move left, right, up and down on a ladder. You can't jump more than 1 space, so the "2+ gappers" at the top you see, you'd have to climb down the ladder and go around. Additionally, not pictured above there are enemies that go Left and Right only on a floor that you have to dodge (you can jump over them, or just avoid them). They're tracked on a grid I'll talk about below. So that's a bit about the game, if you need anymore clarifications, feel free to ask and I can assist, let me go into what I've tried now a little bit and see if someone can assist me with getting something together.
In my code I have a grid that has all the spaces on the map and what they are (platform, regular spot, spike, reward/fruit, ladder, etc). Using that, I created an action grid (code below) that basically stores in a dictionary all of the spots the agent can move from each location.
for r in range(len(state_grid)):
for c in range(len(state_grid[r])):
if(r == 9 or r == 6 or r == 3 or r == 0):
if (move_grid[r][c] != 6):
actions.update({state_grid[r][c] : 'None'})
else:
actions.update({state_grid[r][c] : [('Down', 'Up')]})
elif move_grid[r][c] == 4 or move_grid[r][c] == 0 or move_grid[r][c] == 2 or move_grid[r][c] == 3 or move_grid[r][c] == 5:
actions.update({state_grid[r][c] : 'None'})
elif move_grid[r][c] == 6:
if move_grid[r+1][c] == 4 or move_grid[r+1][c] == 2 or move_grid[r+1][c] == 3 or move_grid[r+1][c] == 5:
if c > 0 and c < 18:
actions.update({state_grid[r][c] : [('Left', 'Right', 'Up', 'Jump Left', 'Jump Right')]})
elif c == 0:
actions.update({state_grid[r][c] : [('Right', 'Up', 'Jump Right')]})
elif c == 18:
actions.update({state_grid[r][c] : [('Left', 'Up', 'Jump Left')]})
elif move_grid[r+1][c] == 6:
actions.update({state_grid[r][c] : [('Down', 'Up')]})
elif move_grid[r][c] == 1 or move_grid[r][c] == 8 or move_grid[r][c] == 9 or move_grid[r][c] == 10:
if c > 0 and c < 18:
if move_grid[r+1][c] == 6:
actions.update({state_grid[r][c] : [('Down', 'Left', 'Right', 'Jump Left', 'Jump Right')]})
else:
actions.update({state_grid[r][c] : [('Left', 'Right', 'Jump Left', 'Jump Right')]})
elif c == 0:
if move_grid[r+1][c] == 6:
actions.update({state_grid[r][c] : [('Down', 'Right', 'Jump Right')]})
else:
actions.update({state_grid[r][c] : [('Right', 'Jump Right')]})
elif c == 18:
if move_grid[r+1][c] == 6:
actions.update({state_grid[r][c] : [('Down', 'Left', 'Jump Left')]})
else:
actions.update({state_grid[r][c] : [('Left', 'Jump Left')]})
elif move_grid[r][c] == 7:
actions.update({state_grid[r][c] : 'Spike'})
else:
actions.update({state_grid[r][c] : 'WTF'})
At this point I'm kind of just stuck on how/what is needed to send to the MDP. I'm using the Berkeley MDP and I'm just super stuck on how to start getting it involved and implemented. I have a bunch of the data points and where stuff is, just not sure how to actually get the ball rolling.
I have a boolean grid that tracks all the harmful objects (spikes and enemies) and is updated constantly since the enemies move every second.
I created a reward grid that sets:
- Spikes "-5" reward
- Falling off the map "-5" reward
- Fruit is a "5" reward
- Everything else is a "-0.2" reward (since you want to optimize steps, not trying to be on level 1 all day).
Another part of my issue during researching solutions or ways to implement this is that most of the solutions are cars driving to a position. So they only have 1 reward position whereas mine has multiple fruits per stage. Yeah I'm just super stuck and getting frustrated with this. Wanted to try my own thing but if I can't do this I might as well just do the Pac-Man one since there's several online solutions. I appreciate your time and help with this!
Edit: So here's me trying to make an example call get the move grid back, although from my understanding (and obviously) it will dynamically change after each step the agent makes since the enemy will threaten certain spots and all.
This is the result, you can see like it's close but obviously a little stuck up on the fact that there's multiple rewards. I feel like I might be close, but I'm not really sure. I'm a little frustrated at this point.