
I'm trying to make an RL agent that can play a board game called Hexxagon: https://hexxagon.com/.

At the beginning I had problems with the action space, because this game is somewhat similar to checkers: you have to choose the tile you move from and the tile you move to. The action space I defined is [61, 18], because there are 61 tiles and in theory you can make up to 18 moves from a tile. The problem is that not every action in this space is legal: the agent may try to move from a tile where it has no pawn, or move to a tile that is already occupied. My idea was to give the agent a big negative reward every time it makes an illegal move, some points for every legal move, and a large reward for capturing enemy pawns.
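Roughly, my reward scheme looks like this (a simplified sketch; the `destinations` and `adjacency` tables are placeholders standing in for the real hex-board geometry, and the reward values are just the ones I picked):

```python
# Toy illustration of the "big negative reward for illegal moves" scheme.
# board: tile index -> owner (0 = empty, 1 = agent, 2 = enemy)
# destinations: tile index -> list of reachable destination tiles
# adjacency: tile index -> list of neighbouring tiles (for captures)

ILLEGAL_PENALTY = -10.0   # big negative reward for an illegal action
MOVE_REWARD = 0.1         # small reward for any legal move
CAPTURE_REWARD = 1.0      # extra reward per captured enemy pawn

def reward_for(board, destinations, adjacency, action, player=1):
    src, move = action
    if board.get(src, 0) != player:        # no own pawn on the source tile
        return ILLEGAL_PENALTY
    dsts = destinations.get(src, [])
    if move >= len(dsts):                  # move index points off the board
        return ILLEGAL_PENALTY
    dst = dsts[move]
    if board.get(dst, 0) != 0:             # destination already occupied
        return ILLEGAL_PENALTY
    # in Hexxagon, enemy pawns adjacent to the destination are captured
    captured = sum(1 for t in adjacency.get(dst, [])
                   if board.get(t, 0) == 3 - player)
    return MOVE_REWARD + CAPTURE_REWARD * captured
```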

The main problem I currently have is that the agent struggles to reach the later stages of the game. The best it managed was about 10 correct moves in a row. Interestingly, it reached this peak pretty quickly but wasn't able to go further even after a couple of hours of training.

What I'm trying currently is starting the game in a random state: the agent's pawns are placed on random tiles, and so are the enemy's. I did this because I thought the agent topped out at about 10 moves because it saw the later stages of the game rather rarely. The problem is that now it makes only about 2-3 correct moves per episode, even though I've been training it for about 8 hours.
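My randomized reset is roughly this (a sketch; tile count and pawn counts are the ones I'm using, but the real board also needs the starting player's turn and so on):

```python
import random

def random_start(n_tiles=61, n_agent=3, n_enemy=3, seed=None):
    """Place agent and enemy pawns on distinct random tiles.

    Returns board as {tile index: owner}, owner 1 = agent, 2 = enemy.
    Tiles not in the dict are empty.
    """
    rng = random.Random(seed)
    tiles = rng.sample(range(n_tiles), n_agent + n_enemy)  # no collisions
    board = {t: 1 for t in tiles[:n_agent]}
    board.update({t: 2 for t in tiles[n_agent:]})
    return board
```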

Do you think my approach with a big negative reward is good? Should I use some kind of action mask? If so, where can I find a tutorial? Will training on randomly generated boards give any results later?
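From what I've read, the mask would be a boolean array marking which (source, move) pairs are currently legal, which is the kind of thing sb3-contrib's MaskablePPO consumes; here's roughly what I imagine computing it looks like (a sketch reusing my placeholder `destinations` table, not verified against the actual sb3-contrib API):

```python
import numpy as np

N_TILES, N_MOVES = 61, 18

def action_mask(board, destinations, player=1):
    """Boolean mask of shape (N_TILES, N_MOVES): True where the
    (source tile, move index) pair is a legal action for `player`.

    Flattened with .ravel(), this is the shape of mask an action-masking
    PPO variant would expect for a flattened Discrete(61 * 18) space.
    """
    mask = np.zeros((N_TILES, N_MOVES), dtype=bool)
    for src in range(N_TILES):
        if board.get(src, 0) != player:            # no own pawn here
            continue
        for m, dst in enumerate(destinations.get(src, [])):
            if board.get(dst, 0) == 0:             # destination empty
                mask[src, m] = True
    return mask
```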

I'm using gym to build the environment and stable_baselines3 for training, with the PPO algorithm.
