
I am currently trying to implement an AI player that uses Q-learning to play against 2 different random players.

I am not sure Q-learning is applicable to a Ludo game, which is why I am a bit doubtful about it.

I have defined 11 states for the game. Each state is defined according to the position of the other players.

There are 6 possible actions (constrained by the dice).

Theoretically I could have four different states (one for each Ludo token), each of which can perform the action chosen by the dice, but I would just choose to move the token with the highest Q(s,a) and perform the action.
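A minimal sketch of the token selection I have in mind, assuming the Q-table is an 11 x 6 list of lists and that each token has its own state index. The names `choose_token`, `token_states` and `dice_roll` are made up for illustration, and the Q-values are dummy numbers:

```python
N_STATES = 11   # number of states defined for the game
N_ACTIONS = 6   # one action per dice outcome (1..6)

def choose_token(q_table, token_states, dice_roll):
    """Return the index of the token whose Q(s, a) is highest for this roll.

    token_states holds the current state index of each token (hypothetical
    representation); dice_roll is 1..6 and maps to action index 0..5.
    Ties are resolved in favour of the first token.
    """
    action = dice_roll - 1
    return max(range(len(token_states)),
               key=lambda i: q_table[token_states[i]][action])

# Dummy Q-table: all zeros except one entry that makes token 1 attractive.
q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
q_table[5][2] = 1.0

token_states = [0, 5, 7, 10]   # assumed state of each of the four tokens
best = choose_token(q_table, token_states, dice_roll=3)
```

Here the dice fixes the action, so the only real choice is which token to apply it to.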

What I don't get is what happens in the update phase.

As I understand it, I update the previous value with the new value?

According to Wikipedia, the update is given by:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

What I don't get is how the reward value differs from the old value. How is the reward defined, and how does it differ from the values stored in the matrix?

Lamda

1 Answer


The reward is the immediate payoff received for making a particular move, whereas the old Q-value is the entry already stored in the Q-table for the action that was chosen, i.e. the action that looked most attractive in the given state. The reward is used to update that entry, so that in the future the algorithm knows whether the move was beneficial or made the outcome worse.
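To make the distinction concrete, here is a rough sketch of a single tabular Q-learning update, assuming the 11 states and 6 actions from the question. The function name `q_update`, the learning rate `alpha`, the discount `gamma` and the reward of 1.0 are illustrative assumptions, not values from the question:

```python
N_STATES = 11   # states defined in the question
N_ACTIONS = 6   # one action per dice outcome

def q_update(q_table, state, action, reward, next_state,
             alpha=0.1, gamma=0.9):
    """Apply one Q-learning update and return the new Q(state, action).

    alpha and gamma are assumed hyperparameters chosen for illustration.
    """
    old_value = q_table[state][action]       # what the table currently says
    best_next = max(q_table[next_state])     # best estimate from the next state
    target = reward + gamma * best_next      # what the move appears to be worth
    new_value = old_value + alpha * (target - old_value)
    q_table[state][action] = new_value
    return new_value

q_table = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

# Hypothetical transition: in state 2 the agent took action 4, received a
# reward of 1.0 (say, for sending an opponent home) and ended up in state 3.
new_value = q_update(q_table, state=2, action=4, reward=1.0, next_state=3)
```

The old value comes from the table, while the reward comes from the environment, so the reward has to be defined by you (for example positive for reaching the goal or knocking out an opponent, negative for being sent home). The update only nudges the stored entry towards `reward + gamma * best_next` by a fraction `alpha`.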

Lamda