At the moment I am trying to implement an AI player that uses Q-learning to play against two different random players.
I am not sure whether Q-learning is even applicable to a Ludo game, which is why I am a bit doubtful about it.
For the game I have defined 11 states, each defined according to the positions of the other players.
There are 6 possible actions (constrained by the dice).
In theory I could have four candidates (one for each Ludo token) that can perform the action chosen by the dice, but I would simply move the token with the highest Q(s,a) and perform that action (see the sketch below).
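To make the selection step concrete, here is a minimal sketch of what I have in mind. The Q-table shape, the `get_state` helper and the epsilon value are my own assumptions for illustration, not anything fixed by the game:

    import numpy as np

    N_STATES = 11   # states defined by the other players' positions
    N_ACTIONS = 6   # one action index per possible dice value

    # Q-table: one value per (state, action) pair
    Q = np.zeros((N_STATES, N_ACTIONS))

    def choose_token(tokens, dice, get_state, epsilon=0.1):
        """Pick which of the four tokens to move for the rolled dice value.

        `tokens` is the list of my four tokens; `get_state(token)` is a
        hypothetical helper that maps a token's situation to one of the
        11 states.
        """
        action = dice - 1  # dice 1..6 -> action index 0..5
        if np.random.rand() < epsilon:
            return np.random.choice(len(tokens))            # explore: random token
        values = [Q[get_state(t), action] for t in tokens]  # exploit: best Q(s,a)
        return int(np.argmax(values))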
What I don't get is what happens in the update phase.
I understand that I update the previous value with the new value?
Based on the wiki, the update is given as:

Q(s_t, a_t) ← Q(s_t, a_t) + α * [ r_{t+1} + γ * max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]
What I don't get is how the reward value differs from the old value. How is the reward defined, and how is it different from the values stored in the matrix?
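For reference, this is roughly how I currently picture the update step in code, reusing the Q-table from the sketch above. The learning rate, discount factor and the example reward values (e.g. +1 for knocking an opponent home) are placeholders I made up, not something given by the game:

    ALPHA = 0.1   # learning rate (assumed)
    GAMMA = 0.9   # discount factor (assumed)

    def q_update(Q, state, action, reward, next_state):
        """One Q-learning update step.

        `reward` is the immediate payoff returned after the move just made
        (e.g. +1 for knocking an opponent home, 0 otherwise - my own guess);
        Q[state, action] is the table's current long-term estimate, which is
        nudged toward reward + discounted best value of the next state.
        """
        old_value = Q[state, action]
        best_next = Q[next_state].max()          # max_a Q(s_{t+1}, a)
        td_target = reward + GAMMA * best_next   # new estimate of the return
        Q[state, action] = old_value + ALPHA * (td_target - old_value)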