
I use Q-learning to determine the optimal path of an agent. I know in advance that every path consists of exactly 3 states, so the agent reaches a terminal state after 3 states. How can I include this in the update rule of the Q-function?

What I am doing currently:

for t=1:Nb_Epochs-1
    % Epsilon-greedy action selection
    if rand(1)<Epsilon
        a=randi(size(QIter,2));   % an action 'a' is chosen at random
    else
        [~,a]=max(QIter(CurrentState,:,t));
    end

    NextState=FindNextState(CurrentState,a);
    QIter(:,:,t+1)=QIter(:,:,t);   % carry all other entries over unchanged
    QIter(CurrentState,a,t+1)=(1-LearningRate)*QIter(CurrentState,a,t)+LearningRate*(Reward(NextState)+Discount*max(QIter(NextState,:,t)));
    CurrentState=NextState;
end
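
A minimal sketch of one common way to encode a fixed horizon, not necessarily the right fit for this problem: run the learning in episodes of exactly 3 steps, index the Q-table by the step number as well as by state and action, and drop the bootstrap term on the last step so that the target there is the reward alone. The toy chain environment, the names `Q`, `Horizon`, `NbEpisodes` and `StartState`, and all numeric values are assumptions made for illustration; `FindNextState` and `Reward` below are stand-ins, not the functions used above.

```matlab
% Toy setup (all values are illustrative assumptions)
rng(0);                                   % fixed seed for reproducibility
NbStates = 4;  NbActions = 2;  Horizon = 3;
StartState = 1;  NbEpisodes = 500;
Epsilon = 0.1;  LearningRate = 0.5;  Discount = 0.9;
Reward = [0 1 2 5];                       % reward for entering each state
% Stand-in dynamics: action 1 stays put, action 2 moves one state forward
FindNextState = @(s, a) min(s + (a - 1), NbStates);

% Q(s, a, k): value of taking action a in state s at step k of the episode
Q = zeros(NbStates, NbActions, Horizon);

for episode = 1:NbEpisodes
    CurrentState = StartState;            % every episode restarts here
    for k = 1:Horizon
        % Epsilon-greedy action selection
        if rand < Epsilon
            a = randi(NbActions);
        else
            [~, a] = max(Q(CurrentState, :, k));
        end

        NextState = FindNextState(CurrentState, a);

        if k == Horizon
            % Last step: the next state is terminal, so no bootstrap term
            Target = Reward(NextState);
        else
            Target = Reward(NextState) + Discount * max(Q(NextState, :, k + 1));
        end

        Q(CurrentState, a, k) = (1 - LearningRate) * Q(CurrentState, a, k) ...
                                + LearningRate * Target;
        CurrentState = NextState;
    end
end
```

After training, `[~, a] = max(Q(StartState, :, 1))` reads off the greedy first action. If each state can only be reached at one position in the path, the step index `k` is redundant and a plain `Q(s, a)` table with the same terminal-step rule would be enough.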
karel
Hajar Elhammouti

0 Answers