I am using Q-learning to determine the optimal path of an agent. I know in advance that every episode consists of exactly 3 states (i.e. after 3 states the agent reaches a terminal state). How do I incorporate this into the update rule of the Q-function?
What I am doing currently:
for t = 1:Nb_Epochs-1
    % Epsilon-greedy action selection
    if rand(1) < Epsilon
        a = randi(Nb_Actions);                        % explore: pick a random action
    else
        [~, a] = max(QIter(CurrentState, :, t));      % exploit: pick the greedy action
    end
    NextState = FindNextState(CurrentState, a);
    % Q-learning update
    QIter(CurrentState, a, t+1) = (1 - LearningRate) * QIter(CurrentState, a, t) ...
        + LearningRate * (Reward(NextState) + Discount * max(QIter(NextState, :, t)));
    CurrentState = NextState;
end
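For context, the standard way to treat a terminal state in Q-learning is to drop the bootstrap term: when the next state is terminal, the target is just the reward, with no `Discount * max(Q(NextState, :))` contribution. A minimal sketch of that update rule (in Python for illustration; the function name `q_update` and the state/action sizes are made up here, not part of the code above):

```python
import numpy as np

def q_update(Q, s, a, r, s_next, terminal, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update step.

    If s_next is terminal, the target is just the reward r;
    otherwise it bootstraps with gamma * max_a' Q(s_next, a').
    """
    target = r if terminal else r + gamma * np.max(Q[s_next])
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target
    return Q

# Hypothetical usage: 4 states, 2 actions, episode of length 3.
Q = np.zeros((4, 2))
# Non-terminal transition: target bootstraps from the next state's Q-values.
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1, terminal=False)
# Terminal transition (third step of the episode): no bootstrap term.
Q = q_update(Q, s=2, a=0, r=5.0, s_next=3, terminal=True)
```

In your loop this would mean checking whether `NextState` is the third state of the episode and, if so, omitting the `Discount * max(QIter(NextState, :, t))` term from the update.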