This code :
R = ql.matrix([ [0,0,0,0,1,0],
[0,0,0,1,0,1],
[0,0,100,1,0,0],
[0,1,1,0,1,0],
[1,0,0,1,0,0],
[0,1,0,0,0,0] ])
is from :
R is defined as the "Reward matrix for each state" . What are the states and rewards in this matrix ?
# Reward for state 0
print('R[0,]:' , R[0,])
# Reward for state 0
print('R[1,]:' , R[1,])
prints :
R[0,]: [[0 0 0 0 1 0]]
R[1,]: [[0 0 0 1 0 1]]
Is [0 0 0 0 1 0]
state0 & [0 0 0 1 0 1]
state1 ?