I have this piece of code and I can't find out where the mistake is coming from:
import numpy as np

boxes = (2, 2, 4, 2)
action = (0, 1)
num_a = 2
# One axis per entry in boxes, plus a final axis for the action
Q_table = np.zeros(boxes + (num_a,))

if pre_a != -1:
    if s == -1:
        bestQ = 0
    else:
        bestQ = np.amax(Q_table[s])
    Q_table[pre_s, pre_a] += alpha * (R + gamma * bestQ - Q_table[pre_s, pre_a])

if s == -1:
    # Failure: penalize the last state/action pair, then reset the cart
    R = -100
    bestQ = 0
    print("failure")
    print(pre_s, pre_a)
    Q_table[pre_s, pre_a] += alpha * (R + gamma * bestQ - Q_table[pre_s, pre_a])
    print("RESETTING!!!!!")
    [pre_s, s, pre_a, a, x, x_dot, theta, theta_dot] = reset_cart(beta)
    resets = resets + 1
    success = 0
else:
    # Still balanced: reward the last state/action pair
    R = 10
    success = success + 1
    bestQ = np.amax(Q_table[s])
    #Q_table[s+(pre_a,)]+=alpha*(R+gamma*bestQ-Q_table[s+(pre_a,)])
    Q_table[pre_s, pre_a] += alpha * (R + gamma * bestQ - Q_table[pre_s, pre_a])
When I run this, I get the following error:
IndexError: index 2 is out of bounds for axis 0 with size 2
The error is intermittent: sometimes the code runs fine, and other times this error pops up.
Can anyone help me debug this?
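
A minimal sketch that may reproduce the error outside the training loop. It assumes that pre_s is a tuple of four bin indices, one per entry in boxes, and that pre_a is an integer; the question does not show this, and the value of pre_s below is made up for illustration. Under that assumption, NumPy reads Q_table[pre_s, pre_a] as "index axis 0 with the list of values in pre_s, and axis 1 with pre_a", so any bin index of 2 or 3 gets checked against axis 0, which has size 2. When every bin index happens to be 0 or 1, no error is raised, but the wrong cells are read and updated, which would explain why the failure is intermittent.

```python
import numpy as np

boxes = (2, 2, 4, 2)                    # same shape as in the question
num_a = 2
Q_table = np.zeros(boxes + (num_a,))    # shape (2, 2, 4, 2, 2)

pre_s = (1, 0, 2, 1)    # hypothetical state; the third bin index is 2
pre_a = 1

# A tuple nested inside the index is treated as a sequence of indices
# for axis 0 (advanced indexing), not as four separate axes.
try:
    Q_table[pre_s, pre_a] += 1.0
    error_message = None
except IndexError as exc:
    error_message = str(exc)
print(error_message)

# Joining state and action into one flat tuple addresses a single cell,
# as in the commented-out line of the original code.
Q_table[pre_s + (pre_a,)] += 1.0
print(Q_table[1, 0, 2, 1, 1])
```

If the assumption about pre_s holds, replacing every Q_table[pre_s, pre_a] with Q_table[pre_s + (pre_a,)] might be enough to remove the error.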