
I have this piece of code and I can't find out where the mistake is coming from:

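import numpy as np

boxes = (2, 2, 4, 2)     # number of discrete bins per state variable
action = (0, 1)
num_a = 2                # number of actions
Q_table = np.zeros(boxes + (num_a,))   # shape (2, 2, 4, 2, 2)

# Update the Q-value of the previous state/action pair
if pre_a != -1:
    if s == -1:          # terminal state: no future value
        bestQ = 0
    else:
        bestQ = np.amax(Q_table[s])
    Q_table[pre_s, pre_a] += alpha * (R + gamma * bestQ - Q_table[pre_s, pre_a])

if s == -1:              # failure: penalize and reset the cart
    R = -100
    bestQ = 0
    print("failure")
    print(pre_s, pre_a)
    Q_table[pre_s, pre_a] += alpha * (R + gamma * bestQ - Q_table[pre_s, pre_a])
    print("RESETTING!!!!!")
    [pre_s, s, pre_a, a, x, x_dot, theta, theta_dot] = reset_cart(beta)
    resets = resets + 1
    success = 0
else:                    # still balancing: reward and update
    R = 10
    success = success + 1
    bestQ = np.amax(Q_table[s])
    #Q_table[s+(pre_a,)]+=alpha*(R+gamma*bestQ-Q_table[s+(pre_a,)])
    Q_table[pre_s, pre_a] += alpha * (R + gamma * bestQ - Q_table[pre_s, pre_a])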
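A minimal sketch of the same update written with a single flat index, assuming `pre_s` and `s` are tuples of per-dimension box indices (as the commented-out `Q_table[s+(pre_a,)]` line suggests) and that `-1` marks a terminal state. The helper name `q_update` and the default `alpha` and `gamma` values are hypothetical, not taken from the question.

```python
import numpy as np


def q_update(Q_table, pre_s, pre_a, R, s, alpha=0.5, gamma=0.99):
    """Hypothetical helper: one tabular Q-learning update, in place.

    pre_s and s are assumed to be tuples of box indices; s == -1 marks
    a terminal (failure) state, as in the question's code.
    """
    # A terminal state has no future value.
    bestQ = 0.0 if s == -1 else np.amax(Q_table[s])
    # One flat tuple: each box index addresses its own axis, then the action.
    idx = tuple(pre_s) + (pre_a,)
    Q_table[idx] += alpha * (R + gamma * bestQ - Q_table[idx])
    return Q_table


Q = np.zeros((2, 2, 4, 2) + (2,))
q_update(Q, (0, 1, 3, 0), 1, R=10, s=(1, 0, 2, 1))
print(Q[0, 1, 3, 0, 1])  # 5.0
```

Because the index is built from `pre_s` itself, the helper does not depend on the grid: with `boxes=(3, 3, 6, 3)` only the `np.zeros(...)` shape changes.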

When I run this, I get the following error:

IndexError: index 2 is out of bounds for axis 0 with size 2
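One likely cause, assuming `pre_s` is a tuple such as `(0, 1, 2, 0)`: in `Q_table[pre_s, pre_a]` the tuple is nested inside the outer index, so NumPy treats it as an integer array applied to axis 0 (size 2) rather than as four separate indices. A minimal reproduction with a made-up state:

```python
import numpy as np

Q_table = np.zeros((2, 2, 4, 2) + (2,))
pre_s, pre_a = (0, 1, 2, 0), 1   # made-up state whose third box index is 2

try:
    # The tuple inside the index becomes an integer array on axis 0.
    Q_table[pre_s, pre_a]
except IndexError as err:
    print("IndexError:", err)    # index 2 is out of bounds for axis 0 with size 2

# Concatenating gives one flat index (0, 1, 2, 0, 1): a single cell.
print(Q_table[pre_s + (pre_a,)])
```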

Sometimes the code runs fine, and other times this error appears.

Can anyone help me debug this?
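If `pre_s` is a tuple of box indices, `Q_table[pre_s, pre_a]` only fails for states that contain a box index of 2 or more, which would explain why the error appears only sometimes: it depends on which states an episode happens to reach. A sketch that tries the question's indexing on every state of the (2, 2, 4, 2) grid, under that assumption:

```python
import itertools

import numpy as np

boxes, num_a = (2, 2, 4, 2), 2
Q_table = np.zeros(boxes + (num_a,))

failing = []
for state in itertools.product(*(range(n) for n in boxes)):
    try:
        Q_table[state, 0]          # the indexing used in the question
    except IndexError:
        failing.append(state)

print(len(failing), "of", int(np.prod(boxes)), "states fail")  # 16 of 32 states fail
print(all(s[2] >= 2 for s in failing))                          # True
```

Only the third state variable has more than two boxes, so only states where its index is 2 or 3 trigger the error.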

Narendra Jadhav
Stevy KUIMI
  • Which line does it fail on? Does it fail when you use different data? Could you provide inputs that it works on and doesn't? – c2huc2hu Jun 13 '18 at 17:15
  • Hi Narenda, it fails on this line: Q_table[pre_s,pre_a]+=alpha*(R+gamma*bestQ-Q_table[pre_s,pre_a]). It works for boxes=(2,2,2,2) but not for (1,1,1,1), (2,2,4,2), or even (3,3,6,3). – Stevy KUIMI Jun 14 '18 at 02:23
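Assuming `pre_s` is a tuple of box indices, `boxes=(2, 2, 2, 2)` probably only appears to work: every box index is 0 or 1, so nothing is out of bounds, but `Q_table[pre_s, pre_a]` selects whole sub-arrays instead of one cell and the `+=` updates the wrong entries. With `boxes=(1, 1, 1, 1)` the action index lands on axis 1, which then has size 1, so an update after action 1 would fail regardless of the state. A sketch comparing the two indexings on a made-up state:

```python
import numpy as np

Q = np.zeros((2, 2, 2, 2) + (2,))
pre_s, pre_a = (1, 0, 1, 1), 0   # made-up state and action

print(Q[pre_s, pre_a].shape)       # (4, 2, 2, 2): four slices, not one value
print(Q[pre_s + (pre_a,)].shape)   # (): a single Q-value

Q[pre_s, pre_a] += 1.0             # the question's form of the update
print(np.count_nonzero(Q))         # 16, although only one cell should change
```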
