0

I have a QP problem in my reinforcement learning system, which calculates the safe action with the minimium distance between original action. This is the background.

Here is part of my code :

    P = np.matrix([[1.,0.,0.,0.],[0.,1.,0.,0.],[0.,0.,1.,0.],[0.,0.,0.,1.]])
    q = np.array(-u0).flatten() # 
    G = gx
    h = np.array(c - fx).flatten()
    lower_bound = np.array([0., -140., -200 * self.Pgs_M, 0.])
    upper_bound = np.array([143., 140., 200 * self.Pgs_M, 0.])
    safe_action = solve_qp(P,q,G,h,lb=lower_bound, ub=upper_bound)

The u0 is a matrix with the shape of [4,1], and my formulation is correct, G and h can be calculated through u0, self.Pgs_M is a constant number. Meanswhile, I considered the condition that solve_qp returns None.

When my reinforcement learning training working, it might get stuck. I print the control input and initial state before my programm get stuck. It shows that:

===============debug================
u0 =  [[  38.28203142]
      [-140.        ]
      [-144.34985435]
      [   0.        ]]
ini_spd =  [[   0.        ]
           [   0.        ]
           [-203.67992371]
           [ 203.67992371]
           [   0.        ]
           [  -0.        ]]

So I use these input to check my QP problem solving programm, it actually worked, and returned None, because this problem cannot be solved.

===============debug================
u0 =  [[  38.28203142]
      [-140.        ]
      [-144.34985435]
      [   0.        ]]
ini_spd =  [[ 0.00000000e+00]
           [ 0.00000000e+00]
           [-2.03679924e+10]
           [ 2.03679924e+10]
           [ 0.00000000e+00]
           [-0.00000000e+00]]
safe_aciton = None

I was wondering why my programm might get stuck, even if I have already considered the None return. What factor will influence the qpslovers and python programm.

Zhihan_Li
  • 33
  • 4
  • What are the values of `self.Pgs_M`, `G` and `h` when the solver fails? – Felix Crazzolara Jul 12 '22 at 06:50
  • self.Pgs_M is a constant number, G and h is defined by u0. I update the information now. – Zhihan_Li Jul 12 '22 at 09:20
  • I'm not sure what you're exact question is, but it seems as if you're bounds are infeasible. Otherwise it should be able to solve this QP. You might want to check whether some of the lower bounds are higher than their upper counterparts. – Felix Crazzolara Jul 12 '22 at 10:59
  • See https://stackoverflow.com/help/minimal-reproducible-example for how to create a minimal reproducible example. – Erwin Kalvelagen Jul 17 '22 at 05:11

0 Answers0