I have a QP problem in my reinforcement learning system: it computes the safe action that is closest (in Euclidean distance) to the original action. This is the background.
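For reference, since P is the identity and q = -u0 in the code below, the problem being solved is equivalent (up to a constant) to:

minimize    (1/2) * ||u - u0||^2
subject to  G u <= h,   lb <= u <= ub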
Here is part of my code:
import numpy as np
from qpsolvers import solve_qp

P = np.eye(4)                                # identity cost matrix, so the objective is (1/2)*||u - u0||^2
q = np.array(-u0).flatten()                  # linear term built from the original action u0
G = gx                                       # inequality constraint matrix
h = np.array(c - fx).flatten()               # inequality constraint vector
lower_bound = np.array([0., -140., -200 * self.Pgs_M, 0.])
upper_bound = np.array([143., 140., 200 * self.Pgs_M, 0.])
safe_action = solve_qp(P, q, G, h, lb=lower_bound, ub=upper_bound)
u0 is a matrix with shape [4, 1], and my formulation is correct; G and h can be computed from u0, and self.Pgs_M is a constant. Meanwhile, I have already handled the case where solve_qp returns None.
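Roughly, that guard looks like the following (the clipping fallback here is only an illustrative placeholder, not necessarily what the real code does):

safe_action = solve_qp(P, q, G, h, lb=lower_bound, ub=upper_bound)
if safe_action is None:
    # The QP was infeasible or the solver failed, so fall back to a
    # placeholder action: the original action clipped into the box bounds.
    safe_action = np.clip(np.asarray(u0).flatten(), lower_bound, upper_bound)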
During reinforcement learning training, the program sometimes gets stuck. I print the control input and the initial state right before it hangs, and it shows:
===============debug================
u0 = [[ 38.28203142]
[-140. ]
[-144.34985435]
[ 0. ]]
ini_spd = [[ 0. ]
[ 0. ]
[-203.67992371]
[ 203.67992371]
[ 0. ]
[ -0. ]]
So I used these inputs to check my QP-solving code on its own; it actually worked and returned None, because this problem cannot be solved (a standalone sketch of that check follows the output below):
===============debug================
u0 = [[ 38.28203142]
[-140. ]
[-144.34985435]
[ 0. ]]
ini_spd = [[ 0.00000000e+00]
[ 0.00000000e+00]
[-2.03679924e+10]
[ 2.03679924e+10]
[ 0.00000000e+00]
[-0.00000000e+00]]
safe_action = None
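For completeness, the standalone check is essentially the following sketch. The real G and h come from gx, c, and fx (their construction is not shown here), so a deliberately infeasible stand-in constraint and a placeholder value for self.Pgs_M are used just to reproduce the None return:

import numpy as np
from qpsolvers import solve_qp

u0 = np.array([[38.28203142], [-140.], [-144.34985435], [0.]])
Pgs_M = 1.0                         # placeholder for self.Pgs_M (actual value not shown)

P = np.eye(4)
q = -u0.flatten()
# Stand-in for the real G and h (computed from gx, c, fx in the actual code).
# The constraint 0 * u <= -1 is infeasible on purpose, so the solver cannot
# find a solution and solve_qp returns None.
G = np.zeros((1, 4))
h = np.array([-1.0])
lower_bound = np.array([0., -140., -200 * Pgs_M, 0.])
upper_bound = np.array([143., 140., 200 * Pgs_M, 0.])

# Solver chosen explicitly here; solve_qp returns None whenever the backend
# does not find a solution.
safe_action = solve_qp(P, q, G, h, lb=lower_bound, ub=upper_bound, solver="quadprog")
print("safe_action =", safe_action)  # expected output: safe_action = None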
I was wondering why my program might get stuck even though I have already handled the None return. What factors can influence qpsolvers and the Python program?