Build a matrix of available actions for Q-Learning

Question

I am simulating an inventory management system for a retail shop. I have a (15, 15) matrix of zeros in which rows are states and columns are actions:

import numpy as np

Q = np.matrix(np.zeros([15, 15]))

Specifically, 0 is the minimum and 14 the maximum inventory level; states are the current inventory level and actions are stock orders (quantities).

I would therefore like to replace the zeros with -1 wherever the sum of state and action exceeds 14, since such an order would push the inventory above its maximum:

print(final_Q)

# First row (state 0): any quantity can be ordered (0 + 14 == 14)
[[ 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0]
# Second row (state 1): at most 13 units can be ordered (1 + 14 > 14)
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0  0 -1]
# Third row (state 2): at most 12 units can be ordered
 [ 0  0  0  0  0  0  0  0  0  0  0  0  0 -1 -1]

(...)

I tried implementing that manually, but how can I get the final matrix automatically?

Exactly, but it should start from the end, as shown in the matrix in the post. @Attack68 — Alessandro Ceccarelli, Mar 19 '19 at 17:12

score 1 · Answer 1 · answered Mar 19 '19 at 18:43

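import numpy as np

# Q matrix, one row and one column larger than needed
Q = np.matrix(np.zeros([15+1, 15+1]))

# Fill the lower triangle (diagonal included) with -1, then rotate
Q = Q[0:15][0:15]  # both slices act on rows, so Q is now 15 x 16
il1 = np.tril_indices(15)
Q[il1] = -1
Q = np.rot90(Q)  # 16 x 15; entry (state, action) is -1 where state + action > 14

# Adjust single values (`parameters` is not defined in this snippet)
Q[parameters["max_products"]-1][0, 1:] = Q[parameters["max_products"]][0, 1:]
Q = Q[:15, :]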

It is definitely not computationally efficient, but it works.
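
The same index-based idea can be written more compactly on a plain array. The following is only a sketch, not the answerer's code: it assumes the 15 inventory levels from the question, uses `np.tril_indices` with `k=-1` to select the strictly lower triangle, and mirrors the column index so that the -1 entries end up on the right-hand side.

```python
import numpy as np

n = 15  # inventory levels 0..14, as in the question

Q = np.zeros((n, n))

# Index pairs (row, col) with col < row, i.e. the strictly lower triangle
rows, cols = np.tril_indices(n, k=-1)

# Mirror the column index: column n-1-col is blocked exactly when row + column > n-1
Q[rows, n - 1 - cols] = -1
```

No extra row, rotation, or single-value adjustment is needed here, because the mirrored index already encodes the condition state + action > 14.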

score 1 · Accepted Answer · answered Mar 19 '19 at 18:57

Q = np.tril(-1 * np.ones((15, 15)), -1)[:, ::-1]

>>> Q
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1., -1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1., -1., -1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1., -1.],
       [ 0.,  0.,  0.,  0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1., -1., -1.],
       [ 0.,  0.,  0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.],
       [ 0.,  0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.],
       [ 0.,  0.,  0., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.],
       [ 0.,  0., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.],
       [ 0., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1., -1.]])
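
The triangular trick works because the rule "state + action > 14" happens to trace an anti-diagonal. As a hedged alternative, the rule can also be written out directly with broadcasting, which may be easier to adapt if the maximum level changes. The names `MAX_LEVEL`, `states`, `actions`, and `infeasible` below are illustrative and do not come from the answer.

```python
import numpy as np

MAX_LEVEL = 14          # maximum inventory level from the question
n = MAX_LEVEL + 1       # number of states and of actions

states = np.arange(n)[:, None]    # column vector: current inventory level
actions = np.arange(n)[None, :]   # row vector: order quantity

# True wherever ordering `action` units in `state` would exceed the maximum
infeasible = states + actions > MAX_LEVEL

# -1 for infeasible (state, action) pairs, 0 elsewhere
Q = np.where(infeasible, -1.0, 0.0)

# Cross-check against the triangular construction
Q_tril = np.tril(-1 * np.ones((n, n)), -1)[:, ::-1]
assert np.array_equal(Q, Q_tril)
```

Both constructions give the same array; the broadcasting version simply keeps the feasibility condition visible in the code.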

From numpy docs regarding `matrix`: "It is no longer recommended to use this class, even for linear algebra. Instead use regular arrays. The class may be removed in the future." — Attack68, Mar 21 '19 at 17:07
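
Following that advice, here is a small sketch of how the -1 entries might be read back from a plain array to list the order quantities allowed in a given state. The helper `available_actions` is hypothetical, and treating -1 as a sentinel is the question's own convention; once Q-values are updated during learning, keeping a separate boolean mask would probably be safer.

```python
import numpy as np

# Plain ndarray instead of np.matrix
Q = np.tril(-1 * np.ones((15, 15)), -1)[:, ::-1]

def available_actions(Q, state):
    """Hypothetical helper: order quantities not marked -1 for this state."""
    return np.flatnonzero(Q[state] != -1)

# Indexing a row of an ndarray gives a 1-D array (np.matrix would give shape (1, 15))
print(Q[12].shape)               # (15,)
print(available_actions(Q, 12))  # [0 1 2]
```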
