I am a beginner at Reinforcement Learning and Deep Learning, so bear with me ^^
Let's say we have a DQN agent in Keras that receives as input a 2D matrix of 0s and 1s with, say, 10 rows and 3 columns.
This matrix holds the requests of 10 users (one per row): if any of the columns in a row is equal to 1, that user is asking the agent for a resource.
Example:
[
[0, 1, 0],
[0, 0, 0],
[1, 0, 0],
[0, 0, 1],
...
]
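As a concrete sketch (assuming NumPy; the specific indices here are just made-up sample requests), the request matrix above could be built and reduced to a per-user "is requesting" vector like this:

```python
import numpy as np

# Hypothetical 10x3 request matrix: row u holds user u's request flags.
state = np.zeros((10, 3), dtype=np.int8)
state[0, 1] = 1   # user 0 asks via column 1
state[2, 0] = 1   # user 2 asks via column 0
state[3, 2] = 1   # user 3 asks via column 2

# A user is "requesting" if any column in its row is 1.
requesting = state.any(axis=1).astype(np.int8)
print(requesting[:4])  # [1 0 1 1]
```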
Upon receiving the input matrix, the agent must allocate a resource to each user who asked for one, and nothing to the users who didn't.
Let's say the agent has 12 resources it can allocate. We can represent the allocation as a 2D matrix with 12 rows (number of resources) and 10 columns (number of users).
Each resource can be given to only one user, and each user can use only one resource per step.
I have tried this, which is a similar problem to mine, but when I run the code, the q_values (or weights?) are assigned to each column of each row of the output matrix individually, whereas what I want is a q_value assigned to the matrix as a whole, or at least that's what my beginner brain tells me to do.
The action (output) matrix can be like this:
[
[1, 0, 0, 0, 0, ...],
[0, 0, 0, 0, 0, ...],
[0, 0, 0, 1, 0, ...],
...
]
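For what it's worth, here is a minimal sketch (assuming NumPy and the 12x10 shape above; is_valid_allocation is just a name I made up) of checking those constraints on a candidate allocation matrix:

```python
import numpy as np

def is_valid_allocation(alloc, requests):
    """alloc: (12, 10) 0/1 matrix, alloc[r, u] == 1 means resource r goes to user u.
    requests: length-10 0/1 vector, 1 for users that asked for a resource."""
    alloc = np.asarray(alloc)
    requests = np.asarray(requests)
    row_ok = (alloc.sum(axis=1) <= 1).all()   # each resource given to at most one user
    col_ok = (alloc.sum(axis=0) <= 1).all()   # each user gets at most one resource
    # users that did not ask for a resource must receive nothing
    only_requesters = (alloc.sum(axis=0)[requests == 0] == 0).all()
    return bool(row_ok and col_ok and only_requesters)
```

A valid action matrix would then be any 0/1 matrix that passes this check for the current request vector.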
One idea I had was to choose from a pre-built collection of matrices (one per possible action), but the collection is so large that storing it raises a MemoryError.
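Just to quantify that: with 12 resources and 10 requesting users, each user must get a distinct resource, so the full assignment matrices alone number 12!/(12-10)!, which Python's math.perm confirms:

```python
import math

# Each of the 10 requesting users gets a distinct resource out of 12,
# so there are 12P10 = 12! / (12 - 10)! full assignments,
# before even counting the cases where only some users request.
print(math.perm(12, 10))  # 239500800
```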
I am still confused about what the best approach to this problem is.