
I'm new to Reinforcement Learning and Deep Learning, and I've been tasked with creating a DQN agent whose input is a 2D matrix of 0s and 1s. From that input, the agent must choose one of a very large number of actions, where each action is itself a 2D matrix of 0s and 1s.

For example, the input would always have 3 columns and, say, 10 rows, like this:

[
 [0, 0, 0], 
 [0, 1, 0], 
 [1, 0, 0], 
 [0, 0, 1],
 ...
]

Given that input, the agent must choose one of many 2D matrices of 0s and 1s. Each of these matrices has, say, 12 rows and 10 columns (the number of columns equals the number of rows in the input matrix).
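One way to make these shapes concrete, assuming the candidate action matrices are fixed and known in advance (as confirmed in the comments), is to keep them in a list and refer to each one by its position, so the agent only ever has to output an integer index. Below is a minimal sketch in plain Python; the sizes mirror the example above, while `N_ACTIONS`, the helper names and the random placeholder data are hypothetical.

```python
import random

# Sizes from the example in the question; N_ACTIONS is a hypothetical value.
N_INPUT_ROWS = 10    # rows in the observation matrix
N_INPUT_COLS = 3     # the observation always has 3 columns
N_ACTION_ROWS = 12   # rows in each action matrix
N_ACTIONS = 50       # hypothetical: number of candidate action matrices

rng = random.Random(0)


def random_binary_matrix(rows, cols):
    """Placeholder data: a rows x cols matrix of random 0s and 1s."""
    return [[rng.randint(0, 1) for _ in range(cols)] for _ in range(rows)]


def flatten(matrix):
    """Turn a 2D 0/1 matrix into a flat list of floats for a network input."""
    return [float(v) for row in matrix for v in row]


# Fixed catalogue of actions: each is a 12 x N_INPUT_ROWS matrix, because the
# number of action columns equals the number of observation rows.
ACTIONS = [random_binary_matrix(N_ACTION_ROWS, N_INPUT_ROWS) for _ in range(N_ACTIONS)]

state = random_binary_matrix(N_INPUT_ROWS, N_INPUT_COLS)
features = flatten(state)              # length 10 * 3 = 30
action_index = 7                       # what the agent outputs (an integer)
chosen_matrix = ACTIONS[action_index]  # what the environment receives
```

With this layout the network never has to produce a matrix; the lookup `ACTIONS[action_index]` turns its integer choice back into one.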

I've tried some code examples, such as the one in this post.

However, my agent must choose the best action from a set of action matrices, so the chosen action is an entire matrix, not just a single value.
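In a standard DQN with a fixed set of discrete actions, the network outputs one Q-value per action, and "choosing the best matrix" reduces to taking the argmax over those Q-values and looking up the matrix at that index. The sketch below assumes that setup; the linear `q_values` function is a hypothetical stand-in for a real neural network, and the tiny 2 x 2 "action matrices" are toy data for illustration only.

```python
import random


def q_values(features, weights, biases):
    """Stand-in for a Q-network: one linear score per action.

    weights[a] holds one weight per input feature. A real DQN would use a
    multi-layer network whose output layer has one unit per action.
    """
    return [sum(w * x for w, x in zip(w_a, features)) + b_a
            for w_a, b_a in zip(weights, biases)]


def select_action(features, weights, biases, actions, epsilon, rng):
    """Epsilon-greedy choice over a fixed list of action matrices.

    Returns (index, matrix): the index is what the DQN learns about,
    the matrix is what gets sent to the environment.
    """
    if rng.random() < epsilon:
        idx = rng.randrange(len(actions))            # explore: random action
    else:
        q = q_values(features, weights, biases)
        idx = max(range(len(q)), key=q.__getitem__)  # exploit: argmax Q
    return idx, actions[idx]


# Toy usage: three candidate 2 x 2 matrices, greedy selection (epsilon = 0).
rng = random.Random(0)
actions = [[[0, 1], [1, 0]], [[1, 1], [0, 0]], [[0, 0], [1, 1]]]
features = [1.0, 0.0, 1.0]
weights = [[0.1, 0.0, 0.1], [0.5, 0.0, 0.5], [0.2, 0.9, 0.2]]
biases = [0.0, 0.0, 0.0]
idx, matrix = select_action(features, weights, biases, actions, epsilon=0.0, rng=rng)
```

Here the Q-values are roughly `[0.2, 1.0, 0.4]`, so the greedy choice is index 1 and the returned matrix is `[[1, 1], [0, 0]]`. The matrices themselves are never modified by the network; they are only selected.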

I have no idea how to approach this.

Ness
  • Is it right that you have a fixed, finite number of actions (matrices) and the network needs to pick the best one? That is, the network should *not* change the values in any of the "action"/output matrices? – bogovicj Nov 17 '20 at 18:33
  • Correct. The number of matrices is very big though, and I don't know if there is a better way to approach the problem at hand. – Ness Nov 17 '20 at 18:36
  • Got it. In that case, you have a classification problem: you can have your network output a vector instead of a matrix, one that one-hot encodes which output/action/matrix to choose. It's probably worth trying that. Since you said the number of matrices is very big, is it larger than the number of matrix elements? – bogovicj Nov 17 '20 at 18:44
  • Yes, it is much larger. It has to be a matrix, not a vector, unless you mean something else and I didn't understand. Could you please provide some example code? – Ness Nov 17 '20 at 18:46
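The "vector instead of a matrix" idea from the comments fits standard DQN training: the network outputs a vector with one Q-value per candidate matrix, the action taken is stored as an index (equivalently a one-hot vector), and only the Q-value of that action is pushed toward the usual target r + γ·max Q(s', a'). The matrix is only looked up when the action is executed. A minimal plain-Python sketch of those two pieces follows; the function names are hypothetical, and a real implementation would compute this over mini-batches with a deep learning library.

```python
def one_hot(index, n_actions):
    """Vector form of an action index: 1.0 at the chosen action, 0.0 elsewhere."""
    vec = [0.0] * n_actions
    vec[index] = 1.0
    return vec


def dqn_target(reward, next_q_values, done, gamma=0.99):
    """Standard DQN target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    return reward if done else reward + gamma * max(next_q_values)


def masked_td_error(q_values, action_index, target):
    """TD error for the action actually taken.

    Multiplying by the one-hot mask selects that single output unit; the other
    outputs receive no error signal from this transition.
    """
    mask = one_hot(action_index, len(q_values))
    q_taken = sum(q * m for q, m in zip(q_values, mask))
    return target - q_taken


# Toy usage: three actions, action 1 was taken and earned reward 1.0.
q_now = [0.2, 1.0, 0.4]
q_next = [0.5, 0.3, 0.0]
target = dqn_target(1.0, q_next, done=False, gamma=0.9)  # 1.0 + 0.9 * 0.5
td = masked_td_error(q_now, 1, target)
```

Because the output layer has one unit per candidate matrix, its size grows with the number of actions rather than with the number of matrix elements; whether that is practical for a very large action set depends on how many candidates there actually are.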
