I am new to reinforcement learning and deep learning, and I've been tasked with creating a DQN agent whose input is a 2D matrix of 0s and 1s. From that input, the agent must choose one of a large number of actions, where each action is itself a 2D matrix of 0s and 1s.
For example, the input always has 3 columns and, say, 10 rows, like this:
[
  [0, 0, 0],
  [0, 1, 0],
  [1, 0, 0],
  [0, 0, 1],
  ...
]
Given that input, the agent must choose one of many 2D matrices of 0s and 1s. Each matrix has, say, 12 rows and 10 columns (the number of columns equals the number of rows in the input matrix).
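To make the shapes concrete, here is a minimal NumPy sketch of the data as I understand it. The sizes 10 and 12 are the example values from above; `N_ACTIONS = 50` and the random contents are placeholders I made up purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ROWS, N_COLS = 10, 3   # input: 10 rows (example value), always 3 columns
ACTION_ROWS = 12         # rows per action matrix (example value)
N_ACTIONS = 50           # hypothetical number of candidate actions

# One input (state): a 10 x 3 matrix of 0s and 1s.
state = rng.integers(0, 2, size=(N_ROWS, N_COLS))

# The candidate actions: each is a 12 x 10 matrix of 0s and 1s,
# where 10 matches the number of rows in the input.
actions = rng.integers(0, 2, size=(N_ACTIONS, ACTION_ROWS, N_ROWS))

print(state.shape)    # (10, 3)
print(actions.shape)  # (50, 12, 10)
```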
I've tried some code examples like in this post.
However, the agent must choose the best action from a set of action matrices, so the chosen action is an entire matrix rather than a single value.
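To illustrate the mismatch, here is a small sketch with made-up numbers. The random `q_values` stand in for a network's output, and `candidate_actions` and `N_ACTIONS` are hypothetical names, not taken from any example I tried. The examples I have seen end with an argmax that yields an integer, whereas what I need back is a matrix. Indexing into a fixed list of candidates is the only bridge I can picture, and I don't know whether that is a sound way to do it.

```python
import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 50  # hypothetical number of candidate action matrices
candidate_actions = rng.integers(0, 2, size=(N_ACTIONS, 12, 10))

# Stand-in for a network's output: one value per candidate action.
q_values = rng.random(N_ACTIONS)

# What typical examples return: a single integer index.
best_index = int(np.argmax(q_values))

# What I actually need: the matrix that the index refers to.
best_action = candidate_actions[best_index]

print(type(best_index).__name__, best_action.shape)  # int (12, 10)
```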
I have no idea how to approach this.