I'm trying to adapt the Q-learning example from https://github.com/lanking520/RL-FlappyBird to play a different game, Pathery.
When calculating Q, I get a shape-mismatch error at QAgent.java L95:
NDList QReward = trainer.forward(preInput);
NDList targetQReward = trainer.forward(postInput);
NDList Q = new NDList(QReward.singletonOrThrow()
.mul(actionInput.singletonOrThrow())
.sum(new int[]{1}));
It fails specifically at

.mul(actionInput.singletonOrThrow())

with the error:

MXNetError: Check failed: l == 1 || r == 1: operands could not be broadcast together with shapes [32,2] [32,153]
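As far as I can tell, the idea of the mul/sum is to mask the per-action Q-values with a one-hot encoding of the action taken, then reduce over the action axis to get one Q-value per sample. A toy sketch of the shapes I think are intended (made-up values, not code from the repo):

import ai.djl.ndarray.NDArray;
import ai.djl.ndarray.NDManager;
import ai.djl.ndarray.index.NDIndex;
import ai.djl.ndarray.types.Shape;

try (NDManager manager = NDManager.newBaseManager()) {
    // Per-action Q-values for a batch of 32 states, one column per action
    NDArray qAll = manager.randomUniform(0f, 1f, new Shape(32, 153));
    // One-hot action batch; here every sample "took" action 0
    NDArray actions = manager.zeros(new Shape(32, 153));
    actions.set(new NDIndex(":, 0"), 1f);
    // Masking and summing over axis 1 leaves one Q-value per sample: shape [32]
    NDArray q = qAll.mul(actions).sum(new int[] {1});
}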
The original code has an action space of size 2, so QReward and actionInput were both [32,2] and the multiply worked. My action space has size 153, so actionInput is now [32,153], but trainer.forward still returns a [32,2] tensor, which is exactly the mismatch in the error.
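My guess is that the network itself also has to change, so that its final layer emits one Q-value per action and forward() returns [32,153]. Something along these lines, loosely modeled on the FlappyBird model builder (the hidden layer size and variable names here are placeholders, not the actual model):

import ai.djl.nn.Activation;
import ai.djl.nn.SequentialBlock;
import ai.djl.nn.core.Linear;

int actionSpaceSize = 153;
SequentialBlock net = new SequentialBlock()
        // ... feature/hidden layers as in the original model ...
        .add(Linear.builder().setUnits(512).build())
        .add(Activation.reluBlock())
        // Output layer: one Q-value per action, so forward() yields [batch, 153]
        .add(Linear.builder().setUnits(actionSpaceSize).build());

With that, QReward.singletonOrThrow() would be [32,153], matching actionInput, and the mul/sum should again produce one Q-value per sample.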
Is that the right fix, or do I need to calculate Q differently when the action space is larger than 2?