
When using deep Q-learning I am trying to capture motion by passing a number of grayscale frames as the input, each with dimensions 90x90. Four 90x90 frames will be passed in to allow the network to detect motion. The multiple frames should be treated as a single state rather than a batch of 4 states. How can I get a vector of action values as a result instead of a matrix?

I am using PyTorch, and the network returns a 4x7 matrix - a row of action values for each frame. Here is the network:

        self.conv1 = Conv2d(self.channels, 32, 8)
        self.conv2 = Conv2d(32, 64, 4)
        self.conv3 = Conv2d(64, 128, 3)
        self.fc1 = Linear(128 * 52 * 52, 64)
        self.fc2 = Linear(64, 32)
        self.output = Linear(32, action_space)
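
For reference, a minimal shape check (assuming self.channels = 1 here and the four frames stacked along dimension 0) reproduces the behaviour - PyTorch treats dimension 0 as the batch dimension, so each frame becomes its own state:

    import torch
    from torch.nn import Conv2d

    conv1 = Conv2d(1, 32, 8)             # same first layer, assuming self.channels = 1
    frames = torch.rand(4, 1, 90, 90)    # four 90x90 frames stacked on dim 0
    print(conv1(frames).shape)           # torch.Size([4, 32, 83, 83]) - a batch of 4 states

Everything downstream keeps that leading dimension of 4, which is where the 4x7 output comes from.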
  • An easy but not elegant approach would be to just concatenate the four frames in the first (the channel) dimension. More complex options would be 3D convolutions (computationally expensive) or recurrent neural networks over the temporal dimension (just naming some examples here...) – Jan Jun 21 '20 at 15:06
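
A minimal sketch of the channel-stacking suggestion from the comment above, assuming the four frames arrive as 90x90 NumPy arrays and self.channels is set to 4 (the network name below is just a placeholder):

    import numpy as np
    import torch

    # four consecutive grayscale frames, each 90x90 (placeholder data)
    frames = [np.random.rand(90, 90).astype(np.float32) for _ in range(4)]

    # stack to (4, 90, 90), then add a batch dimension -> (1, 4, 90, 90):
    # a single state made of 4 channels
    state = torch.from_numpy(np.stack(frames)).unsqueeze(0)

    # with self.channels = 4 the network sees a batch of one state and
    # returns a (1, action_space) tensor, e.g. (1, 7)
    # action_values = network(state)   # 'network' is a hypothetical instance of the model above

The flattened size given to fc1 would also have to match whatever spatial size the conv stack actually produces for 90x90 inputs.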

1 Answer


Select the action with the highest value. Let's call the output tensor action_values.

    action = torch.argmax(action_values.data)

or

    action = np.argmax(action_values.cpu().data.numpy())
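
If the four frames are fed in as one 4-channel state (as suggested in the comments on the question), action_values has shape (1, action_space), and argmax reduces to a single integer index. A small self-contained sketch with placeholder Q-values:

    import torch

    action_values = torch.randn(1, 7)            # placeholder Q-values: one state, 7 actions
    action = torch.argmax(action_values).item()  # index of the highest-valued action
    print(action)                                # e.g. 3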
risper
  • As it is a matrix, argmax will return a vector of actions; I would have to use argmax twice to get a single value, and I'm not sure doing that makes sense – Ryan McCauley Jun 21 '20 at 19:14
  • Hello @RyanMcCauley, I think there are 4 actions that are continuous ... each action is represented as a vector. – risper Jul 23 '20 at 17:03