When using deep Q-learning I am trying to capture motion by passing a number of grayscale frames as the input, each with dimensions 90x90. Four 90x90 frames are passed in to allow the network to detect motion. The four frames should be treated as a single state rather than a batch of 4 states. How can I get a vector of actions as the result instead of a matrix?
I am using PyTorch, and the network currently returns a 4x7 matrix: a row of action values for each frame. Here is the network:
self.conv1 = Conv2d(self.channels, 32, 8)
self.conv2 = Conv2d(32, 64, 4)
self.conv3 = Conv2d(64, 128, 3)
self.fc1 = Linear(128 * 52 * 52, 64)
self.fc2 = Linear(64, 32)
self.output = Linear(32, action_space)
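
For concreteness, here is a minimal, runnable sketch of what I mean (not my exact code: the class wrapper, the forward pass with ReLU activations and a flatten, and the fc1 input size recomputed for a 90x90 input with these kernel sizes, 90 -> 83 -> 80 -> 78, are filled in just for illustration):

    import torch
    import torch.nn.functional as F
    from torch.nn import Conv2d, Linear, Module

    class QNetwork(Module):
        def __init__(self, channels, action_space):
            super().__init__()
            self.conv1 = Conv2d(channels, 32, 8)   # 90x90 -> 83x83
            self.conv2 = Conv2d(32, 64, 4)         # 83x83 -> 80x80
            self.conv3 = Conv2d(64, 128, 3)        # 80x80 -> 78x78
            self.fc1 = Linear(128 * 78 * 78, 64)   # flattened conv output for a 90x90 input
            self.fc2 = Linear(64, 32)
            self.output = Linear(32, action_space)

        def forward(self, x):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
            x = F.relu(self.conv3(x))
            x = x.flatten(start_dim=1)             # keep the first (batch) dimension
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            return self.output(x)

    net = QNetwork(channels=1, action_space=7)

    # Four grayscale frames passed as a batch of 4 single-channel images:
    frames = torch.randn(4, 1, 90, 90)
    print(net(frames).shape)   # torch.Size([4, 7]) -- a row of action values per frame

What I want instead is for the 4 frames to be treated as one state, so the network produces a single row of 7 action values for that state.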