I am working on SqueezeNet pruning. I have some questions about the pruning code below, which is based on the paper "Pruning Convolutional Neural Networks for Resource Efficient Inference":
```python
def compute_rank(self, grad):
    # Backward hooks fire in reverse order of the forward pass,
    # so index the stored activations from the end
    activation_index = len(self.activations) - self.grad_index - 1
    activation = self.activations[activation_index]

    # Elementwise activation * gradient, summed over batch (dim 0),
    # height (dim 2) and width (dim 3); one value is left per channel (dim 1)
    values = \
        torch.sum((activation * grad), dim=0, keepdim=True)\
        .sum(dim=2, keepdim=True).sum(dim=3, keepdim=True)[0, :, 0, 0].data

    # Normalize by batch size and spatial size
    values = \
        values / (activation.size(0) * activation.size(2) * activation.size(3))

    if activation_index not in self.filter_ranks:
        self.filter_ranks[activation_index] = \
            torch.FloatTensor(activation.size(1)).zero_().cuda()

    self.filter_ranks[activation_index] += values
    self.grad_index += 1
```
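To make the question concrete, here is a minimal sketch of the reduction the snippet performs, on random tensors (the shapes `N, C, H, W` are made-up values, not taken from an actual SqueezeNet layer):

```python
import torch

# Assumed toy shapes: dim 0 = batch, 1 = channels, 2 = height, 3 = width
N, C, H, W = 4, 16, 8, 8
activation = torch.randn(N, C, H, W)
grad = torch.randn(N, C, H, W)

# Same reduction as compute_rank: sum activation * gradient over
# batch (0), height (2) and width (3), then normalize by those sizes.
# One value remains per channel (dim 1).
values = (activation * grad).sum(dim=(0, 2, 3)) / (N * H * W)

print(values.shape)  # torch.Size([16]) — one rank estimate per channel
```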
1) Why does `values` sum and normalize over only the batch (dim 0), height (dim 2) and width (dim 3) of the activation? What about the channel dimension (dim 1)?
2) Why does `filter_ranks[activation_index]` have size `activation.size(1)`, i.e. depend only on the number of channels?
3) Why is the activation multiplied by its gradient, and why are the products summed up?