
I am working on SqueezeNet pruning. I have some questions about the pruning code, which is based on the paper Pruning Convolutional Neural Networks for Resource Efficient Inference:

def compute_rank(self, grad):
    # Gradients arrive in reverse order of the forward activations,
    # so recover the matching activation from the end of the list.
    activation_index = len(self.activations) - self.grad_index - 1
    activation = self.activations[activation_index]  # shape: (batch, channels, height, width)

    # Element-wise activation * gradient, summed over batch (0), height (2)
    # and width (3); what remains is one value per channel.
    values = \
        torch.sum((activation * grad), dim=0, keepdim=True).\
            sum(dim=2, keepdim=True).sum(dim=3, keepdim=True)[0, :, 0, 0].data

    # Normalize by the number of elements that were summed over.
    values = \
        values / (activation.size(0) * activation.size(2) * activation.size(3))

    if activation_index not in self.filter_ranks:
        # One accumulator per filter (output channel) of this layer.
        self.filter_ranks[activation_index] = \
            torch.FloatTensor(activation.size(1)).zero_().cuda()

    self.filter_ranks[activation_index] += values
    self.grad_index += 1

1) Why does 'values' use only in_height (2) and in_width (3) of the activation? What about in_channels (1)?

2) Why does filter_ranks[activation_index] depend on in_channels (1) only?

3) Why is the activation multiplied by the gradient, and why are the products summed up?
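
For reference, here is a quick shape trace of the sums in compute_rank, assuming the activation is a standard (batch, channels, height, width) tensor; the concrete sizes below are only an example:

import torch

activation = torch.randn(8, 64, 55, 55)   # (batch, channels, height, width)
grad = torch.randn(8, 64, 55, 55)         # same shape as the activation

x = torch.sum(activation * grad, dim=0, keepdim=True)   # -> (1, 64, 55, 55)
x = x.sum(dim=2, keepdim=True)                          # -> (1, 64, 1, 55)
x = x.sum(dim=3, keepdim=True)                          # -> (1, 64, 1, 1)

values = x[0, :, 0, 0]                                  # -> (64,): one value per channel
print(values.shape)                                     # torch.Size([64])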

kevin998x

1 Answer


A large activation indicates that this filter extracts features that are important to the network.

A large gradient shows that the loss is sensitive to this filter's output, i.e. the filter reacts to different types of input.

Filters with both a large activation and a large gradient are therefore important and are not removed.

The sum is taken because only an entire filter can be removed, so the per-element products have to be collapsed into a single score per filter.

This is an educated guess for question 3).

Please correct me if wrong.
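
Putting this together, my reading of the paper's Taylor criterion is roughly the sketch below. The function name, the toy sizes, and the absolute value at the end are my own interpretation of the paper, not code taken from the question's repo:

import torch

def taylor_filter_ranks(activation, grad):
    # activation, grad: (batch, channels, height, width) tensors for one conv layer.
    # Returns a (channels,) tensor with one importance score per filter.
    n, c, h, w = activation.shape
    # First-order Taylor estimate of the loss change when a filter's output is
    # zeroed: the average of activation * gradient over batch and spatial dims.
    ranks = (activation * grad).sum(dim=(0, 2, 3)) / (n * h * w)
    # The paper ranks filters by the absolute value of this average.
    return ranks.abs()

# Filters with the smallest rank are estimated to contribute least to the loss,
# so they are the pruning candidates; a whole filter is removed at once, which
# is why each filter gets exactly one scalar score.
ranks = taylor_filter_ranks(torch.randn(8, 64, 55, 55), torch.randn(8, 64, 55, 55))
prune_candidates = torch.argsort(ranks)[:10]   # indices of the 10 lowest-ranked filters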

kevin998x