
I have trained a Faster R-CNN model to detect human faces in an image using caffe. My current model size is 530 MB. I wanted to reduce the size of my model, so I came across Deep Compression by Song Han.

I've set the less significant weights to 0 in my model using PyCaffe, but the model size hasn't gone down. How can I remove those insignificant connections from the trained caffe model so that the size of the model is reduced?
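For reference, this is roughly how I zeroed the weights with PyCaffe (a simplified sketch; the file names and the magnitude threshold below are just placeholders):

```python
import numpy as np
import caffe

# Load the trained model (paths are illustrative)
net = caffe.Net('faster_rcnn_test.prototxt', 'faster_rcnn.caffemodel', caffe.TEST)

threshold = 1e-3  # illustrative magnitude threshold
for layer_name, params in net.params.items():
    weights = params[0].data
    # zero out the "insignificant" weights in place
    weights[np.abs(weights) < threshold] = 0.0

net.save('faster_rcnn_pruned.caffemodel')
```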

Avis

2 Answers


Since the Blob data type in caffe (the basic "container" for numerical arrays) does not support a "sparse" representation, replacing weights with zeros does not change the storage complexity: caffe still needs space to store these zeros. This is why you do not see a reduction in model size.

In order to prune connections you have to make sure the zeros follow a certain pattern: for example, if an entire row of an "InnerProduct" layer's weight matrix is zero, the corresponding output unit can be removed (together with the matching input dimension of the next layer); likewise, zero columns let you eliminate dimensions of the previous layer, etc.

These modifications can be made manually, and carefully, using net surgery. Read more about it here (that example actually adds connections, but you can apply the same steps to prune connections).
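For illustration, here is a minimal net-surgery sketch (the layer name "fc6", the prototxt files and the shapes are hypothetical) that keeps only the non-zero rows of an "InnerProduct" layer:

```python
import numpy as np
import caffe

# "pruned.prototxt" must already declare num_output equal to the number of
# rows we are going to keep; all other layers stay unchanged.
old_net = caffe.Net('original.prototxt', 'original.caffemodel', caffe.TEST)
new_net = caffe.Net('pruned.prototxt', caffe.TEST)

W = old_net.params['fc6'][0].data   # shape: (num_output, input_dim)
b = old_net.params['fc6'][1].data   # shape: (num_output,)

keep = np.abs(W).sum(axis=1) != 0   # rows that are not entirely zero

# Copy only the surviving rows into the smaller layer
new_net.params['fc6'][0].data[...] = W[keep]
new_net.params['fc6'][1].data[...] = b[keep]

# The layer consuming "fc6" must also have its matching input dimensions
# (weight columns) removed, and the weights of all remaining layers copied
# over in the same way.

new_net.save('pruned.caffemodel')
```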

You might find the SVD "trick" useful for reducing model complexity.
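As a rough numpy illustration of the SVD idea (shapes are hypothetical): a large fully-connected weight matrix is replaced by two smaller factors, so one "InnerProduct" layer becomes two.

```python
import numpy as np

W = np.random.randn(1024, 1024).astype(np.float32)  # hypothetical fc weights
k = 64                                               # retained rank

U, S, Vt = np.linalg.svd(W, full_matrices=False)
W1 = Vt[:k, :]                  # (k, 1024)  -> first, smaller InnerProduct
W2 = U[:, :k] * S[:k]           # (1024, k)  -> second InnerProduct
approx = W2 @ W1                # approximates W

# Storing W1 and W2 takes 2 * 1024 * k values instead of 1024 * 1024.
```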

Shai
  • Is there any way by which I can change the dtype of the weights from 'float32' to 'float16' so that the model size will be reduced? – Avis Nov 04 '16 at 09:22

@Shai's answer explains well why your model size wasn't reduced.

As a supplement, to make the weights sparse in a structured way and thereby compress the model, you can try the caffe fork for Structurally Sparse Deep Neural Networks.

Its main idea is to add some regularizers to the loss function, which are in fact L2-norms of the weights grouped by row, column, channel, etc. (assuming the weights of a layer have shape (num_out, channel, row, column)). During training these regularizers make the weights within the same group decay uniformly, so the weights become more sparse and it is easier to eliminate all the weights in a whole row or column, or even a whole channel.
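As a rough numpy sketch of such a group regularizer (not the actual caffe implementation; the grouping here is per input channel):

```python
import numpy as np

def channel_group_lasso(W):
    # W has shape (num_out, channel, row, column); one group per input channel.
    # The penalty is the sum of the L2 norms of the groups, which during
    # training pushes all weights of a channel toward zero together, so whole
    # channels can eventually be pruned.
    return np.sqrt((W ** 2).sum(axis=(0, 2, 3))).sum()

W = np.random.randn(64, 32, 3, 3).astype(np.float32)  # hypothetical conv weights
penalty = channel_group_lasso(W)  # added to the loss, scaled by a coefficient
```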

Dale
  • are you sure it's an `L2` norm of groups? Usually an `L1` norm encourages sparseness... Is it an `L1` norm over per-column `L2` norms (aka a "mixed norm")? – Shai Nov 03 '16 at 12:30
  • @Shai It is an `L2` norm, namely a `Group Lasso` regularizer in that paper (https://arxiv.org/pdf/1608.03665v4.pdf). It is used in combination with the `L1` norm (weight decay) in the loss. – Dale Nov 03 '16 at 12:40
  • Thanks! Indeed, you can get very interesting effects with the proper use of regularization. Thank you for the interesting reference. – Shai Nov 03 '16 at 12:43
  • @Shai Agree with you. : ) – Dale Nov 03 '16 at 12:46