
In the paper "Fast R-CNN" (Girshick, ICCV 2015), section "3.1 Truncated SVD for faster detection", the author proposes using the SVD trick to reduce the size and computation time of a fully connected layer.

Given a trained model (deploy.prototxt and weights.caffemodel), how can I use this trick to replace a fully connected layer with a truncated one?

Shai

2 Answers


Some linear-algebra background
Singular Value Decomposition (SVD) is a decomposition of any matrix W into three matrices:

W = U S V*

Where U and V are orthonormal matrices, and S is diagonal with non-negative elements in decreasing order on the diagonal. One of the interesting properties of SVD is that it makes it easy to approximate W with a lower-rank matrix: suppose you truncate S to keep only its k leading elements (instead of all the elements on the diagonal); then

W_app = U S_trunc V*

is a rank k approximation of W.

Using SVD to approximate a fully connected layer
Suppose we have a model deploy_full.prototxt with a fully connected layer

# ... some layers here
layer {
  name: "fc_orig"
  type: "InnerProduct"
  bottom: "in"
  top: "out"
  inner_product_param {
    num_output: 1000
    # more params...
  }
  # some more...
}
# more layers...

Furthermore, we have trained_weights_full.caffemodel - trained parameters for the deploy_full.prototxt model.

  1. Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers:

    layer {
      name: "fc_svd_U"
      type: "InnerProduct"
      bottom: "in" # same input
      top: "svd_interim"
      inner_product_param {
        num_output: 20  # approximate with k = 20 rank matrix
        bias_term: false
        # more params...
      }
      # some more...
    }
    # NO activation layer here!
    layer {
      name: "fc_svd_V"
      type: "InnerProduct"
      bottom: "svd_interim"
      top: "out"   # same output
      inner_product_param {
        num_output: 1000  # original number of outputs
        # more params...
      }
      # some more...
    }
    
  2. In python, a little net surgery:

    import caffe
    import numpy as np
    
    orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
    svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
    # get the original weight matrix (Caffe stores it as num_output x num_input)
    W = np.array( orig_net.params['fc_orig'][0].data )
    # SVD decomposition; note that np.linalg.svd returns V transposed (Vt)
    k = 20 # same as num_output of fc_svd_U
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # assign weights to the svd net:
    # fc_svd_U applies the first product, so it gets S_trunc * Vt (shape k x num_input)
    svd_net.params['fc_svd_U'][0].data[...] = np.dot(np.diag(s[:k]), Vt[:k, :])
    # fc_svd_V applies the second product, so it gets U truncated to its k leading columns (num_output x k)
    svd_net.params['fc_svd_V'][0].data[...] = U[:, :k]
    svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data # same bias
    # save the new weights
    svd_net.save('trained_weights_svd.caffemodel')
    

Now we have deploy_svd.prototxt and trained_weights_svd.caffemodel, which approximate the original net with far fewer multiplications and weights.
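A back-of-envelope count shows where the savings come from. The 1000 outputs and k = 20 match the example above; the input dimension of 4096 is a hypothetical value (typical of VGG-style fc layers):

```python
# per-sample multiplication count of a fully connected layer = inputs * outputs
D, n, k = 4096, 1000, 20   # D is a hypothetical input dimension
orig_muls = D * n          # one (n x D) matrix-vector product
svd_muls = k * D + n * k   # two smaller products: (k x D), then (n x k)
print(orig_muls, svd_muls)  # 4096000 101920, roughly a 40x reduction
```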

Shai
  • amazing solution :) – MD. Nazmul Kibria Nov 08 '16 at 13:07
  • @Dale not my solution - it's Ross Girshick's. – Shai Jan 23 '17 at 09:34
  • I think you meant to write `W_app = U S_trunc V*`. – Autonomous Jul 08 '19 at 06:13
  • This is a nice solution, and technically correct, but it's important to note when you should and should not leave the two linear layers separate vs. multiplying them together into one. If k isn't much smaller than either matrix dimension, then it may be fewer multiplies altogether to recombine them. (Remember that for matrix multiplication x(AB) = (xA)B; x(AB) might be fewer multiplies than (xA)B if AB is precomputed and k isn't small enough.) – Joseph Summerhays Jun 13 '23 at 22:28

Actually, Ross Girshick's py-faster-rcnn repo includes an implementation of this SVD step: compress_net.py.

BTW, you usually need to fine-tune the compressed model to recover the accuracy (or to compress in a more sophisticated way, see for example "Accelerating Very Deep Convolutional Networks for Classification and Detection", Zhang et al).

Also, for me scipy.linalg.svd worked faster than numpy's svd.
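For reference, a minimal sketch of the swap (the 1000×4096 shape is just an example):

```python
import numpy as np
from scipy.linalg import svd  # drop-in alternative to np.linalg.svd

W = np.random.default_rng(0).standard_normal((1000, 4096)).astype('f4')
# scipy also exposes the LAPACK driver; 'gesdd' (divide and conquer) is the default
U, s, Vt = svd(W, full_matrices=False, lapack_driver='gesdd')
assert U.shape == (1000, 1000) and s.shape == (1000,) and Vt.shape == (1000, 4096)
```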

rkellerm