
In the paper "Fast R-CNN" (Girshick, ICCV 2015), section "3.1 Truncated SVD for faster detection", the author proposes using the SVD trick to reduce the size and computation time of a fully connected layer.

Given a trained model (deploy.prototxt and weights.caffemodel), how can I use this trick to replace a fully connected layer with a truncated one?

Shai

2 Answers


Some linear-algebra background
Singular Value Decomposition (SVD) is a decomposition of any matrix W into three matrices:

W = U S V*

Where U and V are orthonormal matrices, and S is diagonal with non-negative elements in decreasing order on the diagonal. One of the interesting properties of SVD is that it makes it easy to approximate W with a lower-rank matrix: suppose you truncate S to keep only its k leading elements (instead of all the elements on the diagonal); then

W_app = U S_trunc V*

is a rank k approximation of W.

Using SVD to approximate a fully connected layer
Suppose we have a model deploy_full.prototxt with a fully connected layer

# ... some layers here
layer {
  name: "fc_orig"
  type: "InnerProduct"
  bottom: "in"
  top: "out"
  inner_product_param {
    num_output: 1000
    # more params...
  }
  # some more...
}
# more layers...

Furthermore, we have trained_weights_full.caffemodel - trained parameters for the deploy_full.prototxt model.

  1. Copy deploy_full.prototxt to deploy_svd.prototxt and open it in an editor of your choice. Replace the fully connected layer with these two layers:

    layer {
      name: "fc_svd_U"
      type: "InnerProduct"
      bottom: "in" # same input
      top: "svd_interim"
      inner_product_param {
        num_output: 20  # approximate with k = 20 rank matrix
        bias_term: false
        # more params...
      }
      # some more...
    }
    # NO activation layer here!
    layer {
      name: "fc_svd_V"
      type: "InnerProduct"
      bottom: "svd_interim"
      top: "out"   # same output
      inner_product_param {
        num_output: 1000  # original number of outputs
        # more params...
      }
      # some more...
    }
    
  2. In python, a little net surgery:

    import caffe
    import numpy as np
    
    orig_net = caffe.Net('deploy_full.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
    svd_net = caffe.Net('deploy_svd.prototxt', 'trained_weights_full.caffemodel', caffe.TEST)
    # get the original weight matrix (Caffe stores it as num_output x num_input)
    W = np.array( orig_net.params['fc_orig'][0].data )
    # SVD decomposition; note that np.linalg.svd returns V transposed (Vt)
    k = 20 # same as num_output of fc_svd_U
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # assign weights to the svd net:
    # fc_svd_U applies the first product, so it gets S_trunc * Vt (shape k x num_input)
    svd_net.params['fc_svd_U'][0].data[...] = np.dot(np.diag(s[:k]), Vt[:k, :])
    # fc_svd_V applies the second product, so it gets U truncated to its k leading columns (num_output x k)
    svd_net.params['fc_svd_V'][0].data[...] = U[:, :k]
    svd_net.params['fc_svd_V'][1].data[...] = orig_net.params['fc_orig'][1].data # same bias
    # save the new weights
    svd_net.save('trained_weights_svd.caffemodel')
    

Now we have deploy_svd.prototxt and trained_weights_svd.caffemodel, which approximate the original net with far fewer multiplications and weights.
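A back-of-envelope count shows where the savings come from. The 1000 outputs and k = 20 match the example above; the input dimension of 4096 is a hypothetical value (typical of VGG-style fc layers):

```python
# per-sample multiplication count of a fully connected layer = inputs * outputs
D, n, k = 4096, 1000, 20   # D is a hypothetical input dimension
orig_muls = D * n          # one (n x D) matrix-vector product
svd_muls = k * D + n * k   # two smaller products: (k x D), then (n x k)
print(orig_muls, svd_muls)  # 4096000 101920, roughly a 40x reduction
```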

Shai
  • amazing solution :) – MD. Nazmul Kibria Nov 08 '16 at 13:07
  • @Dale not my solution - it's Ross Girshick's. – Shai Jan 23 '17 at 09:34
  • I think you meant to write `W_app = U S_trunc V*`. – Autonomous Jul 08 '19 at 06:13
  • This is a nice solution, and technically correct, but it's important to note when you should and should not leave the two linear layers separate vs. multiplying them together into one. If k isn't much smaller than either matrix dimension, then it may be fewer multiplies altogether to recombine them. (Remember that for matrix multiplication x(AB) = (xA)B; x(AB) might be fewer multiplies than (xA)B if AB is precomputed and k isn't small enough.) – Joseph Summerhays Jun 13 '23 at 22:28

Actually, Ross Girshick's py-faster-rcnn repo includes an implementation of this SVD step: compress_net.py.

BTW, you usually need to fine-tune the compressed model to recover the accuracy (or to compress in a more sophisticated way, see for example "Accelerating Very Deep Convolutional Networks for Classification and Detection", Zhang et al).

Also, for me scipy.linalg.svd worked faster than numpy's svd.
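For reference, a minimal sketch of the swap (the 1000×4096 shape is just an example):

```python
import numpy as np
from scipy.linalg import svd  # drop-in alternative to np.linalg.svd

W = np.random.default_rng(0).standard_normal((1000, 4096)).astype('f4')
# scipy also exposes the LAPACK driver; 'gesdd' (divide and conquer) is the default
U, s, Vt = svd(W, full_matrices=False, lapack_driver='gesdd')
assert U.shape == (1000, 1000) and s.shape == (1000,) and Vt.shape == (1000, 4096)
```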

rkellerm