
I have a simple question concerning the VLAD vector representation. How is it that an 8192-dimensional VLAD vector (k=64 clusters, 128-D SIFT descriptors) takes '32KB of memory' per image? I could not relate these two numbers.

Alastair_V

1 Answer


As described in the VLFeat documentation, each element of the VLAD vector is given by

 v_k = sum_i q_ik (x_i - mu_k)

where x_i is a descriptor vector (here: a 128-dimensional SIFT vector), and mu_k is the center of the k-th cluster - i.e. also a 128-dimensional vector. q_ik denotes the strength of association between x_i and mu_k, which is either 0 or 1 if hard K-means clustering is used. Thus, each v_k is 128-dimensional.
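The per-cluster sum above can be sketched in a few lines of NumPy. This is a minimal illustration with randomly generated toy data (the descriptor and center values are made up, not from VLFeat), using hard assignments as in k-means:

```python
import numpy as np

# Hypothetical toy data: 100 SIFT-like descriptors (128-D) and k=64 centers.
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((100, 128)).astype(np.float32)  # the x_i
centers = rng.standard_normal((64, 128)).astype(np.float32)       # the mu_k

# Hard assignment: q_ik = 1 for the nearest center, 0 otherwise.
dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
nearest = dists.argmin(axis=1)

# v_k = sum over descriptors assigned to cluster k of (x_i - mu_k)
v = np.zeros((64, 128), dtype=np.float32)
for i, k in enumerate(nearest):
    v[k] += descriptors[i] - centers[k]

print(v.shape)  # (64, 128): one 128-D residual sum per cluster
```

Clusters to which no descriptor is assigned simply keep a zero residual vector.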

The VLAD vector of an image I is then given by stacking all v_k:

 V = [v_1, v_2, ..., v_k]

This vector consists of k sub-vectors, each of which is 128-dimensional. Thus, for k=64, we end up with 64 * 128 = 8192 numbers describing image I.

Finally, if each of these numbers is stored as a single-precision floating-point value, it requires 4 bytes of memory. We thus end up with a total memory usage of 64 * 128 * 4 = 32768 bytes, i.e. 32KB, for the VLAD vector of each image.
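The arithmetic can be checked directly with NumPy, since an array's `nbytes` attribute reports exactly this size:

```python
import numpy as np

k, d = 64, 128                             # number of clusters, SIFT dimensionality
vlad = np.zeros(k * d, dtype=np.float32)   # 32-bit floats: 4 bytes per element

print(vlad.size)    # 8192 elements
print(vlad.nbytes)  # 8192 * 4 = 32768 bytes = 32 KB
```

Storing the vector as `float64` instead would double this to 64 KB, which is one reason single precision is the usual choice for such descriptors.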

hbaderts