
I have a simple question concerning the VLAD vector representation. How is it that an 8192-dimensional VLAD vector (k=64 clusters, 128-D SIFT descriptors) takes '32KB of memory' per image? I could not relate these two numbers.

Alastair_V

1 Answer


As described in the VLFeat documentation, each element of the VLAD vector is given by

 v_k = sum_i q_ik (x_i - mu_k)

where x_i is a descriptor vector (here: a 128-dimensional SIFT vector), and mu_k is the center of the k-th cluster - i.e. also a 128-dimensional vector. q_ik denotes the strength of association between x_i and mu_k, which is either 0 or 1 if hard K-means clustering is used. Thus, each v_k is 128-dimensional.
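The per-cluster sum above can be sketched in a few lines of NumPy. This is a minimal illustration with randomly generated toy data (the descriptor and center values are made up, not from VLFeat), using hard assignments as in k-means:

```python
import numpy as np

# Hypothetical toy data: 100 SIFT-like descriptors (128-D) and k=64 centers.
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((100, 128)).astype(np.float32)  # the x_i
centers = rng.standard_normal((64, 128)).astype(np.float32)       # the mu_k

# Hard assignment: q_ik = 1 for the nearest center, 0 otherwise.
dists = np.linalg.norm(descriptors[:, None, :] - centers[None, :, :], axis=2)
nearest = dists.argmin(axis=1)

# v_k = sum over descriptors assigned to cluster k of (x_i - mu_k)
v = np.zeros((64, 128), dtype=np.float32)
for i, k in enumerate(nearest):
    v[k] += descriptors[i] - centers[k]

print(v.shape)  # (64, 128): one 128-D residual sum per cluster
```

Clusters to which no descriptor is assigned simply keep a zero residual vector.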

The VLAD vector of an image I is then given by stacking all v_k:

 V = [v_1, v_2, ..., v_k]

This vector consists of k sub-vectors, each of which is 128-dimensional. Thus, for k=64, we end up with 64 * 128 = 8192 numbers describing image I.

Finally, if each of these numbers is stored as a single-precision floating-point value, it requires 4 bytes of memory. We thus end up with a total memory usage of 64 * 128 * 4 = 32768 bytes, i.e. 32KB, for the VLAD vector of each image.
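The arithmetic can be checked directly with NumPy, since an array's `nbytes` attribute reports exactly this size:

```python
import numpy as np

k, d = 64, 128                             # number of clusters, SIFT dimensionality
vlad = np.zeros(k * d, dtype=np.float32)   # 32-bit floats: 4 bytes per element

print(vlad.size)    # 8192 elements
print(vlad.nbytes)  # 8192 * 4 = 32768 bytes = 32 KB
```

Storing the vector as `float64` instead would double this to 64 KB, which is one reason single precision is the usual choice for such descriptors.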

hbaderts